Massive slowdown on heavy load
jclaveau
PROOP

2 months ago

Hello!

I have been running an app for students on Railway for 2 years, and it suddenly started having very long response times. This began on the 4th, when students went back to work.

Since Monday, I have tried a lot of optimizations:

  • adding instances (even though my app is already autoscaled with PM2)

  • optimizing the number of queries

  • reducing the number of SQL queries per transaction

  • checking the indexes, the cache, the pgbouncer pool

  • enabling a rate limiter by IP

But nothing changes: my app (API response time) is extremely slow once analytics show 200+ users over the last 30 minutes.

Between 9am (Paris) and 11pm we constantly have more than 400 users, and every request takes more than 10s. During the night, or on the dev env, it's blazing fast.

The odd thing is that the app worked like a charm with the same load before the holidays, and worked very well in October with 2x or 3x more users (without the recent optimizations).

I found this issue and wonder if it also applies to my case: https://discord.com/channels/713503345364697088/1458075059519488115

As you can see in the screenshot, there is no memory or disk space shortage.
The spikes in network usage, aligned with CPU usage, correspond to the different updates I deployed while trying to fix this.

Is there something I am missing?

project-id: 65aff0db-6586-4be0-8420-b2e67ae4378d

$30 Bounty

25 Replies

2 months ago

What specific service is slowing down?


jclaveau
PROOP

2 months ago

I wonder if it is Api (Directus) 7a56b219-e2b8-4228-8f14-94ecc49861f0 or Postgres ac6a55a4-5b9c-4e6d-865a-1a0d40372074


2 months ago

The service you are telling me is Directus is a Python app?


jclaveau
PROOP

2 months ago

Directus is a Node CMS


2 months ago

I know, but 7a56b219-e2b8-4228-8f14-94ecc49861f0 is a Python app.


jclaveau
PROOP

2 months ago

Not here :/

[attached screenshot]



2 months ago

Ah, it has a Python component that was tripping me up.

But yeah, direct links would be best.


jclaveau
PROOP

2 months ago

Yes, gyp requires Python, if I remember correctly


2 months ago

Since Node is single-threaded, it seems to be running quite hot. Have you tried to scale further horizontally?


jclaveau
PROOP

2 months ago

I have PM2 with autoscale


jclaveau
PROOP

2 months ago

it often goes to 15 processes


jclaveau
PROOP

2 months ago

and I also added 2 instances today without success


2 months ago

Can you instead set a fixed number of 15 PM2 replicas, just for testing?


jclaveau
PROOP

2 months ago

Hmm… for the first time in 4 days, it works quite well (I just deployed an optimization: memoizing a query listing unique indexes)
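A minimal sketch of that kind of memoization, assuming the query takes no arguments and its result can safely be reused for a short time. `memoizeWithTtl` and `fetchUniqueIndexes` are hypothetical names for illustration, not Directus APIs, and the stub below stands in for the real database call.

```javascript
// Caches the promise returned by an argument-less async function for ttlMs milliseconds.
// Concurrent callers share the same in-flight promise, so the query runs once per window.
function memoizeWithTtl(fn, ttlMs, now = Date.now) {
  let cached = null;
  let expiresAt = 0;

  return function memoized() {
    if (cached !== null && now() < expiresAt) {
      return cached;
    }
    expiresAt = now() + ttlMs;
    const pending = Promise.resolve(fn()).catch((err) => {
      // Do not keep a failed result around: the next call retries.
      if (cached === pending) cached = null;
      throw err;
    });
    cached = pending;
    return cached;
  };
}

// Hypothetical stand-in for the real "list unique indexes" query.
async function fetchUniqueIndexes() {
  return ['users_email_unique'];
}

// Reuse the result for 60 seconds instead of querying on every request.
const listUniqueIndexes = memoizeWithTtl(fetchUniqueIndexes, 60_000);
```

Caching the promise rather than the resolved value means a burst of simultaneous requests triggers a single query instead of one per request.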


jclaveau
PROOP

2 months ago

It's not as fast as before but almost acceptable


2 months ago

Just trying to help, but I'll let the community continue in my footsteps to help you debug this.


jclaveau
PROOP

2 months ago

So I guess nothing changed during the holidays?


jclaveau
PROOP

2 months ago

on Railway side


2 months ago

Nothing, we weren't even at work; it was winter holidays for us.


jclaveau
PROOP

2 months ago

Do you know if there is a way to SSH into a specific instance of a service?


2 months ago

railway ssh --deployment-instance


jclaveau
PROOP

2 months ago

It's very odd because Node is absolutely not under pressure


jclaveau
PROOP

2 months ago

[attached screenshot]


jclaveau
PROOP

2 months ago

The best clue I have concerns PG, but I can't figure out what makes it suddenly slow

[attached screenshot]
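One way to check whether the time is really spent waiting on Postgres rather than in Node is to time the database calls separately from the full request. The sketch below is only an illustration under that assumption: `timed` and `summarize` are hypothetical helpers, and whatever function you pass to `timed` would be your own query call.

```javascript
const { performance } = require('node:perf_hooks');

// Runs an async function and appends its duration (in ms) to `samples`,
// whether it resolves or rejects.
async function timed(samples, fn) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    samples.push(performance.now() - start);
  }
}

// Summarizes collected durations: count, 95th percentile, and maximum.
function summarize(samples) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const p95Index = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return {
    count: sorted.length,
    p95: sorted[p95Index],
    max: sorted[sorted.length - 1],
  };
}
```

If the summarized query durations account for most of the 10s per request while Node's CPU stays low, that points at the database or the connection pool rather than the API processes.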

