2 months ago
Hello!
I have been running an app for students on Railway for 2 years, and it suddenly started having very long response times. This began on the 4th, when students went back to work.
Since Monday, I have tried a lot of optimizations:
adding instances (even though my app is autoscaled with PM2)
optimizing the number of queries
reducing the number of SQL queries per transaction
checking the indexes, the cache, and the pgbouncer pool
enabling a per-IP rate limiter
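A per-IP rate limiter like the one in the last item is often implemented as a token bucket. The TypeScript below is a minimal in-memory sketch under that assumption, not the app's actual limiter; the class name `IpRateLimiter` and its parameters are hypothetical. A limiter only sheds excess traffic from individual IPs and does not add capacity, and users behind a shared NAT (for example a school network) all share one bucket.

```typescript
// Minimal in-memory token-bucket limiter keyed by client IP (sketch).
// IpRateLimiter and its parameters are hypothetical names, not the app's code.
type Bucket = { tokens: number; lastRefillMs: number };

class IpRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock, handy for deterministic tests
  }

  // Returns true if the request from `ip` may proceed, false if it should be rejected (e.g. HTTP 429).
  allow(ip: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(ip) ?? { tokens: this.capacity, lastRefillMs: t };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSec = (t - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSecond);
    bucket.lastRefillMs = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1; // spend one token per accepted request
    this.buckets.set(ip, bucket);
    return allowed;
  }
}
```

With `new IpRateLimiter(2, 1)`, an IP can burst two requests and then gets roughly one per second. Because the buckets live in process memory, each PM2 worker keeps its own counts (a shared store would be needed for a global limit), and this sketch never evicts idle IPs from the map.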
But nothing has changed: my app's API response time becomes extremely slow once analytics show 200+ users over the last 30 minutes.
Between 9am and 11pm (Paris time) we constantly have more than 400 users, and every request takes more than 10s. During the night, or in the dev environment, it's blazing fast.
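A latency cliff that appears only above a certain number of users is what you would expect once some fixed-size resource (Node workers, pgbouncer connections, Postgres backends) saturates. Little's law makes this concrete: the average number of requests in flight, L, equals the arrival rate λ times the average time W each request spends in the system. Applied to a pool of N connections where each request holds a connection for W_db seconds, the pool can serve at most N / W_db requests per second. The numbers below (N = 20, W_db = 0.1 s) are hypothetical, not measured from this app:

```latex
L = \lambda W, \qquad
\lambda\, W_{\text{db}} \le N
\;\Longrightarrow\;
\lambda_{\max} = \frac{N}{W_{\text{db}}} = \frac{20}{0.1\ \text{s}} = 200\ \text{requests/s}
```

Past λ_max, requests wait in a queue and W grows with load instead of staying flat, which would be consistent with "fast at night, over 10s at peak". It also shows why anything that makes each query slower (a larger W_db) lowers λ_max even when the number of users has not changed.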
The odd thing is that the app worked like a charm under the same load before the holidays, and worked very well in October with 2x or 3x more users (without the recent optimizations).
I found this issue and wonder whether it also applies to my case: https://discord.com/channels/713503345364697088/1458075059519488115
As you can see in the screenshot, there is no memory or disk space shortage.
The spikes in network usage, aligned with CPU usage, represent the different updates I deployed while trying to fix this.
Is there something I'm missing?
project-id: 65aff0db-6586-4be0-8420-b2e67ae4378d
25 Replies
2 months ago
What specific service is slowing down?
I wonder if it is the API (Directus), 7a56b219-e2b8-4228-8f14-94ecc49861f0, or Postgres, ac6a55a4-5b9c-4e6d-865a-1a0d40372074.
2 months ago
So what you're telling me is that Directus is a Python app?
2 months ago
I know, but 7a56b219-e2b8-4228-8f14-94ecc49861f0 is a Python app.
2 months ago
Ah, it has a Python component that was tripping me up.
But yeah, direct links would be best.
2 months ago
Since Node is single-threaded, it seems to be running quite hot. Have you tried scaling further horizontally?
2 months ago
Can you instead set PM2 to a fixed number of replicas, say 15, just for testing?
Hmm… for the first time in 4 days, it works quite well (I just deployed an optimization: memoizing a query that lists unique indexes).
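Memoizing a query that returns rarely changing metadata, as described above, can be sketched as follows. This is a hypothetical TypeScript helper, not the code that was actually deployed; `memoizeWithTtl`, the `fetcher` callback, and the TTL value are assumptions. It caches the promise rather than the resolved value, so concurrent requests that arrive while the cache is cold share one in-flight query instead of each hitting Postgres.

```typescript
// Memoize an argument-less async query for `ttlMs` milliseconds (sketch).
// `fetcher` stands in for whatever function runs the "list unique indexes" query.
function memoizeWithTtl<T>(
  fetcher: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock for testing
): () => Promise<T> {
  let cached: { value: Promise<T>; expiresAt: number } | undefined;

  return () => {
    const t = now();
    // Serve the cached promise (resolved or still in flight) while it is fresh.
    if (cached !== undefined && t < cached.expiresAt) {
      return cached.value;
    }
    const value = fetcher();
    const entry = { value, expiresAt: t + ttlMs };
    cached = entry;
    // If the query fails, forget it so the next call retries instead of reusing the error.
    value.catch(() => {
      if (cached === entry) cached = undefined;
    });
    return value;
  };
}
```

For example, `memoizeWithTtl(listUniqueIndexes, 60_000)`, with `listUniqueIndexes` a hypothetical stand-in for the real query function, would run that query at most about once a minute per PM2 worker. The TTL bounds how stale the index list can get after a schema change.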
2 months ago
Just trying to help, but I'll let the community pick up from here and help you debug this.
2 months ago
Nothing, we weren't even at work; it was the winter holidays for us.
2 months ago
railway ssh --deployment-instance
The best clue I have points to Postgres, but I can't figure out what is suddenly making it slow.
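When Postgres is the main suspect, a standard first check is to snapshot what its sessions are doing during the slow window, using the built-in `pg_stat_activity` view (columns `state`, `wait_event_type`, `wait_event`). The sketch below keeps such a query as a string and adds a hypothetical `summarizeActivity` helper that turns the result rows into hints; how you execute the query (psql, a Node client, etc.) is left out, and the choice of what to flag is an assumption. Many sessions `idle in transaction` or waiting on `Lock` events would suggest the bottleneck sits in the database layer, or in how transactions are held open, rather than in the Node workers.

```typescript
// Snapshot query against Postgres's built-in pg_stat_activity view.
// Run it a few times during the slow window and compare the results.
const ACTIVITY_QUERY = `
  SELECT state, wait_event_type, wait_event, count(*)::int AS sessions
  FROM pg_stat_activity
  WHERE datname = current_database()
  GROUP BY state, wait_event_type, wait_event
  ORDER BY sessions DESC;
`;

type ActivityRow = {
  state: string | null;
  wait_event_type: string | null;
  wait_event: string | null;
  sessions: number;
};

// Hypothetical helper: turn one snapshot into human-readable hints.
function summarizeActivity(rows: ActivityRow[]): string[] {
  const total = (pred: (r: ActivityRow) => boolean): number =>
    rows.filter(pred).reduce((sum, r) => sum + r.sessions, 0);

  const idleInTx = total((r) => r.state === "idle in transaction");
  const lockWaits = total((r) => r.wait_event_type === "Lock");
  const active = total((r) => r.state === "active");

  const hints: string[] = [];
  if (idleInTx > 0) {
    hints.push(`${idleInTx} session(s) idle in transaction (holding a connection and any locks taken)`);
  }
  if (lockWaits > 0) {
    hints.push(`${lockWaits} session(s) waiting on locks`);
  }
  hints.push(`${active} active session(s)`);
  return hints;
}
```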


