2 months ago
Environment: Railway (multi-region services)
Runtime: Bun
Database: MongoDB Atlas (multi-region)
Project ID: fed40ff9-1ec4-4036-a208-6055169b0a8d
Service ID: cf0a99b7-d7d0-4908-8afd-bd0838d79431
• Requests frequently hang on Railway until the client times out (no fast 5xx response); we are losing dozens of sales per outage.
• RAM usage grows steadily the longer an instance runs, and only on Railway; we don't observe this elsewhere.
• Deploys are slow and often stall, and switching to "Metal" never finishes (the deploy never completes).
This is affecting production at scale and is revenue-impacting.
1) Requests hang until timeout (most urgent)
Periodically (several times per day), requests to our Bun service remain “loading” until the client reaches its timeout threshold.
We see this on normal HTTP endpoints and long-lived SSE endpoints.
When this occurs, there is no immediate 5xx from the platform, only a stall.
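One way to turn a silent stall into a fast, visible failure is to race each handler against a deadline. The sketch below is only an illustration, not this service's actual code: `withTimeout`, the `Handler` type, and the 504 body text are hypothetical names chosen here. It can only fire while the event loop is still running (for example, when a handler is awaiting a slow database call); it does nothing if the process itself is blocked or the request never reaches the app, and it should not wrap long-lived SSE routes.

```typescript
// Hypothetical helper: the names and the response body are illustrative assumptions.
type Handler = (req: Request) => Promise<Response>;

function withTimeout(handler: Handler, ms: number): Handler {
  return async (req) => {
    let timer: ReturnType<typeof setTimeout> | undefined;

    // Resolves with a 504 once the deadline passes.
    const deadline = new Promise<Response>((resolve) => {
      timer = setTimeout(
        () => resolve(new Response("handler timed out", { status: 504 })),
        ms,
      );
    });

    try {
      // Whichever settles first wins. The losing handler is NOT cancelled;
      // it keeps running in the background until it settles on its own.
      return await Promise.race([handler(req), deadline]);
    } finally {
      clearTimeout(timer);
    }
  };
}
```

Assuming the app is served with `Bun.serve`, this could be wired in as `Bun.serve({ fetch: withTimeout(app, 10_000) })`, where `app` and the 10-second budget are placeholders. If the 504s then show up in logs during an incident, the stall is inside the handler; if clients still hang with nothing logged, the request is probably not being processed by the handler at all.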
2) Excessive RAM growth over time (Railway only)
Memory consumption monotonically increases the longer the service is up on Railway.
If we redeploy/restart, RAM drops and then begins rising again.
We do not see the same growing pattern running the exact same code outside Railway (local/other infra).
We have found nothing in GC behavior or heap snapshots that points to a clear leak on our side.
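To make the growth pattern comparable across environments, it can help to log resident memory and JS heap usage side by side and compute a growth rate for each. The following is a rough sketch under stated assumptions: `MemSample`, `sampleMemory`, and `growthPerHour` are hypothetical names, and the one-minute interval in the comment is arbitrary. It relies on `process.memoryUsage()`, which reports `rss` and `heapUsed` in bytes.

```typescript
// Hypothetical diagnostic helpers; names and units are illustrative choices.
interface MemSample {
  t: number;          // timestamp in milliseconds
  rssMb: number;      // resident set size in MB
  heapUsedMb: number; // JS heap in use in MB
}

const MB = 1024 * 1024;

function sampleMemory(now: number = Date.now()): MemSample {
  const m = process.memoryUsage();
  return { t: now, rssMb: m.rss / MB, heapUsedMb: m.heapUsed / MB };
}

// Least-squares slope of the chosen metric, in MB per hour.
function growthPerHour(
  samples: MemSample[],
  key: "rssMb" | "heapUsedMb",
): number {
  const n = samples.length;
  if (n < 2) return 0;
  const t0 = samples[0].t;
  const xs = samples.map((s) => (s.t - t0) / 3_600_000); // hours since first sample
  const ys = samples.map((s) => s[key]);
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// Possible usage (not started here so the example exits immediately):
//   const samples: MemSample[] = [];
//   setInterval(() => samples.push(sampleMemory()), 60_000);
```

If `rssMb` climbs while `heapUsedMb` stays roughly flat, the growth is outside the JavaScript heap, which would be consistent with heap snapshots showing no obvious leak. Running the same sampler on Railway and on the other infrastructure gives two slopes that can be compared directly.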
3) Deploys slow and frequently fail to progress
A significant number of deploys take a very long time to move from build → start → healthy.
We've observed deploys that never reach healthy and require manual intervention (cancel and retry).
4) Unable to move to “Metal” (deploy never completes)
We attempted to upgrade/migrate this service to Railway Metal, but the deploy never completes (appears to be stuck indefinitely).
We need help to unblock/force this migration or diagnose why it can’t complete.
2 Replies
2 months ago
Hey there! We've found the following might help you get unblocked faster:
🧵 504 Gateway Timeout on Railway frontend — how to increase the limit?
🧵 HTTP 409 Conflicts Preventing Frontend-Backend Communication
🧵 Is there a reason why sometimes trying to access my service publicaly times out or takes too long?
If you find the answer from one of these, please let us know by solving the thread!
2 months ago
Hello,
We are aware of points 3 and 4 and are working to make the build and deploy pipelines more resilient. Unfortunately, I don't have an exact status to give on that at this time.
As for points 1 and 2, those are application-level issues that we have no control over. If requests are hanging, your application is failing to accept the request in a timely manner, and memory leaks are not isolated to Railway in any way.
I'll open this thread up with a bounty so that the community can help you with points 1 and 2.
Best,
Brody
Status changed to Awaiting User Response Railway • about 2 months ago
2 months ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open brody • about 2 months ago