Critical: Bun backend on Railway — memory growth, frequent request hangs/timeouts, slow/stuck deploys
travelsmarttravelfastdeveloper
PRO OP

2 months ago

Environment: Railway (multi-region services)

Runtime: Bun

Database: MongoDB Atlas (multi-region)

Project ID: fed40ff9-1ec4-4036-a208-6055169b0a8d

Service ID: cf0a99b7-d7d0-4908-8afd-bd0838d79431

• Requests frequently hang on Railway until client timeouts (no quick 5xx response) → we’re losing dozens of sales per outage.
• RAM usage grows steadily over time (appears to worsen the longer the instance runs) only on Railway; we don’t observe this elsewhere.
• Deploys are slow and often stall, and attempting to switch to “Metal” never finishes (deploy never completes).

This is affecting production at scale and is revenue-impacting.

1) Requests hang until timeout (most urgent)

  • Periodically (several times per day), requests to our Bun service remain “loading” until the client reaches its timeout threshold.

  • We see this on normal HTTP endpoints and long-lived SSE endpoints.

  • When this occurs, there is no immediate 5xx from the platform—just a stall/hang.
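
One way to turn a stall like this into a fast 5xx is a per-request deadline inside the application. The following is only a minimal sketch under stated assumptions, not code from the service: `Handler` and `withDeadline` are hypothetical names, and the 503 status and message are arbitrary choices. It uses only the standard `Request`/`Response` types that Bun exposes.

```typescript
// Hypothetical sketch: Handler and withDeadline are illustrative names, not an existing API.
type Handler = (req: Request) => Promise<Response>;

// Wrap a handler so the client gets a 503 if no response is produced within `ms`.
function withDeadline(handler: Handler, ms: number): Handler {
  return async (req: Request): Promise<Response> => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<Response>((resolve) => {
      timer = setTimeout(
        () => resolve(new Response("request timed out", { status: 503 })),
        ms,
      );
    });
    try {
      // Whichever settles first wins. The slow handler is not cancelled;
      // it keeps running in the background after the 503 is sent.
      return await Promise.race([handler(req), deadline]);
    } finally {
      // Always clear the timer so it does not fire after a normal response.
      clearTimeout(timer);
    }
  };
}
```

With Bun this could be wired in as `Bun.serve({ fetch: withDeadline(app, 10_000) })`, where `app` stands for the existing fetch handler. For SSE endpoints the deadline only covers the time until the `Response` object is returned, not the lifetime of the stream.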

2) Exaggerated RAM growth over time (Railway only)

  • Memory consumption monotonically increases the longer the service is up on Railway.

  • If we redeploy/restart, RAM drops and then begins rising again.

  • We do not see the same growing pattern running the exact same code outside Railway (local/other infra).

  • GC logs and heap snapshots show no obvious leak on our side.
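
To compare the growth pattern across environments, it can help to log resident memory next to JS heap usage: RSS rising while the heap stays flat would point away from a plain JavaScript leak. The following is a minimal sketch, assuming the Node-compatible `process.memoryUsage()` that Bun provides. `MemorySample`, `sampleMemory`, and `growth` are hypothetical helper names.

```typescript
// Hypothetical helpers for illustration; only process.memoryUsage() is a real API here.
interface MemorySample {
  at: number;         // Unix timestamp in milliseconds
  rssMb: number;      // resident set size, in MiB
  heapUsedMb: number; // JS heap in use, in MiB
}

// Convert bytes to MiB, rounded to one decimal place.
const toMb = (bytes: number): number => Math.round((bytes / 1024 / 1024) * 10) / 10;

// Take one snapshot of the process's memory counters.
function sampleMemory(): MemorySample {
  const { rss, heapUsed } = process.memoryUsage();
  return { at: Date.now(), rssMb: toMb(rss), heapUsedMb: toMb(heapUsed) };
}

// Difference between the last and first sample in a series.
function growth(samples: MemorySample[]): { rssMb: number; heapUsedMb: number } {
  if (samples.length < 2) return { rssMb: 0, heapUsedMb: 0 };
  const first = samples[0]!;
  const last = samples[samples.length - 1]!;
  return {
    rssMb: last.rssMb - first.rssMb,
    heapUsedMb: last.heapUsedMb - first.heapUsedMb,
  };
}
```

Calling `sampleMemory()` from a `setInterval` (for example once a minute) and logging the result as JSON would give a series that can be compared between Railway and other infrastructure.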

3) Deploys slow and frequently fail to progress

  • A significant number of deploys take very long to move from build → start → healthy.

  • We’ve observed deploys that never reach healthy, requiring manual intervention/cancel/retry.

4) Unable to move to “Metal” (deploy never completes)

  • We attempted to upgrade/migrate this service to Railway Metal, but the deploy never completes (appears to be stuck indefinitely).

  • We need help to unblock/force this migration or diagnose why it can’t complete.

$40 Bounty

2 Replies


brody
EMPLOYEE

2 months ago

Hello,

We are aware of points 3 and 4 and are working to make the build and deploy pipelines more resilient. Unfortunately, I don't have an exact status to give on that at this time.

As for points 1 and 2, those are application-level issues that we have no control over. If requests are hanging, your application is failing to accept them in a timely manner, and memory leaks are not isolated to Railway in any way.

I'll open this thread up with a bounty so that the community can help you with points 1 and 2.

Best,

Brody


Status changed to Awaiting User Response Railway about 2 months ago


brody
EMPLOYEE

2 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open brody about 2 months ago

