Service Intermittency and Periodic Disconnections on Railway
thefabi8a
HOBBYOP

2 months ago

Dear Railway Team,

I am experiencing a critical issue with my service deployed on your platform. The observed behavior is as follows:

  • The service disconnects approximately every 2 minutes.
  • It then recovers automatically without triggering a new deploy or container restart.
  • This is not caused by a server crash or process termination.
  • During these interruptions, active connections (especially WebSockets) are dropped.

I have verified that:

  • The application code is not throwing fatal errors or exceptions.
  • There are no cron jobs, serverless configurations, or any visible auto-sleep/downtime settings enabled.
  • I configured a /health endpoint to help keep the service alive, but it did not resolve the issue.

Additionally:

  • There is an initial spike in vCPU usage when the service starts.
  • After that, resource usage stabilizes, but the disconnection pattern continues consistently.
  • The issue appears to follow a predictable and recurring interval.

This behavior directly impacts my application, which relies on persistent real-time connections (WebSockets) for:

  • Bot command execution
  • Real-time event processing and storage

For this reason, it is essential that the service remains continuously active without interruptions.

I would appreciate your assistance in clarifying:

  1. Whether there are any internal mechanisms that could be causing these periodic disconnections.
  2. If there are platform-level timeouts, limits, or networking policies affecting persistent connections.
  3. Recommended configurations or best practices to ensure high availability for real-time applications on Railway.

Please let me know if you need any additional information to help diagnose the issue.

Thank you for your time and support.

Sincerely,

TheFabi8A

Solved ($10 Bounty)

Pinned Solution

domehane
FREE

2 months ago

Hello,

The ~2 min interval, together with the fact that no restart or crash happens, points strongly to a proxy-level idle timeout dropping your connections, not to your app. Railway puts an edge proxy in front of your service, and it will cut connections that appear idle even if your app is perfectly healthy. Your /health endpoint won't fix this because it only matters at deploy time, not at runtime. The fix is to send WebSocket ping/pong frames from your server every 10-20 seconds to keep the connection alive and signal activity through the proxy; that's the first thing worth trying before anything else.

Hope this helps :)

4 Replies

thefabi8a
HOBBYOP

2 months ago

[Screenshot attached]


Status changed to Awaiting Railway Response Railway about 2 months ago


Status changed to Open Railway about 2 months ago


andreahlert
PRO

2 months ago

Adding to what domehane said: the ~2 min interval is a classic proxy idle timeout signature.

For the ping/pong implementation, make sure you're sending server-side pings, not just relying on client-side keepalives. Example with ws in Node.js:

const interval = setInterval(() => {
  if (ws.readyState === ws.OPEN) ws.ping();
}, 15000);
ws.on('close', () => clearInterval(interval));
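
A slightly fuller sketch of the same idea also tracks pong replies, so a peer that has silently gone away gets terminated instead of lingering. This is only an illustration: startHeartbeat is a hypothetical helper name, and the socket argument is assumed to expose the ws library's WebSocket interface (readyState, OPEN, ping(), terminate(), on()).

```javascript
// Hypothetical helper, not part of the ws library.
// Assumes `socket` behaves like a ws WebSocket instance.
function startHeartbeat(socket, intervalMs = 15000) {
  let alive = true;

  // Any pong from the peer marks the connection as alive again.
  socket.on('pong', () => { alive = true; });

  const check = () => {
    if (socket.readyState !== socket.OPEN) return;
    if (!alive) {
      // No pong arrived since the previous ping: drop the connection.
      socket.terminate();
      return;
    }
    alive = false;
    socket.ping();
  };

  const timer = setInterval(check, intervalMs);
  // Stop pinging once the connection is closed.
  socket.on('close', () => clearInterval(timer));

  return { check, stop: () => clearInterval(timer) };
}
```

With ws this would typically be called once per connection, for example inside the server's 'connection' handler. The 15-second default simply mirrors the snippet above.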

Also worth checking: Railway's proxy enforces a 60-second idle timeout by default on HTTP connections. For WebSockets specifically, the connection upgrade should bypass this, but the ping/pong frames are still necessary to signal the connection is active.

If the disconnections persist after implementing ping/pong, two things to verify:

  1. Confirm your service is deployed with a public networking domain (not as a private service), since internal services behave differently.
  2. Check Railway's deployment logs for any 502 or 504 entries around the disconnection timestamps, which would confirm the proxy is the culprit and warrant a support ticket at railway.com/help.

Status changed to Solved brody about 2 months ago


thefabi8a
HOBBYOP

2 months ago

I spent several weeks trying to fix it, and it was just a matter of defining the ping values for socket.io, haha. Thank you so much!

const io = new Server(httpServer, {
  cors: {
    origin: '*',
    methods: ['GET', 'POST']
  },
  pingInterval: 20000,
  pingTimeout: 10000
})
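
As a rough sanity check on values like these, the ping interval can be compared against whatever idle window the proxy is believed to enforce. The sketch below is only an illustration: isKeepaliveSafe is a hypothetical helper (not part of socket.io or Railway), the 60-second figure is the one quoted earlier in this thread rather than a verified limit, and the "at most half the window" rule is an assumed safety margin.

```javascript
// Hypothetical helper: returns true when pings are sent often enough
// that at least one should land well inside the idle window.
// maxFraction = 0.5 is an assumed safety margin, not a documented rule.
function isKeepaliveSafe(pingIntervalMs, idleTimeoutMs, maxFraction = 0.5) {
  return pingIntervalMs <= idleTimeoutMs * maxFraction;
}

// 60000 ms is the idle timeout quoted in this thread (unverified).
console.log(isKeepaliveSafe(20000, 60000)); // true: 20 s is within half of 60 s
console.log(isKeepaliveSafe(45000, 60000)); // false: 45 s leaves too little margin
```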


Status changed to Awaiting Railway Response Railway about 2 months ago


Status changed to Solved thefabi8a about 2 months ago

