2 months ago
Dear Railway Team,
I am experiencing a critical issue with my service deployed on your platform. The observed behavior is as follows:
- The service disconnects approximately every 2 minutes.
- It then recovers automatically without triggering a new deploy or container restart.
- This is not caused by a server crash or process termination.
- During these interruptions, active connections (especially WebSockets) are dropped.
I have verified that:
- The application code is not throwing fatal errors or exceptions.
- There are no cron jobs, serverless configurations, or any visible auto-sleep/downtime settings enabled.
- I configured a /health endpoint to help keep the service alive, but it did not resolve the issue.
Additionally:
- There is an initial spike in vCPU usage when the service starts.
- After that, resource usage stabilizes, but the disconnection pattern continues consistently.
- The issue appears to follow a predictable and recurring interval.
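One way to confirm that the disconnects really follow a fixed interval is to record the time of each close/disconnect event and look at the gaps between them. This is a minimal sketch. It assumes the timestamps are collected as epoch milliseconds, and `disconnectGaps` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: given disconnect times (epoch ms, any order),
// return the gaps between consecutive disconnects plus their mean.
function disconnectGaps(timestampsMs) {
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  const gaps = [];
  for (let i = 1; i < sorted.length; i++) {
    gaps.push(sorted[i] - sorted[i - 1]);
  }
  // Mean gap, or null when there are fewer than two disconnects.
  const meanMs = gaps.length ? gaps.reduce((sum, g) => sum + g, 0) / gaps.length : null;
  return { gaps, meanMs };
}

// Example with made-up timestamps roughly two minutes apart.
console.log(disconnectGaps([0, 121000, 240000, 361000]));
```

If the gaps cluster tightly around one value and do not vary with load, that is consistent with a timer-based cutoff such as an idle timeout rather than a resource problem.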
This behavior directly impacts my application, which relies on persistent real-time connections (WebSockets) for:
- Bot command execution
- Real-time event processing and storage
For this reason, it is essential that the service remains continuously active without interruptions.
I would appreciate your assistance in clarifying:
- Whether there are any internal mechanisms that could be causing these periodic disconnections.
- If there are platform-level timeouts, limits, or networking policies affecting persistent connections.
- Recommended configurations or best practices to ensure high availability for real-time applications on Railway.
Please let me know if you need any additional information to help diagnose the issue.
Thank you for your time and support.
Sincerely,
TheFabi8A
Pinned Solution (domehane)
2 months ago
Hello,
The ~2-minute interval, together with the fact that no restart or crash happens, points strongly to a proxy-level idle timeout dropping your connections, not to your app. Railway places an edge proxy in front of your service, and it will cut connections that appear idle even if your app is perfectly healthy. Your /health endpoint won't fix this, because it only matters at deploy time, not at runtime. The fix is to send WebSocket ping/pong frames from your server every 10-20 seconds to keep the connection alive and signal activity through the proxy; that's the first thing worth trying before anything else.
Hope this helps :)
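The idle-timeout mechanism described above can be modeled in a few lines. The proxy remembers when it last saw traffic and closes the connection once that gap reaches its limit, so a heartbeat only helps if it recurs faster than the limit. This is a simplified sketch, not Railway's actual proxy logic: `simulateIdleProxy` is a hypothetical name, and the 120-second timeout in the example is an assumption chosen to match the ~2 minute pattern, not a documented Railway value.

```javascript
// Simplified model of an idle-timeout proxy (hypothetical, for illustration only).
// The connection carries no traffic except periodic heartbeats; the proxy
// closes it as soon as `idleTimeoutMs` passes without any traffic.
function simulateIdleProxy({ idleTimeoutMs, pingIntervalMs, durationMs }) {
  if (!(pingIntervalMs > 0)) throw new RangeError('pingIntervalMs must be positive');
  let lastActivityMs = 0;
  for (let t = pingIntervalMs; t <= durationMs; t += pingIntervalMs) {
    if (t - lastActivityMs >= idleTimeoutMs) {
      // The proxy's idle timer fired before this heartbeat was sent.
      return { closedAtMs: lastActivityMs + idleTimeoutMs };
    }
    lastActivityMs = t; // each ping/pong exchange counts as activity
  }
  if (durationMs - lastActivityMs >= idleTimeoutMs) {
    return { closedAtMs: lastActivityMs + idleTimeoutMs };
  }
  return { closedAtMs: null }; // survived the whole run
}

// Assumed 120 s idle timeout, observed over 10 minutes.
console.log(simulateIdleProxy({ idleTimeoutMs: 120000, pingIntervalMs: Infinity, durationMs: 600000 }));
// → { closedAtMs: 120000 }
console.log(simulateIdleProxy({ idleTimeoutMs: 120000, pingIntervalMs: 15000, durationMs: 600000 }));
// → { closedAtMs: null }
```

Under this model any heartbeat interval comfortably below the timeout keeps the connection open; the 10-20 seconds suggested above leaves a wide margin.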
2 months ago
Complementing what domehane said: the ~2 min interval is a classic proxy idle timeout signature.
For the ping/pong implementation, make sure you're sending server-side pings, not just relying on client-side keepalives. Example with ws in Node.js:
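const { WebSocketServer } = require('ws');
const wss = new WebSocketServer({ port: Number(process.env.PORT) || 8080 });
wss.on('connection', (ws) => {
  const interval = setInterval(() => {
    if (ws.readyState === ws.OPEN) ws.ping(); // server-side ping every 15 s
  }, 15000);
  ws.on('close', () => clearInterval(interval)); // stop pinging once the socket closes
});
Also worth checking: Railway's proxy enforces a 60-second idle timeout by default on HTTP connections. For WebSockets specifically, the connection upgrade should bypass this, but the ping/pong frames are still necessary to signal the connection is active.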
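The ws snippet keeps the connection busy but never notices a client that has silently gone away. The `ws` README pairs the ping with a dead-peer check: mark each socket as not alive, ping it, and let the 'pong' listener mark it alive again; a socket still marked dead at the next sweep is terminated. In this sketch, the sweep is pulled into a plain function (`heartbeatSweep` is a hypothetical name) so the logic can be checked without a live server. The wiring in the comments uses real `ws` calls (`ping()`, `terminate()`, the 'pong' event, `wss.clients`) but is not executed here.

```javascript
// One heartbeat sweep, following the pattern in the `ws` README:
// a client that never answered the previous ping is terminated,
// every other client is marked "not alive" and pinged again.
function heartbeatSweep(clients) {
  for (const ws of clients) {
    if (ws.isAlive === false) {
      ws.terminate(); // no pong since the last sweep: drop it
      continue;
    }
    ws.isAlive = false; // the 'pong' listener sets this back to true
    ws.ping();
  }
}

// Wiring with a real `ws` WebSocketServer (not executed in this sketch):
// wss.on('connection', (ws) => {
//   ws.isAlive = true;
//   ws.on('pong', () => { ws.isAlive = true; });
// });
// const timer = setInterval(() => heartbeatSweep(wss.clients), 15000);
// wss.on('close', () => clearInterval(timer));
```

The same 15-second sweep both generates traffic for the proxy and frees sockets whose clients disappeared without a close frame.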
If the disconnections persist after implementing ping/pong, two things to verify:
- Your service is deployed with a public networking domain (not a private service), since internal services behave differently
- Check Railway's deployment logs for any 502 or 504 entries around the disconnection timestamps, which would confirm the proxy is the culprit and warrant a support ticket at railway.com/help
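Lining up gateway errors with disconnect times can be scripted once the logs are exported. The entry shape `{ time, status }` is an assumption for illustration (map whatever fields your log export actually has onto it), and `gatewayErrorsNearDisconnects` is a hypothetical helper.

```javascript
// Hypothetical log entry shape: { time: <epoch ms>, status: <HTTP status code> }.
// Returns the 502/504 entries that fall within `windowMs` of any recorded disconnect.
function gatewayErrorsNearDisconnects(logEntries, disconnectTimesMs, windowMs = 5000) {
  const isGatewayError = (status) => status === 502 || status === 504;
  return logEntries.filter(
    (entry) =>
      isGatewayError(entry.status) &&
      disconnectTimesMs.some((t) => Math.abs(entry.time - t) <= windowMs)
  );
}

// Made-up sample data, only to show the call shape.
const sampleLogs = [
  { time: 119000, status: 200 },
  { time: 120500, status: 502 },
  { time: 180000, status: 504 },
];
console.log(gatewayErrorsNearDisconnects(sampleLogs, [120000, 240000]));
// → [ { time: 120500, status: 502 } ]
```

The window size is a judgment call: too small and clock skew between the client and the logs hides real matches, too large and unrelated errors slip in.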
2 months ago
I spent several weeks trying to fix this, and it was just a matter of setting the ping values in Socket.IO, haha. Thank you so much!
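const { createServer } = require('http');
const { Server } = require('socket.io');

// httpServer may already exist in your app (e.g. wrapping an Express app)
const httpServer = createServer();
const io = new Server(httpServer, {
  cors: {
    origin: '*',
    methods: ['GET', 'POST']
  },
  pingInterval: 20000, // send a heartbeat ping every 20 s
  pingTimeout: 10000   // close the connection if no pong arrives within 10 s
});

httpServer.listen(process.env.PORT || 3000);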
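In Socket.IO v4 the server sends a heartbeat ping every `pingInterval` ms and closes the connection if the client's pong does not arrive within `pingTimeout` ms. These heartbeats are Engine.IO packets sent as regular messages, not RFC 6455 ping frames, but they should still count as traffic to an idle-timeout proxy. The sketch relates those two settings to a proxy timeout; `heartbeatBudget` is a hypothetical helper, not a Socket.IO API, and the 120-second timeout is an assumption matching the ~2 minute pattern.

```javascript
// Hypothetical helper (not a Socket.IO API): relates heartbeat settings
// to an assumed proxy idle timeout.
function heartbeatBudget({ pingInterval, pingTimeout }, proxyIdleTimeoutMs) {
  return {
    // Heartbeats must recur more often than the proxy's idle cutoff.
    keepsProxyBusy: pingInterval < proxyIdleTimeoutMs,
    // Worst case before a silently dead client is dropped: a client that dies
    // right after a pong is pinged one interval later, then the pong wait expires.
    maxDetectionMs: pingInterval + pingTimeout,
  };
}

// The values from the config above, against an assumed 120 s idle timeout.
console.log(heartbeatBudget({ pingInterval: 20000, pingTimeout: 10000 }, 120000));
// → { keepsProxyBusy: true, maxDetectionMs: 30000 }
```

With these values the connection sees traffic every 20 seconds, and a vanished client is cleaned up within about 30 seconds.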

