2 months ago
Hi, I have a problem: every time I do a new deployment, the previous one crashes...
I think it might be a volume mount issue?
I implemented a health check to verify that the new one is working properly before taking the old one offline, but it clearly isn't working.
Any ideas on how to prevent this?
Pinned Solution
2 months ago
Hello,
This is actually expected Railway behavior when you have a volume attached. Railway explicitly prevents multiple deployments from mounting the same volume at the same time to avoid data corruption, so the old deployment has to stop before the new one can start. That causes a brief downtime even if you have a healthcheck configured. The healthcheck won't fix this; that's just how volumes work on Railway.
The old deployment gets sent a SIGTERM and may be force-killed before it can shut down cleanly, which is why it shows as crashed.
The fix is to set the RAILWAY_DEPLOYMENT_DRAINING_SECONDS service variable to a value greater than 0. This gives the old deployment time to exit cleanly instead of being killed, which should stop the crashed status. You can also set RAILWAY_DEPLOYMENT_OVERLAP_SECONDS to control how long the old deployment stays active after the new one is live.
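The draining window only helps if the app actually reacts to SIGTERM and exits on its own. Here is a minimal sketch of what that could look like, assuming a long-running Python process; `run_worker`, `do_work`, and `cleanup` are hypothetical names for illustration, not anything Railway provides:

```python
import signal
import threading

# Set when SIGTERM arrives; the main loop checks it to know when to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request here; the real cleanup runs in the main loop."""
    shutdown_requested.set()


def run_worker(do_work, cleanup, poll_seconds=0.1):
    """Call do_work() repeatedly until shutdown is requested, then clean up.

    do_work and cleanup are placeholders (assumptions) for your app's logic,
    e.g. cleanup could flush and close files that live on the volume.
    Returns 0 so the caller can exit with a success status.
    """
    while not shutdown_requested.is_set():
        do_work()
        # Sleep between iterations, but wake up early if SIGTERM arrives.
        shutdown_requested.wait(poll_seconds)
    cleanup()
    return 0


def main():
    # Register the handler before starting work so no SIGTERM is missed.
    signal.signal(signal.SIGTERM, handle_sigterm)
    raise SystemExit(run_worker(lambda: None, lambda: print("clean shutdown")))
```

Call `main()` from your entrypoint. The cleanup has to finish within the number of seconds you set in RAILWAY_DEPLOYMENT_DRAINING_SECONDS, otherwise the process can still be force-killed.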
I think true zero downtime is not possible with volumes on Railway; it may be a hard limitation by design.
ref: https://docs.railway.com/reference/volumes
I hope this helps :)
2 Replies
2 months ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 2 months ago
2 months ago
Hey, thanks a lot for taking the time to explain this, really appreciate the help!
We tried setting RAILWAY_DEPLOYMENT_DRAINING_SECONDS and RAILWAY_DEPLOYMENT_OVERLAP_SECONDS but unfortunately we're still experiencing downtime during deployments.
So our solution is going to be moving to external storage, which for our use case works just fine.
Honestly it's a bit disappointing that this is a hard limitation with volumes on Railway, but it is what it is. Thanks again for pointing us in the right direction!
Status changed to Solved sam-a • 2 months ago