Crash error every new deploy
diegoavidal
PRO · OP

2 months ago

Hi, I have a problem: every time I do a new deployment, the previous one crashes...

I think it might be a volume mount issue?

I implemented a health check to verify that the new one is working properly before taking the old one offline, but it clearly isn't working.

Any ideas on how to prevent this?

Solved · $10 Bounty

Pinned Solution

domehane
FREE

2 months ago

Hello,

This is actually expected Railway behavior when you have a volume attached. Railway explicitly prevents multiple deployments from mounting the same volume at the same time to avoid data corruption, and this causes a brief downtime even if you have a healthcheck configured. The healthcheck won't fix this; that's just how volumes work on Railway.

The old deployment is sent a SIGTERM and may be force-killed before it can shut down cleanly, which is why it shows as crashed.

The fix is to set the RAILWAY_DEPLOYMENT_DRAINING_SECONDS service variable to a value greater than 0. This gives the old deployment time to exit cleanly instead of being killed, which should stop the crashed status. You can also set RAILWAY_DEPLOYMENT_OVERLAP_SECONDS to control how long the old deployment stays active after the new one is live.
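The draining window only helps if the app actually exits when it receives SIGTERM. Here is a minimal Python sketch of that idea, not something taken from the Railway docs: the helper names (`handle_sigterm`, `drain_budget_seconds`, `run`) are hypothetical, and it assumes the service variable is visible to the process as an ordinary environment variable.

```python
import os
import signal
import threading

# Set once the platform asks the process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the main loop does the actual cleanup."""
    shutdown_requested.set()


def drain_budget_seconds(default=0):
    """Read the draining window from the environment.

    Assumption: RAILWAY_DEPLOYMENT_DRAINING_SECONDS is exposed to the
    process like any other service variable. Falls back to `default`
    when it is unset or not an integer.
    """
    raw = os.environ.get("RAILWAY_DEPLOYMENT_DRAINING_SECONDS", "")
    try:
        return max(0, int(raw))
    except ValueError:
        return default


def run(do_work, cleanup, poll_interval=1.0):
    """Call do_work() until SIGTERM arrives, then call cleanup() and return 0."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not shutdown_requested.is_set():
        do_work()
        # Sleep, but wake up early if a shutdown was requested meanwhile.
        shutdown_requested.wait(poll_interval)
    # Flush and close anything that lives on the volume before exiting.
    cleanup()
    return 0
```

In a real service you would call something like `sys.exit(run(serve_once, close_files))` and make sure `cleanup` finishes well inside the number of seconds returned by `drain_budget_seconds()`, otherwise the process can still be force-killed.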

I think true zero downtime is not possible with volumes on Railway; it may be a hard limitation by design.

ref: https://docs.railway.com/reference/volumes

I hope this helps you :)

2 Replies

Railway
BOT

2 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open by Railway, 2 months ago



diegoavidal
PRO · OP

2 months ago

Hey, thanks a lot for taking the time to explain this, really appreciate the help!

We tried setting RAILWAY_DEPLOYMENT_DRAINING_SECONDS and RAILWAY_DEPLOYMENT_OVERLAP_SECONDS but unfortunately we're still experiencing downtime during deployments.

So our solution is going to be moving to external storage, which for our use case works just fine.

Honestly it's a bit disappointing that this is a hard limitation with volumes on Railway, but it is what it is. Thanks again for pointing us in the right direction!


Status changed to Solved by sam-a, 2 months ago

