aarongainz • 14 days ago
How do we set up Railway to restart the instance if it crashes?
Right now, if it crashes, it just stays offline - I don't see a way to get this to attempt an auto-restart.
We basically have to hawk-eye this deployment to watch for crashes and click Restart when it happens.
There has to be a way to do this, I just can't figure it out.
15 Replies
uxuz • 14 days ago
Hey, you can change the restart policy and the number of retries in your service's settings.
Attachments
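For reference, the same restart settings can also be kept as config-as-code. A minimal sketch of a `railway.json`, assuming the documented `restartPolicyType` / `restartPolicyMaxRetries` fields (values shown are illustrative):

```json
{
  "$schema": "https://railway.app/railway.schema.json",
  "deploy": {
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10
  }
}
```

As I understand the docs, `ON_FAILURE` only restarts on a non-zero exit code, while `ALWAYS` restarts regardless of how the process exited.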
aarongainz • 13 days ago
I did - it's set to Always. But it still doesn't restart on crash.
I wonder if it's exiting with a zero exit code. It's crashing with a heap exception (which I'd expect to be non-zero), but I can't think of any other reason it won't auto-restart.
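On the exit-code question: a V8 "heap out of memory" error aborts the process with a non-zero status, but it can still be worth a safety net so that every unhandled error path exits non-zero as well. A generic Node sketch (handlers are illustrative, not Railway-specific):

```javascript
// Safety net: guarantee a non-zero exit for any unhandled error, so a
// restart policy keyed on failure exit codes will see a "failed" exit.
// (A V8 heap OOM abort is already non-zero, typically a SIGABRT.)
process.on('uncaughtException', (err) => {
  console.error('fatal:', err);
  process.exit(1);
});

process.on('unhandledRejection', (reason) => {
  console.error('fatal rejection:', reason);
  process.exit(1);
});
```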
uxuz • 12 days ago
Hey, does your service actually "crash", or is it just in a state where it is technically still running but no longer responsive?
aarongainz • 12 days ago
It's definitely crashed. We got an email from Railway indicating our service has crashed, and it's red in the dashboard with the "Crashed" badge and a "Restart" button.
uxuz • 12 days ago
Thank you for confirming this. Can this behavior be reproduced consistently? Also, may I ask how many retries you specified and how many crash emails you get each time this happens?
aarongainz • 12 days ago
We receive one email, and yes, it happens consistently - at least a few times a week. If we choke down the RAM limit, we can probably get it to happen a few times a day.
Here's our restart policy. Not sure where to find the number of retries, but happy to share if you can guide me!
Attachments
uxuz • 12 days ago
Since it is configured to always restart, there isn't a field to specify the number of retries. Is the crash related to your service exceeding the maximum amount of memory it can scale up to (OOM)?
aarongainz • 11 days ago
Yes, almost definitely. It's usually a heap exception in Node.js that kills it.
We clearly have a small memory leak, and we need to resolve that.
But in the meantime, I'd have hoped that when we do crash, it'd come back to life.
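While hunting the leak, a lightweight heap logger can help localize it before V8 aborts. This is a generic Node sketch (the interval and threshold are made-up values, not Railway-specific); note that Node's own heap ceiling can also be tuned with `--max-old-space-size` via `NODE_OPTIONS`:

```javascript
// Periodically log heap usage so a leak shows up as a steady climb in the
// deploy logs, well before "JavaScript heap out of memory".
const THRESHOLD_MB = 400; // illustrative: pick a value below the container's RAM limit

const toMB = (bytes) => bytes / 1024 / 1024;

function logHeap() {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  console.log(
    `heapUsed=${toMB(heapUsed).toFixed(1)}MB ` +
    `heapTotal=${toMB(heapTotal).toFixed(1)}MB rss=${toMB(rss).toFixed(1)}MB`
  );
  if (toMB(heapUsed) > THRESHOLD_MB) {
    console.warn('heap above threshold - investigate or restart gracefully');
  }
}

setInterval(logHeap, 60_000).unref(); // unref: the timer won't keep the process alive
logHeap(); // log once at startup
```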
uxuz • 11 days ago
Hey, I tried to replicate the OOM behavior with a service and ran into a crash loop that eventually ended after some time (the retry policy was set to Always). I have also talked with the team, and the crash-loop behavior will be improved with the next runtime. For now, and in general, the best solution is to prevent your application from crashing indefinitely by fixing the memory leak.
aarongainz • 10 days ago
Hey,
What does that mean? What is the next runtime? Is there a rough ETA on that?
Does that mean that, in the meantime, auto-restarting just doesn't function for OOM deaths?
Is this considered a bug internally, or is it by design?
Obviously, yes, resolving memory leaks is top priority. Working on that for sure.
But as every developer knows, weeding out memory leaks isn't always straightforward.
If a service errors out (rogue code, memory leak, whatever; anything that saps resources dry), we can't rely on the auto-restart?
To be honest, this feels like a larger issue than it's being made out to be.
Currently, this leaves someone babysitting the service at all hours of the day.
uxuz • 10 days ago
Hey, I understand your frustration. I was unable to reproduce the behavior of the service not restarting at all after it crashes due to OOM (reaching the maximum amount of RAM it can scale up to). The closest thing I managed to get is an infinite restart loop that seems to end randomly after a while.
I'll escalate this thread to the team for further assistance to find out why your service isn't restarting.
10 days ago
This thread has been escalated to the Railway team.
Status changed to Awaiting Railway Response uxuz • 10 days ago
uxuz • 10 days ago
Hey, I have been told by the team that your service is indeed restarting - it's the same situation I reproduced: an infinite restart loop that got terminated after it had restarted many times.
aarongainz • 7 days ago
Interesting - but then the "Restart" button on a [Crashed] container works immediately and every time.
Is there a difference between the auto-restart and the "Restart" button that we should be accounting for, maybe?
uxuz • 6 days ago
Hey, there is no difference between a manual and an auto restart.