Auto Restart Deployment after Crash

aarongainz
HOBBY

14 days ago

How do we set up Railway to restart the instance if it crashes?

Right now, if it crashes, it just stays offline - I don't see a way to get this to attempt to auto restart.

We have to basically hawk-eye this deployment to watch for crashes and click Restart when it happens.

There has to be a way to do this, I just can't figure it out.

$10 Bounty

15 Replies

uxuz

14 days ago

Hey, can you change the restart policy and number of retries in your service's settings?

Attachments

aarongainz
HOBBY

13 days ago

I did - it's set to Always. But it still doesn't restart on crash.

I wonder if it's a zero exit code. It's crashing from a heap exception (which I think is non-zero), but I can't think of any other reason it won't auto-reboot.
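On the exit-code question: a clean exit is status 0, and anything else counts as a failure. As a rough sketch, the statuses most often seen when a Node.js process runs out of memory can be decoded as below. The `describeExit` helper and its types are hypothetical (not part of Node.js or Railway), and the code assumes the POSIX shell convention of reporting death by signal as 128 plus the signal number.

```typescript
// Hypothetical helper: interprets a process exit status the way a POSIX
// shell reports it. Nothing here is Railway-specific.
type ExitKind = "clean" | "error" | "signal";

interface ExitInfo {
  kind: ExitKind;
  signal?: number; // signal number, set only when kind === "signal"
  note: string;
}

function describeExit(code: number): ExitInfo {
  if (code === 0) {
    return { kind: "clean", note: "exited normally" };
  }
  // Shells report "killed by signal N" as 128 + N.
  if (code > 128 && code <= 128 + 64) {
    const signal = code - 128;
    const names: Record<number, string> = {
      6: "SIGABRT (V8 usually aborts this way on 'JavaScript heap out of memory')",
      9: "SIGKILL (typical of a container-level OOM kill)",
      15: "SIGTERM",
    };
    return { kind: "signal", signal, note: names[signal] ?? `signal ${signal}` };
  }
  return { kind: "error", note: `application error code ${code}` };
}
```

For example, `describeExit(134)` reports signal 6 and `describeExit(137)` reports signal 9, so neither OOM path would normally produce a zero exit code.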


uxuz

12 days ago

Hey, does your service actually "crash", or is it in a state where it is technically still running but no longer responsive?

aarongainz
HOBBY

12 days ago

It's definitely crashed. We get an email from Railway indicating our service has crashed, and it's red in the dashboard with the "Crashed" badge and a "Restart" button.


uxuz

12 days ago

Thank you for confirming this. Can this behavior be reproduced consistently? Also, may I ask how many retries you specified and how many crash emails you get each time this happens?

aarongainz
HOBBY

12 days ago

We receive one email, and yes it happens consistently - at least a few times a week. If we choke out the RAM limit, we can probably get it to happen a few times a day.

Here's our restart policy. Not sure where to find the amount of retries, but happy to share if you can guide me!

Attachments


uxuz

12 days ago

Since it is configured to always restart, there isn't a field to specify the number of retries. Is the crash caused by your service exceeding the maximum amount of memory it can scale up to (OOM)?

aarongainz
HOBBY

11 days ago

Yes, almost definitely. It’s usually a heap exception in Node.js that kills it.

We clearly have a small memory leak and we need to resolve that.

But in the meantime, I’d have hoped when we do crash, it’d come back to life.
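One stopgap while a leak is being tracked down is to have the process exit on its own terms before it hits the hard memory limit. The sketch below is only an illustration: `startMemoryWatchdog`, `isOverBudget`, and the option names are hypothetical, the thresholds are arbitrary, and it assumes the platform restarts a process that exits with a non-zero code. It does not fix the leak itself.

```typescript
// Hypothetical watchdog: all names and defaults are illustrative assumptions.
interface WatchdogOptions {
  readUsedBytes: () => number; // e.g. () => process.memoryUsage().rss in Node.js
  limitBytes: number;          // memory budget for the service
  threshold: number;           // fraction of the budget that triggers the callback
  onExceeded: () => void;      // e.g. () => process.exit(1) after draining requests
}

// Pure check, kept separate so it can be tested without timers.
function isOverBudget(usedBytes: number, limitBytes: number, threshold: number): boolean {
  return usedBytes >= limitBytes * threshold;
}

// Polls memory usage and fires onExceeded once; returns a function that stops polling.
function startMemoryWatchdog(opts: WatchdogOptions, intervalMs = 30_000): () => void {
  const timer = setInterval(() => {
    if (isOverBudget(opts.readUsedBytes(), opts.limitBytes, opts.threshold)) {
      clearInterval(timer);
      opts.onExceeded();
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In a Node.js service this could be wired up with `readUsedBytes: () => process.memoryUsage().rss` and `onExceeded: () => process.exit(1)`, both of which are standard Node.js calls. Exiting deliberately at, say, 90% of the budget trades a hard OOM kill for a controlled restart, at the cost of dropping whatever the process was doing at that moment.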


uxuz

11 days ago

Hey, I tried to replicate the OOM behavior with a service and ran into a crash loop that eventually ended after some time (restart policy was set to Always). I have also talked with the team, and the crash loop behavior will be improved with the next runtime. For now, and in general, the best solution is to stop your application from crashing repeatedly by fixing the memory leak.

aarongainz
HOBBY

10 days ago

Hey,

What does that mean? What is the next runtime? Is there a rough ETA on that?

Does that mean in the meantime auto-restarting just doesn't function for OOM deaths?

Is this considered a bug internally or by-design?

Obviously yes, resolving memory leaks is top priority. Working on that for sure.

But as every developer knows, weeding out memory leaks isn't always straightforward.

If a service errors out (rogue code, memory leak, whatever; anything that saps resources dry), we can't rely on the auto-restart?

To be honest, this feels like a larger issue than it's being made out to be.

Currently, this leaves someone babysitting the service at all hours of the day.


uxuz

10 days ago

Hey, I understand your frustration. I was unable to reproduce the behavior of the service not restarting at all after it crashes due to OOM (reaching the maximum amount of RAM it can scale up to). The closest I got was an infinite restart loop that seems to end randomly after a while.

I'll escalate this thread to the team for further assistance to find out why your service isn't restarting.


10 days ago

This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response by uxuz, 10 days ago


uxuz

10 days ago

Hey, I have been told by the team that your service is indeed restarting. It is in the same situation I reproduced: an infinite restart loop that was terminated after the service had been restarted many times.

aarongainz
HOBBY

7 days ago

Interesting - but then the "Restart" button on a [Crashed] container works immediately and every time.

Is there a difference between the auto-restart vs the "Restart" button that we should be accounting for maybe?


6 days ago

Hey, there is no difference between a manual and auto restart.


Auto Restart Deployment after Crash - Railway Help Station