Deployment shows as "crashed" but actually got restarted / continues running
efstajas
PROOP

2 years ago

Hey there,

a bit confused by this in the Railway dashboard. We got an email alert earlier this morning that our main frontend server deployment had "crashed", with a prompt to restart it. In the railway dashboard, it also says "crashed 3 hours ago", with a "restart" prompt. However, our site was accessible all throughout, and when I view the "crashed" deployment's logs in the dashboard, there are still new ones coming in — ergo it's clearly running. Which is what we would expect, given the deployment is set to "restart always".

Would you mind clarifying:
- If the deployment had already been automatically restarted (as per our restart policy) and is currently running, why are we being prompted to "restart" it?
- Is there a way to see exactly what was logged as the service crashed? Given the deployment produced new logs since the crash, it's unclear to us now what exception caused the crash. We run Sentry to monitor exceptions, but strangely we didn't track any exception that we'd expect to have crashed the server around the time Railway says the deployment crashed.
- Would clicking "restart" on the "crashed but actually still running" deployment now cause (brief) downtime?
- In this "crashed but actually still running" state, would another hard crash result in it being restarted again, or would it stay down?

Thank you,

Jason

Solved

6 Replies

2 years ago

Hey! Did you manually restart the deployment through the dashboard?

We found a recent bug where a manual restart through the dashboard or API would result in the deployment getting marked as "Crashed" when it's actually alive and running.


Status changed to Awaiting User Response Railway over 1 year ago


efstajas
PROOP

2 years ago

Hey! As far as I know, no... We got the alert with the "restart" button this morning and saw the server show as "crashed" in the dashboard, but decided not to click "restart", given that the server had apparently already been restarted automatically and everything seemed healthy.

Unless you're asking if we ever manually restarted this particular service. The answer to that is likely yes.


Status changed to Awaiting Railway Response Railway over 1 year ago


efstajas
PROOP

2 years ago

Hey again! It seems like our deployment just crashed again, and this time we neither received an email alert about it, nor was it auto-restarted. Instead, we had to manually restart it. (Again, to be clear: the service is configured to "always restart".)

We're obviously working on urgently fixing this exception that crashes it, but it seems like something is definitely wrong with Railway's restart handling.

To recap:

- This morning, server crashes. We receive an email alert from RW, and see that the status in the dashboard is shown as "Crashed". However, the server had actually been automatically restarted and was healthy.
- We didn't do anything and didn't manually restart it. Things continued to be fine.
- The same error re-occurred and crashed the server again. This time, we neither received an alert, nor did Railway attempt to auto-restart it. We manually clicked Restart, and the server is back and shown as "Active".


2 years ago

Apologies for the late response.

After it has crashed, do you still see the deployment as "Active" (green) or "Crashed" (red)? A restart policy should attempt to restart it until it becomes "Active", but if it has exhausted your restart policy's max retries, your deployment will become "Crashed".
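
To make the retry behaviour described above concrete, here is a minimal sketch of how a restart policy with and without a retry limit would decide the final status. It is a simplified model, not Railway's implementation: the `RestartPolicy` type, `shouldRestart`, and `statusAfterCrashes` are hypothetical names, and the sketch assumes every permitted restart succeeds in bringing the process back up.

```typescript
// Hypothetical model of the restart policies discussed in this thread.
// None of these names come from Railway's API.
type RestartPolicy =
  | { kind: "never" }
  | { kind: "always" }
  | { kind: "on_failure"; maxRetries: number };

type DeploymentStatus = "Active" | "Crashed";

// Decide whether another restart attempt is allowed after a crash.
// `restartsSoFar` counts the restart attempts already made.
function shouldRestart(policy: RestartPolicy, restartsSoFar: number): boolean {
  if (policy.kind === "never") {
    return false;
  }
  if (policy.kind === "always") {
    return true; // no retry limit applies
  }
  // "on_failure": restart only while the retry budget is not exhausted
  return restartsSoFar < policy.maxRetries;
}

// Expected status after `crashes` consecutive crashes, assuming every
// permitted restart brings the process back up.
function statusAfterCrashes(
  policy: RestartPolicy,
  crashes: number
): DeploymentStatus {
  for (let i = 0; i < crashes; i++) {
    if (!shouldRestart(policy, i)) {
      return "Crashed";
    }
  }
  return "Active";
}
```

Under this model, an "always" policy never ends in "Crashed" because of a retry limit, whereas an "on failure" policy with `maxRetries: 3` ends in "Crashed" on the fourth consecutive crash.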

Are you able to share which service this is happening in, ideally a link to a deployment that crashed? I'm very curious why you're seeing "crashed but actually still running" deployments -- this definitely sounds like something on our end


Status changed to Awaiting User Response Railway over 1 year ago


efstajas
PROOP

2 years ago

After it crashed, it was showing as "Crashed" (red). As far as I'm aware, when setting the Restart Policy to "Always", there's no max retries, right? The max retries setting only appears when set to "On Failure".

Link to the deployment: https://railway.app/project/56cafcfa-394c-46c9-a811-dc3207bad3dc/service/b1609a08-36c8-4392-8627-3eae500e6d8c?id=1b5e18ad-79ea-456c-a74f-64d201af0099

Right now, everything is green and working fine, but this is the deployment that crashed, got restarted, and afterwards showed as "Crashed" despite being up and healthy.


Status changed to Awaiting Railway Response Railway over 1 year ago


2 years ago

As far as I'm aware, when setting the Restart Policy to "Always", there's no max retries, right?

Yes.

I think I understand what's going on. There are two issues here:

  1. Your crashed deploy is still running. This is definitely a bug on our end that we're going to fix.

  2. Whenever your deploy crashes, regardless of whether we manage to bring it back up online, you'll receive the crash notification. This means your deploy could potentially be in a good state -> crash (get sent email) -> we bring it back up (so it shows as active/green).

2) seems like it's working as intended but it can be confusing/spammy, so we're going to look at ways to improve email notifications handling.

Once we fix 1), you should no longer see "Crashed" deploys running anymore.
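
The two points above can be sketched as a small state model. This is only an illustration of the behaviour as described in this thread, not Railway's actual code: `DeployState`, `handleCrash`, and `isInconsistent` are hypothetical names. It shows that a crash email is sent on every crash regardless of whether the restart succeeds, and that "shown as Crashed while still running" is the inconsistent state from point 1.

```typescript
// Hypothetical model; none of these names come from Railway.
interface DeployState {
  running: boolean; // is the process actually up?
  shownStatus: "Active" | "Crashed"; // what the dashboard displays
  emailsSent: number; // crash notifications sent so far
}

// Intended behaviour on a crash: always notify (point 2), then show
// "Active" if the restart succeeded and "Crashed" otherwise.
function handleCrash(state: DeployState, restartSucceeds: boolean): DeployState {
  return {
    running: restartSucceeds,
    shownStatus: restartSucceeds ? "Active" : "Crashed",
    emailsSent: state.emailsSent + 1, // sent on every crash
  };
}

// Detects the state from point 1: displayed as crashed while running.
function isInconsistent(state: DeployState): boolean {
  return state.running && state.shownStatus === "Crashed";
}

const healthy: DeployState = { running: true, shownStatus: "Active", emailsSent: 0 };
const recovered = handleCrash(healthy, true);
// `recovered` is running and shown as "Active", yet one email was sent.
```

In this model, the state reported at the start of the thread would be `{ running: true, shownStatus: "Crashed", emailsSent: 1 }`, which `isInconsistent` flags and which `handleCrash` never produces.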


Status changed to Awaiting User Response Railway over 1 year ago


Railway
BOT

7 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 7 months ago

