Maintenance downtime with no warning
jon-salmon
HOBBYOP

2 months ago

Today I received an email with the following:

Your Railway service will be affected by an infrastructure maintenance event.

  • Resource: ...
  • Duration: 10 minutes from time of this notification
  • Impact: Service will be offline

There is no action required on your part. Your service will automatically resume once maintenance completes. We apologize for the inconvenience.

Was this scheduled maintenance or a Railway incident? This historical ticket suggests it was likely an incident (https://station.railway.com/questions/your-railway-service-will-be-affected-by-8c1be72e), but if that's the case, I've seen no follow-up or any admission of the issue. If it was scheduled maintenance, then where was the warning? You just took out my service with no notice!

Worse than that, whatever was done broke my service, so it didn't automatically resume. Restarting the service didn't work; I had to fully redeploy to get it working again. My service was down for several hours because I had no warning this could happen, and I ended up having to debug it on my phone while traveling.

What is Railway.com's stated policy on warnings for downtime? Without one, there is no way you can claim your service is production ready. You have ignored my prior questions about whether you have a deprecation policy (so I can only assume you don't): https://station.railway.com/questions/smtp-connection-failures-e3c635ac.

You guys have got some great features, but are clearly lacking in what really matters: reliability above all else.

Solved

5 Replies

Status changed to Awaiting Railway Response Railway about 2 months ago


2 months ago

The email you received is from a Railway-initiated deployment, which migrates your service to a different host. These are documented at https://docs.railway.com/deployments/reference#railway-initiated-deployments. Some of these migrations are reactive - triggered by a host fault - meaning we don't have advance warning ourselves, so we can't provide it to you either. That's why the notification arrived at the time of migration rather than in advance.

Your service not automatically resuming after the migration is not expected behavior. We don't have logs from the prior deployment that failed to come back, so we can't pinpoint what went wrong there.

For services where uptime is critical, running multiple replicas across hosts mitigates the impact of single-host events, as documented at https://docs.railway.com/reference/scaling#horizontal-scaling-with-replicas.


Status changed to Awaiting User Response Railway about 2 months ago


jon-salmon
HOBBYOP

a month ago

Thanks for the clarification, but the reason for the redeployment should be included in the email that gets sent, as it's currently very opaque. Additionally, your pricing page (https://railway.com/pricing) does not mention that there are uptime differences between the plans; instead, that is buried deep in your docs. It feels like you're trying to hide this.


Status changed to Awaiting Railway Response Railway about 1 month ago


a month ago

Your feedback on both points is noted and appreciated. You can submit these as feature requests at https://station.railway.com/roadmap so other users can upvote them and our product team can track demand. Regarding the pricing page, there are no uptime SLA differences between Hobby and Pro.


Status changed to Awaiting User Response Railway about 1 month ago


jon-salmon
HOBBYOP

a month ago

What does this line in your docs mean, then?

At your plan tier, such as Trial or Hobby, you may be pre-emptively moved to a different host to help us optimize workload distribution.


Status changed to Awaiting Railway Response Railway about 1 month ago


a month ago

That is a different scenario than the host maintenance you experienced.


Status changed to Awaiting User Response Railway about 1 month ago


Status changed to Solved jon-salmon about 1 month ago

