2 months ago
After migrating from Kubernetes to Railway, I've noticed that a few outages have caused some of our services to crash, and they are not being properly restarted afterwards.
On Kubernetes, a "crashed" app would retry indefinitely. The cause of this crash was a connection error to RabbitMQ, which was resolved in 5 minutes, but after 10 app restarts Railway stops spinning up the service. I think this is not good: a crash caused by an oopsie in your infrastructure led to services being taken down.
I was shopping the other day while one of our customers was trading OTC, and the services weren't working. Luckily I managed to fix it by using my phone to restart the service, and the trade went through, but why aren't services guaranteed to come back up after they crash? Running this system on Kubernetes for years, I never had this issue.
4 Replies
2 months ago
Hi there,
Railway has a configurable Restart Policy that controls what happens when a deployed service stops or crashes. You can find this in your service's Settings tab under the deployment settings, where you'll be able to adjust the behavior to fit your needs.
For production workloads where you need robust handling of transient failures, pair application-level restarts with Healthchecks in Railway. If a health check fails, Railway will continuously attempt to restart unhealthy services rather than giving up.
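As a rough illustration of handling a transient failure inside the application, so that a short RabbitMQ outage does not use up the restart limit, here is a minimal sketch of retrying a connection with exponential backoff. The function name `connect_with_backoff` and all of its parameters are hypothetical and not part of any Railway or RabbitMQ API; `connect` stands in for whatever call your client library uses to open a connection, and `retry_on` should be set to the exception types that library actually raises.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, waiting longer after each failure.

    Hypothetical helper: with max_attempts=None it retries forever, so the
    process stays alive during an outage instead of crashing.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except retry_on:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up and let the caller (or the platform) handle it
            # Delay doubles each time: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)
```

With the defaults, a broker that is down for five minutes would be retried roughly every 30 seconds once the delay reaches the cap, and the service would reconnect on its own without a platform-level restart.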
If you are still having issues or need further assistance, could you please link the Railway service you are mentioning?
Best, The Railway Team
Status changed to Awaiting User Response Railway • about 2 months ago
Status changed to Awaiting Railway Response Railway • about 2 months ago
2 months ago
Is there a faster way to set this on a bunch of services than going in one by one?
2 months ago
I would recommend setting up healthchecks for each service; this would allow them to be restarted indefinitely.
Here are some docs on healthchecks: https://docs.railway.com/reference/healthchecks
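For a service that does not already expose an HTTP route, a healthcheck endpoint could look something like the sketch below, using only the Python standard library. The `/health` path, the `HealthHandler` and `make_server` names, and the fallback port 8080 are assumptions made for illustration; the path has to match whatever you configure as the healthcheck path in the service settings, and the sketch assumes the listening port is supplied through the `PORT` environment variable.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 and a short body; everything else is 404."""

    def do_GET(self):
        if self.path == "/health":  # hypothetical path, must match your settings
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging so healthcheck polls do not flood the logs.
        pass


def make_server(port: Optional[int] = None) -> HTTPServer:
    """Build the server on the given port, or on $PORT (assumed default 8080)."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` from a background thread in the service would keep the endpoint available alongside the main workload. A more useful check would also verify that dependencies such as the message broker are reachable before returning 200.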
Best,
The Railway Team
Status changed to Awaiting User Response Railway • about 2 months ago
a month ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • about 1 month ago