a month ago
Today I noticed that my service was down, and I did not receive any notification about it. In the logs I saw this error, but there is no way to check the full log. I was not able to SSH into the service either.
Is there a way to configure alerts and be notified when something is wrong?
I couldn't find one.
What helped was manually restarting the deployment.
But I still don't know what caused this behaviour.
What are your recommendations?
5 Replies
a month ago
Hey there!
For some info on this, we had an incident that caused a percentage of our deploys to go down.
That being said, we're also looking into how to better notify users of impact when it happens. I don't have the best solution for you right now, but in the future I'd hope we notify you as soon as we detect it happening.
In the meantime, you can subscribe to our status page and get email alerts when we declare an incident.
If you're interested, the full post-mortem on that is here:
https://blog.railway.com/p/incident-report-february-11-2026
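Until platform-level notifications exist, one workaround is an external check that you run yourself from outside the platform. Below is a minimal sketch, not an official Railway feature: it assumes your service exposes some HTTP endpoint that returns a 2xx status when healthy. The names `is_healthy`, `check_once`, and the `notify` callback are hypothetical, and you would supply your own way of sending the alert (email, chat webhook, etc.).

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, 4xx/5xx
        # responses (HTTPError is a URLError) and malformed URLs.
        return False


def check_once(url, failures, threshold, notify, probe=is_healthy):
    """Run one probe and return the updated consecutive-failure count.

    `notify` (hypothetical callback) is called exactly once, at the moment
    the count reaches `threshold`, so a long outage does not spam alerts.
    """
    if probe(url):
        return 0  # healthy again: reset the counter
    failures += 1
    if failures == threshold:
        notify(f"{url} failed {failures} consecutive health checks")
    return failures
```

To use it, call `check_once` in a loop with a pause between iterations (for example `time.sleep(60)`), passing the count it returned back in each time. Requiring several consecutive failures before alerting avoids paging on a single slow response.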
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Thanks for the reply. Great to see such a detailed postmortem. Any idea why automatic recovery did not work in my case?
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
I'm not entirely sure why it didn't work in your case; we should have restarted every deployment that we had a log of. There's a chance that a small number were dropped in that process, so I apologize.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Out of maybe 10 services that were shut down, less than 50% of them were automatically restarted. And I have read in other threads as well that the restarts didn't work. So I am not sure how true that actually is.
Status changed to Awaiting Railway Response Railway • about 1 month ago
hexatare
Out of maybe 10 services that were shut down, less than 50% of them were automatically restarted. And I have read in other threads as well that the restarts didn't work. So I am not sure how true that actually is.
a month ago
Just because we attempt a restart doesn't mean that it will be foolproof. For example, an application with no reconnect logic will not reconnect to a DB that was SIGTERMed. That's why you see the whole support team scouring for more affected cases.
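To illustrate what reconnect logic means here, the following is a minimal sketch, not a description of any particular driver's API. `connect` and `operation` are hypothetical callables that you would supply (for instance, a function that opens a DB connection and a function that runs a query on it). The idea is to throw away a dead connection and open a fresh one, with exponential backoff between attempts.

```python
import time


def call_with_reconnect(connect, operation, max_attempts=5, base_delay=0.5,
                        retry_on=(ConnectionError,), sleep=time.sleep):
    """Run operation(conn), reconnecting and retrying if the connection dies.

    connect   -- hypothetical callable returning a new connection
    operation -- hypothetical callable taking that connection
    retry_on  -- exception types treated as "connection lost"
    """
    conn = None
    for attempt in range(max_attempts):
        try:
            if conn is None:
                conn = connect()  # (re)open the connection when needed
            return operation(conn)
        except retry_on:
            conn = None  # drop the dead connection; reconnect next time
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

Real database drivers usually raise their own exception types when the server goes away (for example `psycopg2.OperationalError`), so `retry_on` would need to be set to match the driver in use. Many drivers and connection pools also ship their own reconnect or health-check options, which are generally preferable to a hand-rolled loop.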
Status changed to Awaiting User Response Railway • about 1 month ago
23 days ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • 23 days ago