4 months ago
Hi!
I deployed a containerized FastAPI service which ran uninterrupted for ~3 weeks, and has been running with minor updates for more than a year now.
Today, the service shut down (gracefully) and sat in the "COMPLETED" state. It never came back up, despite having Restart Policy: On Failure.
I can switch the restart policy to Always, but would that even fix it? Why did my container get a random SIGTERM/SIGINT after 3 weeks? Shouldn't Railway's orchestration restart it if it needs to evict it from the host for some reason (updates, outage, etc.)?
6 Replies
4 months ago
I'd guess that this is related to an incident Railway had earlier today.
Incident Report: February 11, 2026
Did the service come back up after you restarted it?
4 months ago
Yes, it does seem to line up with the outage. Looking at the timeline, though, it looks like it should have been restarted automatically?
4 months ago
Not all of my services came back up after the outage by themselves.
I had to manually restart a few.
4 months ago
@mykal, is your restart policy set to On Failure or Always for the services that did not start automatically?
4 months ago
I just realized this depends on the restart policy.
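To illustrate why the policy matters, here is a rough sketch of how a generic supervisor might decide whether to restart a process. This is not Railway's actual code: `RestartPolicy` and `should_restart` are hypothetical names, and the logic assumes the common Docker-style meaning where "on failure" restarts only on a non-zero exit code.

```python
from enum import Enum


class RestartPolicy(Enum):
    # Hypothetical policy names, modeled on common container restart policies.
    NEVER = "never"
    ON_FAILURE = "on_failure"
    ALWAYS = "always"


def should_restart(policy: RestartPolicy, exit_code: int) -> bool:
    """Return True if a supervisor following this policy would restart the process.

    Assumption: ON_FAILURE treats exit code 0 as success and does not restart.
    """
    if policy is RestartPolicy.ALWAYS:
        return True
    if policy is RestartPolicy.ON_FAILURE:
        return exit_code != 0
    return False


# A server that handles SIGTERM gracefully typically exits with code 0.
print(should_restart(RestartPolicy.ON_FAILURE, 0))  # graceful shutdown
print(should_restart(RestartPolicy.ON_FAILURE, 1))  # crash
print(should_restart(RestartPolicy.ALWAYS, 0))      # graceful shutdown
```

Under that assumption, a service that shuts down cleanly after a SIGTERM looks "completed" rather than "failed", so an on-failure policy would leave it stopped, while an always policy would bring it back.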
