Why did one service come back and one need manual redeploy?
skortchmark9
HOBBYOP

21 days ago

Hi there, I have two different Railway projects, each hosting a Next.js web server. After last night's issues, one recovered automatically and the other needed a manual redeploy. I don't see any logs or anything else that would help me understand why. Any ideas?

Solved

3 Replies

Railway
BOT

21 days ago

Both of your projects were affected by the May 19 GCP service disruption, which was resolved around 07:57 UTC. Looking at your account, the "apt-heat" service in your "cheap_heat" project had no new deployment and recovered on its own, while the "web" service in your "cartograph" project required a manual redeploy at around 12:14 UTC. During this outage, the GCP image registry was impacted along with the compute hosts. Whether a service could auto-recover depended on the state of its container and host after infrastructure was restored: some containers were able to restart in place, while others lost their image on the affected host and needed a fresh build and deploy. This was not something you could have controlled or predicted; it was a consequence of how the disruption affected the specific hosts your services were running on.


Status changed to Awaiting User Response Railway 21 days ago


skortchmark9
HOBBYOP

21 days ago

I should mention that the service that did not come back had been deployed less recently.


Status changed to Awaiting Railway Response Railway 21 days ago


Railway
BOT

21 days ago

That observation is consistent with what happened during the May 19 GCP disruption. The image registry was among the affected components, so whether a service could auto-recover depended on the availability of its container image on the underlying host. A less recently deployed service is more likely to have had its image evicted from local host caches, meaning it would need a registry pull to restart, and that pull could not succeed while the registry was impaired. This was not something within your control, just a consequence of how the disruption affected each service's specific host and image state.
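The reasoning above can be sketched as a toy model. This is only an illustration of the explanation given in this thread, not Railway's actual scheduling logic; the `ServiceState` class, the `can_auto_recover` function, and the service names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    name: str
    # Assumption: True if the container image is still in the host's local cache.
    image_cached_on_host: bool


def can_auto_recover(service: ServiceState, registry_available: bool) -> bool:
    """Return True if the service could restart without a manual redeploy."""
    if service.image_cached_on_host:
        # The image is already on the host, so the container can restart in place.
        return True
    # Otherwise a registry pull is required, which only works if the registry is up.
    return registry_available


# Hypothetical services: one whose image is still cached, one whose image was evicted.
services = [
    ServiceState("image-still-cached", image_cached_on_host=True),
    ServiceState("image-evicted", image_cached_on_host=False),
]

# During the disruption the registry was impaired.
outcomes = {s.name: can_auto_recover(s, registry_available=False) for s in services}
print(outcomes)
```

In this model, only the service whose image was evicted from the host cache fails to recover while the registry is unavailable, which matches the pattern described above.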


Status changed to Awaiting User Response Railway 21 days ago


Status changed to Solved Railway 21 days ago

