Why did one service come back and one need a manual redeploy?

skortchmark9

HOBBY OP

a month ago

Hi there, I have two different Railway projects with Next.js web servers. After last night’s issues, one recovered automatically and the other needed a manual redeploy. I don’t see any logs or anything else that would help me understand why. Any ideas?

Solved

3 Replies

Railway

BOT

a month ago

Both of your projects were affected by the May 19 GCP service disruption, which was resolved around 07:57 UTC. Looking at your account, the "apt-heat" service in your "cheap_heat" project had no new deployment and recovered on its own, while the "web" service in your "cartograph" project required a manual redeploy at around 12:14 UTC. During this outage, the GCP image registry was impacted along with the compute hosts. Whether a service could auto-recover depended on the state of its container and host after infrastructure was restored: some containers were able to restart in place, while others lost their image on the affected host and needed a fresh build and deploy. This was not something you could have controlled or predicted; it was a consequence of how the disruption affected the specific hosts your services were running on.

Status changed to Awaiting User Response Railway • about 1 month ago

skortchmark9

HOBBY OP

a month ago

I should mention that the service that did not come back had been deployed less recently.

Status changed to Awaiting Railway Response Railway • about 1 month ago

Railway

BOT

a month ago

That observation is consistent with what happened during the May 19 GCP disruption. The image registry was among the affected components, so whether a service could auto-recover depended on the availability of its container image on the underlying host. A less recently deployed service is more likely to have had its image evicted from local host caches, meaning it would need a registry pull to restart, and that pull could not succeed while the registry was impaired. This was not something within your control, just a consequence of how the disruption affected each service's specific host and image state.
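The reasoning above can be sketched as a small decision function. This is only an illustrative model of the explanation in this thread, not Railway's actual scheduler logic: the `Service` class, the `image_cached_on_host` flag, and `recovery_outcome` are hypothetical names, and the cache states assigned to the two services in the usage lines are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    RESTARTED_IN_PLACE = "restarted in place"
    RESTARTED_AFTER_PULL = "restarted after registry pull"
    NEEDS_MANUAL_REDEPLOY = "needs manual redeploy"


@dataclass
class Service:
    name: str
    # Hypothetical flag: True if the container image is still present
    # in the local cache of the host the service runs on.
    image_cached_on_host: bool


def recovery_outcome(service: Service, registry_available: bool) -> Outcome:
    """Model how a service might come back after the host is restored."""
    if service.image_cached_on_host:
        # The image is already local, so no registry access is needed.
        return Outcome.RESTARTED_IN_PLACE
    if registry_available:
        # The image was evicted, but it can be pulled again.
        return Outcome.RESTARTED_AFTER_PULL
    # Image evicted and the registry is impaired: the pull fails.
    return Outcome.NEEDS_MANUAL_REDEPLOY


# Assumed states for illustration only: the more recently deployed service
# still has its image cached, the older one does not.
apt_heat = Service(name="apt-heat", image_cached_on_host=True)
web = Service(name="web", image_cached_on_host=False)

for svc in (apt_heat, web):
    print(svc.name, "->", recovery_outcome(svc, registry_available=False).value)
```

Under these assumptions, the two services end up with different outcomes from the same outage, which matches the pattern described in this thread.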

Status changed to Awaiting User Response Railway • about 1 month ago

Status changed to Solved Railway • about 1 month ago
