Postgres service stuck in a crash loop on "catatonit: failed to exec pid1", likely aftermath of the Image Registry incident
matveymaslov
HOBBY · OP

17 days ago

Project: energetic-fascination / environment: production / service: Postgres (deployment 683e781d)

Postgres has been crash-looping for more than 6 hours. Every restart attempt logs:

Mounting volume on: /var/lib/containers/railwayapp/bind-mounts/38c1d661-82c4-47d3-83e7-84434aa7a57d/vol_dhd63legn5e5nbqv

ERROR (catatonit:2): failed to exec pid1: No such file or directory

The volume mounts successfully. The error occurs at the catatonit init step, which suggests the Postgres container image is missing or corrupted. This started around the same time as today's "Image Registry (Metal)" incident (May 20, 11:21 UTC). Your status page says a fix was deployed and you're monitoring, but this specific service hasn't recovered.
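For reference, "No such file or directory" is the standard message for errno ENOENT. An exec call fails with it when the program it was asked to start (or, for a script, the interpreter named on its #! line) cannot be found in the container's filesystem, which fits the theory that the image contents are missing. The minimal Python sketch below, assuming a Linux host, reproduces the same errno by exec-ing a hypothetical path that does not exist. It only illustrates the error code; it is not how catatonit or Railway works internally.

```python
import errno
import os


def try_exec(path: str) -> int:
    """Try to replace this process with `path`; return the errno on failure.

    os.execv only returns (by raising OSError) when the exec fails, so
    calling it with a path that does not exist is safe: the current
    process is not replaced.
    """
    try:
        os.execv(path, [path])
    except OSError as exc:
        return exc.errno
    return 0  # never reached: a successful exec does not return


# Hypothetical entrypoint path that is absent, standing in for a binary
# missing from an incompletely pulled image.
code = try_exec("/nonexistent/entrypoint")
print(code == errno.ENOENT, os.strerror(code))
```

The same ENOENT appears whether the missing file is the entrypoint itself or the interpreter a script points to, so the message alone does not say which one is absent.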

Before the loop started, the database was running fine: the last clean shutdown was logged at 22:22:17 UTC on May 19, with "received fast shutdown request" followed by a clean checkpoint.

Please:

  1. Re-pull the Postgres container image so the service can start. The volume contents should be intact.
  2. Do NOT delete the volume or the service — my data is on it and there's no recent backup.

Happy to provide more logs if useful.

Solved

1 Reply

Status changed to Awaiting Railway Response Railway 17 days ago


sam-a
EMPLOYEE

17 days ago

Apologies for this canned message, but in an effort to help all our customers get back up and running, we are sending this bulk message. As you may know, we had a major interruption to our services yesterday. We've published a post-mortem if you'd like more information on the incident. It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact that it has had on you.

It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.

You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY

If you are still having other issues that might be related to the incident, you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Feel free to respond if your question has not been addressed.


Status changed to Awaiting User Response Railway 17 days ago


Railway
BOT

9 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 9 days ago

