[Pro] Postgres crash loop: catatonit failed to exec pid1
guilhermepresas
PRO OP

22 days ago

Hi Railway Team,

I just upgraded to Pro plan today.

My Postgres deployment on production environment (project: jubilant-commitment, service: Postgres, deployment ab34327d) is stuck in a crash loop with this error repeating every ~15 seconds:

ERROR (catatonit:2): failed to exec pid1: No such file or directory

The volume mounts successfully but the postgres-ssl:18 image fails to find the entrypoint. Crash loop started during the Railway/GCP incident (~22:21 UTC on May 19). My app service (portal-financeiro) is still serving cached pages, but all DB-bound endpoints are failing.

Postgres-staging using the same image runs fine — only production is affected.

I have a manual backup (May 20, 04:47 UTC) just in case, but I'd rather have you fix this without restoring from backup since my data on the volume should be intact. I have not redeployed Postgres or modified any settings — leaving it as-is for you to inspect.

This is impacting production for my business. Can you help urgently?

Thanks,

Guilherme

Solved

6 Replies

Railway
BOT

22 days ago

Your Postgres production service is confirmed in a crashed state, and this is related to the ongoing Railway service disruption that began on May 19. The Image Registry is among the affected components, which explains the "catatonit: failed to exec pid1" error - the container image cannot properly load. Recovery is actively underway, and your volume data should remain intact. Please follow updates on the status page, and once the incident is fully resolved, a redeploy of your Postgres service should bring it back up.


Status changed to Awaiting User Response Railway 22 days ago


22 days ago

Thanks for reaching out. We sincerely apologize for the service disruption.

We're seeing recovery in our API, builds, and deployments. If your service is experiencing an issue, please try redeploying it. We'll publish a public postmortem once we're fully recovered.

For all customers, we’ll publish a detailed postmortem outlining what happened and the steps we’re taking to prevent similar incidents in the future. For Enterprise customers, service credits are covered under our SLA and will be reviewed as part of our post-incident process.


tj-nomoai
PRO

22 days ago

+1 — same exact issue on postgres-ssl:17 in europe-west4.

  • Project: 3d40829a-ee7a-4a10-b831-c3c0e9b77e97
  • Service: 0bee1202-eb7e-4114-b2b0-95b16de0a0d1
  • Replica: 3c35a3c7-55b1-4518-893e-a63a849d0e41

Same catatonit "failed to exec pid1" auto-restart loop since ~22:20 UTC May 19.

Volume mount succeeds, data intact. Thanks for the Image Registry diagnosis; will follow this thread for updates.


Status changed to Awaiting Railway Response Railway 22 days ago


Railway
BOT

22 days ago

Your service in europe-west4 is affected by the same ongoing disruption, and your case is being tracked alongside the original report here. Since the latest incident update confirms deploys are now possible again, please try redeploying your Postgres service - this will pull a fresh image from the registry and should resolve the entrypoint failure. Your volume data remains intact through redeploys. GCP-hosted workloads may still see intermittent issues while full recovery completes, so if the first redeploy doesn't take, wait a few minutes and try again.
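For anyone scripting the "redeploy, wait a few minutes, try again" advice above, here is a minimal Python sketch of that retry loop. It does not call Railway at all: `retry_redeploy`, the `redeploy` callable, and the default of 3 attempts with a 180-second wait are all hypothetical names and values chosen for illustration. You would supply your own function that triggers a redeploy and returns True once the service is healthy.

```python
import time
from typing import Callable


def retry_redeploy(
    redeploy: Callable[[], bool],
    attempts: int = 3,
    wait_seconds: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call redeploy() until it returns True.

    Returns the 1-based attempt number that succeeded, or 0 if every
    attempt failed. `redeploy` is a hypothetical callable supplied by
    the caller; nothing here talks to Railway itself.
    """
    for attempt in range(1, attempts + 1):
        if redeploy():
            return attempt
        # Wait between attempts, but not after the final failure.
        if attempt < attempts:
            sleep(wait_seconds)
    return 0
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting, for example by passing `sleep=lambda seconds: None` in a dry run.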


Status changed to Awaiting User Response Railway 22 days ago


tj-nomoai
PRO

22 days ago

Update: I'm back online at 12:07 AM Mountain Time.


Status changed to Awaiting Railway Response Railway 21 days ago


21 days ago

Glad to hear you're back up. I'm going to mark this as solved, as I can see both of your workloads are running again. Feel free to reopen this thread if you have any questions.


Status changed to Awaiting User Response Railway 21 days ago


Status changed to Solved mykal 21 days ago

