[Pro] Postgres crash loop: catatonit failed to exec pid1

guilhermepresas

PRO, OP

a month ago

Hi Railway Team,

I just upgraded to Pro plan today.

My Postgres deployment on production environment (project: jubilant-commitment, service: Postgres, deployment ab34327d) is stuck in a crash loop with this error repeating every ~15 seconds:

ERROR (catatonit:2): failed to exec pid1: No such file or directory

The volume mounts successfully but the postgres-ssl:18 image fails to find the entrypoint. Crash loop started during the Railway/GCP incident (~22:21 UTC on May 19). My app service (portal-financeiro) is still serving cached pages, but all DB-bound endpoints are failing.

Postgres-staging using the same image runs fine — only production is affected.

I have a manual backup (May 20, 04:47 UTC) just in case, but I'd rather have you fix this without restoring from backup since my data on the volume should be intact. I have not redeployed Postgres or modified any settings — leaving it as-is for you to inspect.

This is impacting production for my business. Can you help urgently?

Thanks,

Guilherme

Solved

6 Replies

Railway

BOT

a month ago

Your Postgres production service is confirmed in a crashed state, and this is related to the ongoing Railway service disruption that began on May 19. The Image Registry is among the affected components, which explains the "catatonit: failed to exec pid1" error - the container image cannot properly load. Recovery is actively underway, and your volume data should remain intact. Please follow updates on the status page, and once the incident is fully resolved, a redeploy of your Postgres service should bring it back up.
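For readers wondering where the "No such file or directory" text comes from: it is the standard operating-system message for the ENOENT error, which an exec call returns when the target binary is not present at the expected path. A minimal sketch in Python, assuming a Linux or macOS host; the path and the `try_exec` helper are hypothetical and only illustrate the error class, not how catatonit or Railway work internally:

```python
import errno
import os
import subprocess


def try_exec(path):
    """Try to start `path`; return None if it launched, else the OS errno."""
    try:
        subprocess.run([path], check=False)
        return None
    except OSError as exc:
        return exc.errno


# Hypothetical path that does not exist, standing in for a missing entrypoint.
code = try_exec("/nonexistent/entrypoint-for-illustration")
print(code == errno.ENOENT)   # the launch failed with ENOENT
print(os.strerror(code))      # the OS message text for that errno
```

If the image contents are incomplete, the init process hits the same error class when it tries to exec the entrypoint, which is consistent with the volume mounting fine while the container still fails to start.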

Status changed to Awaiting User Response Railway • about 1 month ago

brody

EMPLOYEE

a month ago

Thanks for reaching out. We sincerely apologize for the service disruption.

We're seeing recovery in our API, builds, and deployments. If your service is experiencing an issue, please try redeploying it.

Once we're fully recovered, we'll publish a detailed public postmortem for all customers outlining what happened and the steps we're taking to prevent similar incidents in the future. For Enterprise customers, service credits are covered under our SLA and will be reviewed as part of our post-incident process.

tj-nomoai

PRO

a month ago

+1 — same exact issue on postgres-ssl:17 in europe-west4.

Project: 3d40829a-ee7a-4a10-b831-c3c0e9b77e97
Service: 0bee1202-eb7e-4114-b2b0-95b16de0a0d1
Replica: 3c35a3c7-55b1-4518-893e-a63a849d0e41

Same catatonit "failed to exec pid1" auto-restart loop since ~22:20 UTC May 19.

Volume mount succeeds, data intact. Thanks for the Image Registry diagnosis; will follow this thread for updates.

Status changed to Awaiting Railway Response Railway • about 1 month ago

Railway

BOT

a month ago

Your service in europe-west4 is affected by the same ongoing disruption, and your case is being tracked alongside the original report here. Since the latest incident update confirms deploys are now possible again, please try redeploying your Postgres service - this will pull a fresh image from the registry and should resolve the entrypoint failure. Your volume data remains intact through redeploys. GCP-hosted workloads may still see intermittent issues while full recovery completes, so if the first redeploy doesn't take, wait a few minutes and try again.
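The "wait a few minutes and try again" advice can be sketched as a small retry loop. This is only an illustration under stated assumptions: `redeploy` is a hypothetical callable that you supply (for example, a wrapper around whatever you use to trigger a redeploy) and that returns True once the service is healthy. It is not a Railway API, and the attempt count and wait time are arbitrary defaults.

```python
import time


def retry_redeploy(redeploy, attempts=3, wait_seconds=180, sleep=time.sleep):
    """Call redeploy() until it returns True or the attempts run out.

    Returns the 1-based number of the successful attempt, or None if
    every attempt failed. `redeploy` is a hypothetical user-supplied callable.
    """
    for attempt in range(1, attempts + 1):
        if redeploy():
            return attempt
        if attempt < attempts:
            sleep(wait_seconds)  # give the platform time to recover
    return None


# Demonstration with a fake redeploy that fails once, then succeeds.
# The sleep is replaced with a recorder so the demo runs instantly.
outcomes = iter([False, True])
waits = []
result = retry_redeploy(lambda: next(outcomes), sleep=waits.append)
print(result, waits)
```

Keeping the wait between attempts on the order of minutes avoids hammering a platform that is still recovering.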

Status changed to Awaiting User Response Railway • about 1 month ago

tj-nomoai

PRO

a month ago

Update: I'm back online at 12:07 AM Mountain Time.

Status changed to Awaiting Railway Response Railway • about 1 month ago

mykal

EMPLOYEE

a month ago

Glad to hear you're back up. Going to mark this as solved, as I can see both of your workloads are back up. Feel free to re-open this with any questions.

Status changed to Awaiting User Response Railway • about 1 month ago

Status changed to Solved mykal • about 1 month ago
