Postgres service crash-looping with catatonit pid1 error

jmaie31882

HOBBY (OP)

a month ago

Hi team,

My Postgres service has been crash-looping since approximately 08:56 GMT+1 on May 20, 2026. The symptoms appear identical to the wave of threads posted in the last 15-20 minutes following the pinned "GCP Suspension Outage: May 19th 2026" post (e.g. "Postgres database fails to start - catatonit pid1 error - data recovery needed", "Postgres service crashes immediately: failed to exec pid1", "Production Postgres WAL corrupted after May 19 GCP suspension"). I believe my service is caught up in the same incident.

Description of the issue:

Postgres deployment ran fine for 2 weeks, then began crash-looping today around 08:56 GMT+1
Each container start fails within seconds at the init layer
Downstream Next.js app (assetflow) cannot connect and returns Prisma P1001 "Can't reach database server at postgres.railway.internal:5432"
Free plan, so no Pro-tier automated backups exist on the volume
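
For anyone trying to separate "database unreachable" (P1001) from an authentication or schema problem, a plain TCP probe is enough. This is a minimal Python sketch, not anything Railway or Prisma provides: the function name `tcp_reachable` is made up for illustration, and the host and port are the ones from the Prisma error above. I'm assuming the `postgres.railway.internal` name only resolves from inside the project's private network, so the probe would have to run from a service in the same project.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Host and port copied from the Prisma P1001 message in this post.
    print(tcp_reachable("postgres.railway.internal", 5432))
```

A `False` result here only says nothing is accepting connections on that address, which is what you would expect while the Postgres container is dying at the init layer.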

Error messages (Postgres deploy logs, tight loop):

ERROR (catatonit:2): failed to exec pid1: No such file or directory

Mounting volume on: /var/lib/containers/railwayapp/bind-mounts/af5f7e1e-9f76-4d40-ac7c-12052770ba45/vol_mlb1fd7em4gay133

ERROR (catatonit:2): failed to exec pid1: No such file or directory

[pattern repeats every ~1 second indefinitely]

This indicates catatonit cannot exec the container entrypoint, so Postgres itself never starts and never reads the volume. Given the broader incident, my read is that the underlying issue is volume / WAL state inherited from yesterday's GCP suspension, not a fault in my project's configuration.
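
To compare your own deploy logs against this pattern, here is a minimal Python sketch that counts the catatonit exec failures and checks whether Postgres ever reported itself ready. The function name `summarize_crash_loop` is hypothetical. The regex is based on the error line quoted above, and the readiness string is the standard "database system is ready to accept connections" message Postgres logs on a successful start. It assumes you have exported the deploy logs as plain text lines.

```python
import re

# Matches the init-layer error quoted in this post, for any catatonit pid.
CATATONIT_EXEC_FAILURE = re.compile(r"ERROR \(catatonit:\d+\): failed to exec pid1")
# Message Postgres logs once it is accepting connections.
POSTGRES_READY = "database system is ready to accept connections"


def summarize_crash_loop(lines):
    """Count catatonit exec failures and report whether Postgres ever started."""
    lines = list(lines)  # allow any iterable, since we scan it twice
    exec_failures = sum(1 for line in lines if CATATONIT_EXEC_FAILURE.search(line))
    postgres_started = any(POSTGRES_READY in line for line in lines)
    return {"exec_failures": exec_failures, "postgres_started": postgres_started}
```

Many exec failures together with `postgres_started` being `False` would match what I'm seeing: the container fails before Postgres runs at all.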

What I've already tried:

Restart deployment — crashed within 7 seconds with the same catatonit error
Redeploy (fresh image pull) — same result, same error

I have NOT touched the image tag or the volume because I have no backups and don't want to risk further data loss.

Service details:

Project: Asset Flow
Project ID: 7c852cfe-af90-4023-87b0-b62c08b47a9c
Environment: production (92dc88b2-017c-4e9f-bd3d-9d7153546d90)
Postgres service ID: 5380526f-654a-408a-9f35-3773fd0aed3c
Latest crashed Postgres deployment: a677ca4a
Image: ghcr.io/railwayapp-templates/postgres-ssl:18
Volume: postgres-volume (contains my only copy of production data)
Public TCP proxy: switchyard.proxy.rlwy.net:45069 → 5432
Downstream affected service: assetflow (assetflow.scape.com), also Crashed

Ask:

Could the team confirm this is part of the May 19 GCP outage cascade?
Is there a recovery path that preserves the data on postgres-volume (e.g. pg_resetwal, manual mount, or a temporary recovery container)?
I'm happy to follow any instructions or grant access as needed — please advise on next steps.

Thanks very much.

Solved

2 Replies

Status changed to Awaiting Railway Response Railway • about 1 month ago

josuetapianefrologo-cmd

PRO

a month ago

Same issue here: my Postgres service has been in a crash loop since the outage. Logs show "catatonit failed to exec pid1: No such file or directory" repeatedly. Region: europe-west4-drams3a. Hobby plan. The container won't restart even after the GCP issue was resolved. Has anyone been able to recover their volume?

sam-a

EMPLOYEE

a month ago

Apologies for this canned message but in an effort to help all our customers get back up and running, we are sending this bulk message. As you may know, we had a major interruption to our services yesterday. We've published a post-mortem if you'd like more information on the incident. It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact that it has had on you.

It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.

You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY

If you are still having other issues that might be related to the incident you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Feel free to respond if your question has not been addressed.

Status changed to Awaiting User Response Railway • about 1 month ago

Railway

BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway • about 1 month ago
