Postgres DB has been down since your outage and will not come back up after a restart

truerdave

PRO OP

2 months ago

Vault staging Postgres was alive and checkpointing normally through 2026-05-19 22:01 UTC.

At 2026-05-19 22:21:32 UTC, Postgres received a fast shutdown request:

received fast shutdown request

aborting any active transactions

terminating connection due to administrator command

database system is shut down

WAL was shut down unexpectedly

After that, every restart fails before Postgres even starts:

Mounting volume on: .../vol_f4l6v1wsl5j9kyk2

ERROR (catatonit:2): failed to exec pid1: No such file or directory

Gatekeeper’s deploy logs confirm that its failure is only a downstream symptom of this one. Alembic tries to connect to postgres-2b29.railway.internal, gets a socket, and then the server closes the connection unexpectedly because the DB service is not running.
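For anyone wanting to reproduce the "gets a socket, then the connection is closed" symptom without going through Alembic, here is a minimal sketch of a probe. It sends the standard SSLRequest packet from the PostgreSQL wire protocol and checks whether anything answers. The function name, the return labels, and the default port 5432 are my own choices for illustration, not anything Railway provides, and a proxy sitting in front of the service could change what you see.

```python
import socket
import struct

# SSLRequest from the PostgreSQL frontend/backend protocol:
# Int32 message length (8) followed by Int32 request code 80877103.
SSL_REQUEST = struct.pack("!II", 8, 80877103)


def probe_postgres(host, port=5432, timeout=3.0):
    """Classify what is listening on host:port (hypothetical helper).

    Returns one of:
      "unreachable" - the TCP connect itself failed
      "no_reply"    - connected, but the peer closed, reset, or timed out
                      without answering
      "postgres"    - the peer answered the SSLRequest with b"S" or b"N"
      "unknown"     - the peer replied with something else
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        try:
            sock.sendall(SSL_REQUEST)
            reply = sock.recv(1)
        except OSError:
            # Covers connection reset, broken pipe, and read timeout.
            return "no_reply"
    if reply == b"":
        # An empty read means the peer closed the connection cleanly.
        return "no_reply"
    if reply in (b"S", b"N"):
        return "postgres"
    return "unknown"
```

Run from another service on the same private network, something like `probe_postgres("postgres-2b29.railway.internal")` returning "no_reply" would be consistent with what the deploy logs showed, while "postgres" would mean a server process is actually up and answering.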

It does not look like normal Postgres data corruption from the logs we can see. The strongest signal is a Railway/container startup failure after shutdown: catatonit cannot exec PID 1, which usually means the service image/start command/runtime config is broken or Railway’s managed Postgres container is in a bad platform state.

I did not see classic disk-full evidence in the pulled logs, like No space left on device, WAL write failure, checkpoint failure, or Postgres PANIC from storage exhaustion. The SSL/startup-packet logs look like noise/probes/mismatched clients and predate the outage.
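The triage above can be sketched as a small script that sorts log lines into buckets. This is only a rough illustration: the category names and the pattern list are my assumptions, built from the strings quoted in this post, and real disk-full or WAL failures may be worded differently depending on the Postgres version.

```python
import re

# Illustrative signatures only; extend them for the log formats you actually see.
SIGNATURES = {
    "container_startup": [re.compile(r"failed to exec pid1")],
    "storage_exhaustion": [
        re.compile(r"No space left on device"),
        re.compile(r"\bPANIC\b"),
    ],
    "clean_shutdown": [
        re.compile(r"received fast shutdown request"),
        re.compile(r"database system is shut down"),
    ],
}


def triage(lines):
    """Return a dict mapping each category to the log lines that matched it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, patterns in SIGNATURES.items():
            if any(p.search(line) for p in patterns):
                hits[name].append(line)
    return hits


# Sample lines copied from the logs quoted earlier in this post.
sample = [
    "received fast shutdown request",
    "database system is shut down",
    "ERROR (catatonit:2): failed to exec pid1: No such file or directory",
]
result = triage(sample)
for category, matched in result.items():
    print(category, len(matched))
```

On these sample lines the storage bucket stays empty while the container startup bucket gets the catatonit line, which is the same reasoning as above: a shutdown followed by a container exec failure, with no sign of storage exhaustion.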

Solved

2 Replies

Status changed to Awaiting Railway Response Railway • about 2 months ago

mykal

EMPLOYEE

2 months ago

Your diagnosis is correct: this is a container startup failure, not data corruption. The catatonit pid1 error is caused by a stale container image after the shutdown event. Your volume data is intact.

To fix this, open the Postgres service, press Cmd+K (or Ctrl+K) to open the command palette, and select "Redeploy source image". This re-pulls a fresh image and should resolve the crash loop. A normal redeploy from the three-dot menu will not work here because it reuses the cached image.

Status changed to Awaiting User Response Railway • about 2 months ago

truerdave

PRO OP

2 months ago

Your approach was the fix, thanks.

Status changed to Awaiting Railway Response Railway • about 2 months ago

Status changed to Solved Railway • about 2 months ago
