23 days ago
Following the May 19 incident (https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage), my production Postgres in project GRUPO PAROLE is stuck in a crash loop and cannot recover on its own.
Service details
Project: GRUPO PAROLE
Environment: production
Service: Postgres (postgres-production-79c29.up.railway.app)
Region: EU West (Amsterdam)
Image: ghcr.io/railwayapp-templates/postgres-ssl:17.9
Volume: vol_5mp7b6dti6q93cdw
What's happening
Container starts cleanly, but Postgres aborts startup with:
LOG: database system was interrupted; last known up at 2026-05-20 08:19:00 UTC
LOG: unexpected pageaddr 0/E304000 in WAL segment 000000010000000000000010, LSN 0/10304000, offset 3162112
LOG: invalid checkpoint record
PANIC: could not locate a valid checkpoint record at 0/10303A40
The WAL segment is truncated — almost certainly a torn write when GCP suspended the account mid-fsync (my service received fast shutdown at 2026-05-19 22:22:15 UTC, two minutes after the suspension started per your post-mortem).
Data files appear intact; only the last WAL segment is bad.
What I need (in order of preference)
A snapshot of volume vol_5mp7b6dti6q93cdw taken before any further action.
Then run pg_resetwal -f /var/lib/postgresql/data/pgdata inside the container, or give me a one-off way to run it (temporary start command, shell, etc.).
Bring the service back online so I can pg_dump and verify integrity.
This is production. I have not modified any settings, image, or volume since the incident, exactly so you have a clean state to work from.
I also have a second affected Postgres in the same project (develop environment) showing the same pattern — happy to share details if helpful.
Thanks for your help.
1 Replies
23 days ago
We can confirm your production Postgres in GRUPO PAROLE is crash-looping with WAL corruption, directly caused by the May 19 service disruption. Before attempting recovery, you can create a manual backup of your volume from the Backups tab on your Postgres service. To run pg_resetwal, you can temporarily override the start command in your service settings to something like /bin/sh -c "pg_resetwal -f /var/lib/postgresql/data/pgdata && exec docker-entrypoint.sh postgres", then redeploy - since this is an image-based service, the start command overrides the entrypoint. Once Postgres is back up, restore the original start command (clear the override) and redeploy.
Status changed to Awaiting User Response Railway • 23 days ago
Status changed to Solved cristiancampodj-glitch • 23 days ago