Production Postgres WAL corrupted after May 19 GCP suspension — needs pg_resetwal
cristiancampodj-glitch
PROOP

23 days ago

Following the May 19 incident (https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage), my production Postgres in project GRUPO PAROLE is stuck in a crash loop and cannot recover on its own.

Service details

Project: GRUPO PAROLE

Environment: production

Service: Postgres (postgres-production-79c29.up.railway.app)

Region: EU West (Amsterdam)

Image: ghcr.io/railwayapp-templates/postgres-ssl:17.9

Volume: vol_5mp7b6dti6q93cdw

What's happening

Container starts cleanly, but Postgres aborts startup with:

LOG: database system was interrupted; last known up at 2026-05-20 08:19:00 UTC

LOG: unexpected pageaddr 0/E304000 in WAL segment 000000010000000000000010, LSN 0/10304000, offset 3162112

LOG: invalid checkpoint record

PANIC: could not locate a valid checkpoint record at 0/10303A40

The WAL segment is truncated — almost certainly a torn write when GCP suspended the account mid-fsync (my service received fast shutdown at 2026-05-19 22:22:15 UTC, two minutes after the suspension started per your post-mortem).

Data files appear intact; only the last WAL segment is bad.

What I need (in order of preference)

A snapshot of volume vol_5mp7b6dti6q93cdw taken before any further action.

Then run pg_resetwal -f /var/lib/postgresql/data/pgdata inside the container, or give me a one-off way to run it (temporary start command, shell, etc.).

Bring the service back online so I can pg_dump and verify integrity.

This is production. I have not modified any settings, image, or volume since the incident, exactly so you have a clean state to work from.

I also have a second affected Postgres in the same project (develop environment) showing the same pattern — happy to share details if helpful.

Thanks for your help.

Solved

1 Replies

Railway
BOT

23 days ago

We can confirm your production Postgres in GRUPO PAROLE is crash-looping with WAL corruption, directly caused by the May 19 service disruption. Before attempting recovery, you can create a manual backup of your volume from the Backups tab on your Postgres service. To run pg_resetwal, you can temporarily override the start command in your service settings to something like /bin/sh -c "pg_resetwal -f /var/lib/postgresql/data/pgdata && exec docker-entrypoint.sh postgres", then redeploy - since this is an image-based service, the start command overrides the entrypoint. Once Postgres is back up, restore the original start command (clear the override) and redeploy.


Status changed to Awaiting User Response Railway 23 days ago


Status changed to Solved cristiancampodj-glitch 23 days ago


Welcome!

Sign in to your Railway account to join the conversation.

Loading...