Production Postgres WAL corrupted after May 19 GCP suspension — needs pg_resetwal

Question

Following the May 19 incident (https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage), my production Postgres in project GRUPO PAROLE is stuck in a crash loop and cannot recover on its own.

Service details

Project: GRUPO PAROLE

Environment: production

Service: Postgres (postgres-production-79c29.up.railway.app)

Region: EU West (Amsterdam)

Image: ghcr.io/railwayapp-templates/postgres-ssl:17.9

Volume: vol_5mp7b6dti6q93cdw

What's happening

The container starts cleanly, but Postgres aborts during startup with:

LOG:  database system was interrupted; last known up at 2026-05-20 08:19:00 UTC
LOG:  unexpected pageaddr 0/E304000 in WAL segment 000000010000000000000010, LSN 0/10304000, offset 3162112
LOG:  invalid checkpoint record
PANIC: could not locate a valid checkpoint record at 0/10303A40

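The WAL segment is effectively truncated. The page at LSN 0/10304000 (offset 3162112 in segment 000000010000000000000010) still carries the header of an older, recycled segment (pageaddr 0/E304000), so valid WAL stops just after the checkpoint record at 0/10303A40, and that record cannot be read back. This is almost certainly a torn write from when GCP suspended the account mid-fsync: my service received a fast shutdown request at 2026-05-19 22:22:15 UTC, two minutes after the suspension started according to your post-mortem.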
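The offsets in the log are self-consistent, and the arithmetic can be checked with a short Python sketch. It assumes the default 16 MB wal_segment_size and 8 kB WAL page size (I have not confirmed either for this instance), and the function names are my own, not PostgreSQL's.

```python
"""Decode the LSNs from the PANIC log (a sketch; assumes default WAL sizes)."""

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB wal_segment_size
XLOG_BLCKSZ = 8192                   # assumption: default 8 kB WAL page size


def parse_lsn(text: str) -> int:
    """Turn PostgreSQL's 'hi/lo' hex notation into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Inverse of parse_lsn, in the same uppercase, unpadded hex style as the server."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_number(name: str) -> int:
    """Map a 24-hex-digit WAL file name (timeline, log id, segment) to a segment number."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    log_id = int(name[8:16], 16)
    seg_id = int(name[16:24], 16)
    segments_per_log_id = 0x1_0000_0000 // WAL_SEGMENT_SIZE
    return log_id * segments_per_log_id + seg_id


def explain(segment: str, lsn: str, pageaddr: str, checkpoint: str) -> dict:
    seg_start = segment_number(segment) * WAL_SEGMENT_SIZE
    read_at = parse_lsn(lsn)
    found = parse_lsn(pageaddr)
    ckpt = parse_lsn(checkpoint)
    ckpt_page = ckpt - ckpt % XLOG_BLCKSZ
    return {
        # Byte offset of the failing page inside the segment file.
        "offset": read_at - seg_start,
        # Start of the segment the page header claims to belong to.
        "stale_segment": format_lsn(found - found % WAL_SEGMENT_SIZE),
        # True when the stale header sits at the same offset, i.e. leftover recycled bytes.
        "same_offset": found % WAL_SEGMENT_SIZE == read_at - seg_start,
        # Page holding the checkpoint record, and the page right after it.
        "checkpoint_page": format_lsn(ckpt_page),
        "next_page": format_lsn(ckpt_page + XLOG_BLCKSZ),
    }


if __name__ == "__main__":
    print(explain("000000010000000000000010", "0/10304000", "0/E304000", "0/10303A40"))
    # {'offset': 3162112, 'stale_segment': '0/E000000', 'same_offset': True,
    #  'checkpoint_page': '0/10302000', 'next_page': '0/10304000'}
```

Under those assumptions, the failing page sits at offset 3162112, carries the header that segment 0/E000000 had at the same offset, and is the page immediately after the one holding the checkpoint record.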

Data files appear intact; only the last WAL segment is bad.

What I need (in this order)

1. A snapshot of volume vol_5mp7b6dti6q93cdw, taken before any further action.
2. Run pg_resetwal -f /var/lib/postgresql/data/pgdata inside the container, as the postgres user and with the server stopped (pg_resetwal refuses to run as root), or give me a one-off way to run it myself (a temporary start command, a shell, etc.).
3. Bring the service back online so I can run pg_dump and verify data integrity.
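If a one-off shell is the easier route, this is the order I would follow for the reset and the dump, written as a small Python sketch that only prints the commands after two preflight checks. It is hypothetical: the helper names and the dump path /tmp/after-reset.dump are mine, and pg_dump is assumed to take its connection settings from the standard PG* environment variables. The -n dry run comes first so the pg_control values can be reviewed before -f rewrites anything.

```python
"""Hypothetical helper: print the reset commands in order, after preflight checks."""
import os
import shlex
import sys

PGDATA = "/var/lib/postgresql/data/pgdata"  # data directory of the service above


def preflight(pgdata: str, euid: int) -> list[str]:
    """Mirror two checks pg_resetwal itself enforces before touching anything."""
    problems = []
    if euid == 0:
        problems.append("running as root; pg_resetwal must run as the data directory owner")
    if os.path.exists(os.path.join(pgdata, "postmaster.pid")):
        problems.append("postmaster.pid exists; confirm the server is stopped first")
    return problems


def reset_plan(pgdata: str) -> list[list[str]]:
    """Commands in order; only run them once the volume snapshot exists."""
    return [
        # Dry run: show the pg_control values that would be written; changes nothing.
        ["pg_resetwal", "-n", pgdata],
        # Forced reset: clear the WAL and write a fresh checkpoint into pg_control.
        ["pg_resetwal", "-f", pgdata],
        # After restarting Postgres, dump immediately (hypothetical output path).
        ["pg_dump", "--format=custom", "--file=/tmp/after-reset.dump"],
    ]


if __name__ == "__main__":
    for problem in preflight(PGDATA, os.geteuid()):
        print("preflight:", problem, file=sys.stderr)
    for argv in reset_plan(PGDATA):
        print(shlex.join(argv))
```

Printing rather than executing keeps a human in the loop, which matters on a production volume where the snapshot must exist before the -f step.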

This is production. I have not modified any settings, image, or volume since the incident, exactly so you have a clean state to work from.

I also have a second affected Postgres in the same project (develop environment) showing the same pattern — happy to share details if helpful.

Thanks for your help.