a month ago
Pro plan customer. My Postgres service has been crash-looping since the May 19 GCP incident. The platform incident resolved May 20 21:35 UTC, but my service has not recovered ~14h+ later. Multiple redeploys go green for ~1 second then crash again.
Per the recovery FAQ, this matches the case where "the volume may need to be moved to a healthy node."
Project ID: 53e9dfc3-52a2-46da-9cb7-74c153164e91
Service ID: c6bb0b8c-f187-4bb9-9724-d90633427a08
Image: ghcr.io/railwayapp-templates/postgres-ssl:16
Logs show collation version mismatch warnings (2.36 → 2.41), then connection drops. psql connection attempts fail with "server closed the connection unexpectedly."
Also posted on the recovery thread yesterday with no response yet: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c
Site is up but member logins are broken since the DB is down. Any help appreciated.
3 Replies
a month ago
Try this:
- Set the start command of your Postgres to be `sleep infinity` and redeploy
- After deployment, SSH into your service (right click the service and select Copy SSH Command). You'll need the Railway CLI installed locally.
- Run `su - postgres -c "/usr/lib/postgresql//bin/pg_ctl start -D /var/lib/postgresql/data/pgdata"`, filling in the empty path segment between `postgresql/` and `/bin` with the version of your Postgres image (16, 18, etc)
- Run `psql`
- Run: `REINDEX DATABASE railway;` and then `ALTER DATABASE railway REFRESH COLLATION VERSION;`
- Exit the terminal
- Remove the custom start command and redeploy your database
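For anyone who wants the in-container steps in one place, here is a minimal dry-run sketch. It only prints the commands in order so you can review them and paste them into the SSH session yourself; it does not run anything against the database. The `PG_VERSION` variable, the helper names `pg_ctl_path` and `print_plan`, the use of `psql -d railway -c` instead of typing the SQL interactively, and the final check query (which relies on PostgreSQL 15 or newer) are assumptions for illustration, not part of the steps above.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands in order instead of running them.
# Assumptions (not from the thread): PG_VERSION, the helper names, and passing
# `-d railway -c ...` to psql rather than typing the SQL interactively.

PG_VERSION="${PG_VERSION:-16}"                 # major version of your Postgres image
PGDATA_DIR="/var/lib/postgresql/data/pgdata"   # data directory from the steps above
DB_NAME="railway"                              # database name from the steps above

# Build the versioned pg_ctl path for a given major version.
pg_ctl_path() {
    printf '/usr/lib/postgresql/%s/bin/pg_ctl\n' "$1"
}

# Print one command per line, in the order they should be run.
print_plan() {
    ctl="$(pg_ctl_path "$1")"
    # 1. Start Postgres by hand as the postgres user.
    printf '%s\n' "su - postgres -c \"$ctl start -D $PGDATA_DIR\""
    # 2. Rebuild indexes; run as its own psql call because REINDEX DATABASE
    #    cannot run inside a transaction block.
    printf '%s\n' "psql -d $DB_NAME -c 'REINDEX DATABASE $DB_NAME;'"
    # 3. Record the new collation version so the mismatch warning stops.
    printf '%s\n' "psql -d $DB_NAME -c 'ALTER DATABASE $DB_NAME REFRESH COLLATION VERSION;'"
    # 4. Optional check (PostgreSQL 15+): recorded vs. actual collation version.
    printf '%s\n' "psql -d $DB_NAME -c 'SELECT datname, datcollversion, pg_database_collation_actual_version(oid) FROM pg_database;'"
}

print_plan "$PG_VERSION"
```

Run it with, for example, `PG_VERSION=18 sh plan.sh` (the file name is arbitrary) to print the commands for a different image version. After the refresh, the two version columns in the last query should match for the `railway` database.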
Worked perfectly, thanks @pepper! REINDEX + REFRESH ran clean, DB is back online with no warnings. Login confirmed working. Really appreciate the fast turnaround.
a month ago
np
Status changed to Solved 0x5b62656e5d • about 1 month ago