Postgres crash-looping since GCP outage - volume needs moving (Pro)
rowgregory
PRO · OP

a month ago

Pro plan customer. My Postgres service has been crash-looping since the May 19 GCP incident. The platform incident resolved May 20 21:35 UTC, but my service has not recovered ~14h+ later. Multiple redeploys go green for ~1 second then crash again.

Per the recovery FAQ, this matches the case where "the volume may need to be moved to a healthy node."

Project ID: 53e9dfc3-52a2-46da-9cb7-74c153164e91

Service ID: c6bb0b8c-f187-4bb9-9724-d90633427a08

Image: ghcr.io/railwayapp-templates/postgres-ssl:16

Logs show collation version mismatch warnings (2.36 → 2.41) then connection drops. psql connection attempts get "server closed the connection unexpectedly."

Also posted on the recovery thread yesterday with no response yet: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Site is up but member logins are broken since the DB is down. Any help appreciated.

Solved

3 Replies

Try this:

  1. Set the start command of your Postgres service to `sleep infinity` and redeploy
  2. After deployment, SSH into your service (right-click the service and select Copy SSH Command; you'll need the Railway CLI installed locally)
  3. Run su - postgres -c "/usr/lib/postgresql/<version>/bin/pg_ctl start -D /var/lib/postgresql/data/pgdata" where `<version>` is the major version of your Postgres image (16, 18, etc.)
  4. Run psql
  5. Run:
REINDEX DATABASE railway;
ALTER DATABASE railway REFRESH COLLATION VERSION;
  6. Exit the terminal
  7. Remove the custom start command and redeploy your database
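
For anyone who would rather script the start, REINDEX, and REFRESH steps, here is a minimal shell sketch, not an official Railway procedure. It assumes the paths from the steps above, that psql connects using the service's existing PG* environment variables (as the bare psql step implies), and that the database is named railway. The names PG_VERSION, PGDATA_DIR, DB_NAME, DRY_RUN, and the final pg_ctl stop are additions of this sketch, not part of the steps above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: meant to be run as root inside the SSH session.
# All defaults below are assumptions; adjust them to your service.
PG_VERSION="${PG_VERSION:-16}"
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/data/pgdata}"
DB_NAME="${DB_NAME:-railway}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Build the pg_ctl path for a given major version.
pg_ctl_path() {
  printf '/usr/lib/postgresql/%s/bin/pg_ctl\n' "$1"
}

# Print a command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

repair_collation() {
  pg_ctl="$(pg_ctl_path "$PG_VERSION")"
  # Start the server, then rebuild indexes and record the new collation version.
  run su - postgres -c "$pg_ctl start -D $PGDATA_DIR" &&
  run psql -d "$DB_NAME" -v ON_ERROR_STOP=1 -c "REINDEX DATABASE $DB_NAME;" &&
  run psql -d "$DB_NAME" -v ON_ERROR_STOP=1 -c "ALTER DATABASE $DB_NAME REFRESH COLLATION VERSION;"
  status=$?
  # Addition of this sketch: stop the server cleanly before redeploying.
  run su - postgres -c "$pg_ctl stop -D $PGDATA_DIR"
  return "$status"
}

repair_collation
```

Run it with DRY_RUN=0 to actually execute the commands. The two SQL statements go through separate psql -c calls because psql processes a multi-statement -c string as a single transaction, and REINDEX DATABASE cannot run inside a transaction block.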

rowgregory
PRO · OP

a month ago

Worked perfectly, thanks @pepper! REINDEX + REFRESH ran clean, DB is back online with no warnings. Login confirmed working. Really appreciate the fast turnaround.
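
To double-check that no mismatch remains, one option is to compare the collation version recorded for each database with the version the OS library currently reports. This is a minimal sketch, assuming PostgreSQL 15 or later (where pg_database.datcollversion and pg_database_collation_actual_version() exist) and that psql can connect using the service's PG* environment variables. RUN_CHECK, DB_NAME, and collation_check_sql are hypothetical names used only here.

```shell
#!/bin/sh
# Sketch only. RUN_CHECK=1 sends the query to psql; the default just prints it.
RUN_CHECK="${RUN_CHECK:-0}"

# Emit a query listing databases whose recorded collation version
# differs from the version the installed library reports.
collation_check_sql() {
  cat <<'SQL'
SELECT datname,
       datcollversion AS recorded_version,
       pg_database_collation_actual_version(oid) AS actual_version
FROM pg_database
WHERE datcollversion IS DISTINCT FROM pg_database_collation_actual_version(oid);
SQL
}

if [ "$RUN_CHECK" = "1" ]; then
  # Assumption: the database is named railway unless DB_NAME says otherwise.
  collation_check_sql | psql -d "${DB_NAME:-railway}" -v ON_ERROR_STOP=1
else
  collation_check_sql
fi
```

An empty result set means no database is still recording a stale collation version.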



Status changed to Solved 0x5b62656e5d about 1 month ago

