PostgreSQL crashed and is stuck in a recovery loop; risk of data loss
hive-health-dev
HOBBYOP

2 months ago

CURRENT STATE:
- Database is offline and unable to start
- Volume is mounted at /var/lib/postgresql/data
- Disk usage: ~0.48 GB (volume appears to be ~1 GB default)
- No automatic backups configured
- Data is at risk of permanent loss

REQUEST: Please investigate the corrupted WAL files and attempt recovery before any data is deleted. We have not reset the database and are awaiting support intervention.

Solved

5 Replies

Railway
BOT

2 months ago

Your Postgres service is confirmed to be in a CRASHED state, and we were unable to retrieve any logs from the current deployment. We do not have the ability to access or repair WAL files on your volume directly. Since no backups are configured, we recommend first creating a manual backup of the volume from the service's Backups tab in the dashboard, before making any other changes. This snapshots the current volume contents; you can then attempt a fresh deployment or restore from that snapshot. See Railway's documentation for more on backups.


Status changed to Awaiting User Response Railway 2 months ago


Status changed to Awaiting Railway Response Railway 2 months ago


hive-health-dev
HOBBYOP

2 months ago

My Postgres volume (500 MB) is full. PostgreSQL completes WAL redo successfully (redo done at 4/4EFFFFA0) but crashes when trying to write the post-recovery WAL segment. I've resized the volume in settings, but the filesystem hasn't expanded. Can you do either of the following:

  1. Expand the filesystem to match the new volume size, OR
  2. Shell into the container and delete the old WAL segments (list them with ls -la /var/lib/postgresql/data/pg_wal/). Any segment files that end before 4/419CECD0 (the redo start LSN) are safe to remove.

There is no data corruption: WAL replay completes successfully every time. It just needs ~50 MB of free space to write the recovery checkpoint.
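Option 2 depends on translating the redo start LSN into WAL segment file names. A minimal Python sketch of that mapping follows, assuming the default 16 MB wal_segment_size and timeline 1 (neither was confirmed in this thread); the helper names are hypothetical. PostgreSQL's own pg_walfile_name() does the same conversion, but it needs a running server that is not in recovery. The sketch deliberately only lists candidate files and deletes nothing.

```python
# Sketch: map a PostgreSQL LSN such as "4/419CECD0" to the WAL segment file
# that contains it, and list pg_wal files that sort before that segment.
# ASSUMPTIONS: default 16 MB wal_segment_size and timeline 1; both are
# illustrative, not values read from this server. Helper names are hypothetical.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024                   # assumed wal_segment_size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16 MB segments
HEX_DIGITS = set("0123456789ABCDEF")


def parse_lsn(lsn: str) -> int:
    """Convert an 'X/Y' LSN string into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int = 1) -> str:
    """Return the 24-hex-digit WAL file name that holds the given LSN."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def segments_before(filenames, redo_start_lsn: str, timeline: int = 1):
    """List segment files on the same timeline that sort before the redo-start segment.

    Only plain 24-character segment names are considered, so .history files
    and the archive_status directory are skipped. Nothing is deleted.
    """
    cutoff = wal_segment_name(redo_start_lsn, timeline)
    return sorted(
        name for name in filenames
        if len(name) == 24
        and set(name) <= HEX_DIGITS
        and name[:8] == cutoff[:8]   # same timeline ID
        and name < cutoff            # fixed-width hex sorts numerically
    )


if __name__ == "__main__":
    print(wal_segment_name("4/419CECD0"))  # segment holding the redo start
    print(wal_segment_name("4/4EFFFFA0"))  # segment holding the redo end
```

Under these assumptions, 4/419CECD0 falls in segment 000000010000000400000041 and 4/4EFFFFA0 in 00000001000000040000004E, only 96 bytes before that segment's end. This is consistent with the report that the crash happens when PostgreSQL writes the next, post-recovery WAL segment.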


hive-health-dev
HOBBYOP

2 months ago

I can't see the volume in the UI. Could you please check and expand it to at least 10 GB to prevent the disk space issue from happening again?


brody

2 months ago

Hey, the volume resize you attempted didn't fully complete on our end, which left your database stuck at the original 500 MB. We've corrected this and redeployed your Postgres service. You should now have ~10 GB of available disk space, giving PostgreSQL plenty of room to complete WAL recovery and resume normal operation.

We've also shipped a fix to prevent this from happening on future volume resizes.
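Whether a resize has actually reached the filesystem, which was the problem with the earlier attempt, can be confirmed from inside the Postgres container. The following is a minimal Python sketch that reads the size of the filesystem backing the data directory with shutil.disk_usage. The function name and the 1 GiB free-space threshold are illustrative assumptions.

```python
# Sketch: report the size of the filesystem that backs the Postgres data
# directory, to confirm a volume resize actually reached the filesystem.
# ASSUMPTIONS: the function name and the 1 GiB threshold are illustrative.
import os
import shutil

DATA_DIR = "/var/lib/postgresql/data"  # mount point from the original post
GIB = 1024 ** 3


def filesystem_report(path: str, min_free_bytes: int) -> dict:
    """Return total/used/free space (GiB) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "total_gib": round(usage.total / GIB, 2),
        "used_gib": round(usage.used / GIB, 2),
        "free_gib": round(usage.free / GIB, 2),
        "enough_free": usage.free >= min_free_bytes,
    }


if __name__ == "__main__":
    # Fall back to the current directory when run outside the container.
    target = DATA_DIR if os.path.isdir(DATA_DIR) else "."
    print(target, filesystem_report(target, min_free_bytes=1 * GIB))
```

After the corrected resize, total_gib should reflect the new ~10 GB volume rather than the original 500 MB.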


Status changed to Awaiting User Response Railway 2 months ago


hive-health-dev
HOBBYOP

2 months ago

Thank you so much for your speedy help, Brody!


Status changed to Awaiting Railway Response Railway 2 months ago


Status changed to Solved brody 2 months ago

