Postgres DB crashed and is down, redeployment doesn't help - "PANIC: could not locate a valid checkpoint record"
askara
PRO · OP

8 months ago

Hello,

The Postgres DB has crashed and is currently not online.

Restarting and redeploying doesn't help.

Restoring from the backup, which is 6 days old, is not acceptable, since there is valuable data from those 6 days.

Any help would be much appreciated. Could you perhaps restore from a more recent backup if you have one internally?

Thank you in advance!

LOG:

2025-05-30 07:12:44.289 UTC [28] LOG: database system was interrupted; last known up at 2025-05-29 20:07:55 UTC
2025-05-30 07:12:44.321 UTC [28] LOG: invalid resource manager ID in checkpoint record
2025-05-30 07:12:44.321 UTC [28] PANIC: could not locate a valid checkpoint record
2025-05-30 07:12:44.321 UTC [5] LOG: startup process (PID 28) was terminated by signal 6: Aborted
2025-05-30 07:12:44.321 UTC [5] LOG: aborting startup due to startup process failure
2025-05-30 07:12:44.322 UTC [5] LOG: database system is shut down
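For context: this PANIC means the startup process could not read a valid checkpoint record from the WAL, so crash recovery cannot even begin and the server aborts. If shell access to the data volume is available, a last-resort option before falling back to an old backup is pg_resetwal. A minimal sketch, not an official Railway procedure; the PGDATA path and the database name railway are assumptions to adjust for your deployment:

```shell
PGDATA=/var/lib/postgresql/data   # assumption: adjust to your volume mount

# 1. Copy the whole data directory first -- pg_resetwal is destructive.
cp -a "$PGDATA" "${PGDATA}.bak"

# 2. Dry run: report what would be reset without changing anything.
pg_resetwal --dry-run "$PGDATA"

# 3. Force-reset the WAL so the server can start. Recent transactions
#    may be lost and the cluster can be left subtly inconsistent.
pg_resetwal -f "$PGDATA"

# 4. Start Postgres, then immediately take a logical dump and rebuild
#    from that dump rather than trusting the reset cluster long-term.
pg_dump -U postgres -Fc railway > rescue.dump
```

This trades durability for availability: the server starts, but anything after the last usable checkpoint is gone, which is why the dump-and-rebuild step matters.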

Solved

4 Replies

askara
PRO · OP

8 months ago

UPDATE: I have restored the 6-day-old backup (2025-05-24 at 01:12 UTC). This is not ideal, because there are now 6 days of missing data, and it is my fault for setting weekly instead of daily backups. I would appreciate it if you by any chance had an internal backup that is more recent, which you could restore my DB from. Please don't proceed with fixes or restorations that would cause service interruptions after 8:30 AM PST, since we have important traffic on our apps after that time. So if it can be done by then, that would be great; if not, please just leave it as it is for now.

Thanks!
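On the backup-cadence point above: a nightly logical dump alongside platform snapshots is cheap insurance against exactly this scenario. A sketch of a crontab entry, assuming DATABASE_URL is set and /backups is a mounted destination (both assumptions):

```shell
# Hypothetical crontab fragment: nightly compressed pg_dump at 01:15 UTC,
# keeping the last 14 days of dumps. Note % must be escaped in crontab.
15 1 * * *  pg_dump "$DATABASE_URL" -Fc > /backups/db-$(date +\%F).dump && find /backups -name 'db-*.dump' -mtime +14 -delete
```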




Status changed to Awaiting User Response Railway 9 months ago


angelo-railway

Unfortunately, we don’t have user-accessible backups to restore.

askara
PRO · OP

8 months ago

Thank you for your reply.

Is it possible to get any logical or physical backup, export, or data directory from the corrupted database state immediately before it was restored with the 6-day-old backup, even if it is corrupted or incomplete?

Could your engineering team provide any of the following from the previous (crashed) instance:

  • The raw data directory (even if it’s corrupted)

  • A logical dump or partial export from the failed database

  • Any WAL (write-ahead log) files or other recovery artifacts

I understand this data may be incomplete, inconsistent, or not usable for direct restore. My goal is to attempt my own recovery or forensic analysis, and I accept all risks.

Additionally, I would appreciate any information or analysis you can provide regarding the cause of the corruption and crash.

Thank you for your help.
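If a raw data directory like the one requested above were produced, it could be inspected without starting a server. A sketch of that kind of forensic pass, assuming a local install of the matching Postgres major version and that the directory was copied to ./data_crashed (both assumptions):

```shell
# Hypothetical local copy of the recovered data directory.
DIR=./data_crashed

# Show the control file: cluster state, last checkpoint location,
# and WAL settings -- the checkpoint this PANIC failed to locate.
pg_controldata "$DIR"

# Decode WAL records directly to see the last replayable transactions.
# The segment name here is illustrative; list $DIR/pg_wal for real names.
pg_waldump -p "$DIR/pg_wal" 000000010000000000000001
```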


Status changed to Awaiting Railway Response Railway 8 months ago


We can't do any of the above. You do have direct access to the container, since you are able to SSH into the DB yourself. However, restoring the snapshot overwrote all historical data, so the crashed state is no longer recoverable.

Although I empathize with you, there is nothing more we can do here. I apologize.

Best,
Angelo


Status changed to Awaiting User Response Railway 8 months ago


Railway
BOT

6 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 6 months ago

