Recovery for a production Postgres service
vosamb18-creator
PRO OP

23 days ago

Hi team,

I need urgent help with point-in-time volume recovery for a production Postgres service.

Project: lume-pulse

Service name: "Postgres" (the old one, currently in a crash loop)

Region: EU West

Volume: postgres-volume

Timeline:

  • 2026-05-19 22:12 UTC: last successful checkpoint (clean state)

  • 2026-05-20 04:45 UTC: container entered a crash loop with the error "ERROR (catatonit:2): failed to exec pid1: No such file or directory"

  • Restart and Redeploy do not resolve the issue.

What I already tried (with Railway Agent's help):

  • Created a new Postgres service "Postgres-6Ylg" — works fine.

  • Spun up a recovery container with sleep infinity to attempt pg_dump from the old service. The TCP connection fails because the old container is in a crash loop and never starts accepting connections.

  • The old volume is intact (still mounted on the crashed service) but inaccessible because the Postgres container itself won't start.
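
A pg_dump attempt from a recovery container can only succeed if something is listening on the Postgres port. The following is a minimal sketch of a pre-flight check, assuming Python is available in the recovery container; the function name is hypothetical and the default port 5432 is the usual Postgres default, not a value confirmed for this project.

```python
import socket


def postgres_port_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling postgres_port_open with the old service's private host name would return False for as long as the container is crash-looping, which matches the failed pg_dump connection described above.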

What I need:

  • Please attempt a filesystem-level inspection of postgres-volume on the old service. If possible, restore the data directory to a state where a Postgres container can start in read-only mode so I can dump the data via pg_dump.

  • Alternatively, if Railway has any infrastructure-level snapshots of the volume around 2026-05-19 22:00 UTC, please restore that snapshot.

I have a production bot ("lume-pulse" service) that depends on this data. I've already moved it to a fresh empty database to restore service, but I'd very much like to recover the lost user data, work sessions, reports, and active video projects if possible.

I will NOT delete the old service or its volume until you reply.

Thank you!

Solved

1 Reply

Railway
BOT

23 days ago

Your Postgres crash loop began during our May 19-20 service disruption, which affected EU West and is now resolved. The "catatonit: failed to exec pid1" error is a container runtime failure from the outage, not a data issue. Your volume data should still be intact, since volumes persist independently of the container.

Your last deployment on that Postgres service dates to 2026-04-23, so if your earlier restart/redeploy attempts happened during the outage window, they would have hit the same infrastructure problem. Please try a fresh redeploy now that the incident is resolved. The new container should pull a clean image and mount your existing volume with its data.

Regarding your specific requests: we do not offer filesystem-level inspection of volumes or infrastructure-level snapshots beyond what is available through the volume backups feature, which must be configured in advance. If the fresh redeploy does not resolve the crash loop, please check the Backups tab on your Postgres service for any existing backups you can restore from.
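
If the redeployed service does start, the dump-and-restore step the original post describes could look roughly like the sketch below. It only wraps the standard pg_dump and pg_restore command-line tools. The function names, connection URLs, and dump file name are hypothetical placeholders, and Python is used here purely for illustration.

```python
import subprocess


def dump_command(source_url: str, dump_path: str) -> list[str]:
    """Build a pg_dump invocation that writes a custom-format archive."""
    return ["pg_dump", "--format=custom", f"--file={dump_path}", f"--dbname={source_url}"]


def restore_command(target_url: str, dump_path: str) -> list[str]:
    """Build a pg_restore invocation that loads the archive into the target."""
    # --no-owner skips ownership commands, which helps when role names differ.
    return ["pg_restore", "--no-owner", f"--dbname={target_url}", dump_path]


def migrate(source_url: str, target_url: str, dump_path: str = "old-postgres.dump") -> None:
    """Dump the old database, then restore it into the target database."""
    # check=True stops the migration if either tool exits with an error.
    subprocess.run(dump_command(source_url, dump_path), check=True)
    subprocess.run(restore_command(target_url, dump_path), check=True)
```

Restoring into a database that already holds newer rows can produce conflicts, so a scratch database is likely a safer first target than the live replacement service.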


Status changed to Awaiting User Response Railway 23 days ago


Railway
BOT

16 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 16 days ago

