23 days ago
Hi team,
I need urgent help with point-in-time volume recovery for a
production Postgres service.
Project: lume-pulse
Service name: "Postgres" (the old one, currently in a crash loop)
Region: EU West
Volume: postgres-volume
Timeline:
- 2026-05-19 22:12 UTC: last successful checkpoint (clean state)
- 2026-05-20 04:45 UTC: container entered a crash loop with the error
  "ERROR (catatonit:2): failed to exec pid1: No such file or directory"
- Restart and Redeploy do not resolve the issue.
What I already tried (with Railway Agent's help):
- Created a new Postgres service "Postgres-6Ylg", which works fine.
- Spun up a recovery container running sleep infinity to attempt
  pg_dump against the old service. The TCP connection fails because
  the old container is in a crash loop and never starts accepting
  connections.
- The old volume is intact (still mounted on the crashed service)
  but inaccessible because the Postgres container itself won't
  start.
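
The connection failure described above can be confirmed independently of pg_dump with a plain TCP probe. The following is a minimal sketch, not a Railway tool: the function name `postgres_port_open` and the hostname in the usage comment are hypothetical, and 5432 is assumed only because it is the default Postgres port.

```python
import socket


def postgres_port_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Usage from the recovery container (hypothetical hostname, substitute your own):
#   postgres_port_open("old-postgres.example.internal")
```

A result of False is consistent with a container that never reaches the point of listening, which is why pg_dump cannot connect while the crash loop continues.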
What I need:
- Please attempt a filesystem-level inspection of postgres-volume
  on the old service. If possible, restore the data directory to a
  state where a Postgres container can start in read-only mode so
  I can dump the data via pg_dump.
- Alternatively, if Railway has any infrastructure-level snapshots
  of the volume around 2026-05-19 22:00 UTC, please restore that
  snapshot.
I have a production bot ("lume-pulse" service) that depends on this
data. I've already moved it to a fresh empty database to restore
service, but I'd very much like to recover the lost user data,
work sessions, reports, and active video projects if possible.
I will NOT delete the old service or its volume until you reply.
Thank you!
1 Reply
23 days ago
Your Postgres crash loop began during our May 19-20 service disruption, which affected EU West and is now resolved. The "catatonit: failed to exec pid1" error is a container runtime failure from the outage, not a data issue - your volume data should still be intact since volumes persist independently of the container.
Your last deployment on that Postgres service dates to 2026-04-23, so if your earlier restart/redeploy attempts happened during the outage window, they would have hit the same infrastructure problem. Please try a fresh redeploy now that the incident is resolved - the new container should pull a clean image and mount your existing volume with its data.
Regarding your specific requests: we do not offer filesystem-level inspection of volumes or infrastructure-level snapshots beyond what is available through the volume backups feature, which must be configured in advance. If the fresh redeploy does not resolve the crash loop, please check the Backups tab on your Postgres service for any existing backups you can restore from.
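
If the fresh redeploy does bring the old service back up, it may be worth taking a dump straight away, before anything else touches the volume. Below is a minimal sketch that wraps pg_dump from Python. It assumes pg_dump is installed where the script runs and that the old service's connection string is available. The helper names and the output file name are hypothetical.

```python
import subprocess
from typing import List


def build_pg_dump_command(database_url: str, out_path: str) -> List[str]:
    """Build a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        f"--dbname={database_url}",  # pg_dump accepts a connection URI here
        "--format=custom",           # compressed archive, restorable with pg_restore
        f"--file={out_path}",
    ]


def dump_database(database_url: str, out_path: str) -> None:
    """Run pg_dump and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_pg_dump_command(database_url, out_path), check=True)


# Usage (the connection string is an assumption, take it from the old service):
#   dump_database("postgresql://user:password@host:5432/dbname", "old-postgres.dump")
```

The custom format is chosen here so the archive can later be loaded into the new "Postgres-6Ylg" service with pg_restore. A plain SQL dump would work as well.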
Status changed to Awaiting User Response Railway • 23 days ago
16 days ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • 16 days ago