a month ago
"My Postgres deployment is crash-looping with 'failed to exec pid1: No such file or directory' errors on volume vol_6u4cyts513la3sfw. The volume appears stuck on a machine affected by the May 19th GCP incident. Please migrate this volume to a healthy host. Project: Lens League Apex."
18 Replies
a month ago
Mine too, wtf is happening??
a month ago
Railway
Your Postgres service is confirmed in a CRASHED state, and this is related to the [ongoing service disruption](https://status.railway.com/incident/I23M92U0). We are actively auto-redeploying services detected as unhealthy, which remounts volumes onto healthy hosts. If your service hasn't recovered automatically yet, please trigger a redeploy from the dashboard; this will land the volume on a new, healthy machine. If the service remains stuck after a manual redeploy, let us know and we will investigate the volume directly.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
I’ve restarted it a couple of times, but it’s still the same.
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
sreelivinglens
"Manual redeploy attempted multiple times — same error persists. Volume vol_6u4cyts513la3sfw is still crash-looping with 'failed to exec pid1: No such file or directory'. Automatic recovery has not worked. Please investigate the volume directly as mentioned."
a month ago
Of course there won’t be any discount or anything…
a month ago
stokry
This happened at the worst possible time, right in the middle of a major project. This is horrible…
a month ago
mine too
a month ago
It’s taking too long to fix this…
a month ago
brody
Thanks for reaching out. We sincerely apologize for the service disruption.
We're seeing recovery in our API, builds, and deployments. If your service is experiencing an issue, please try redeploying it. We'll publish a public postmortem once we're fully recovered.
You can follow updates here: https://status.railway.com
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
stokry
I’ve restarted it a couple of times, but it’s still the same. Issue with Postgres!!!
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
Please open your own thread.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Hi, it's been 11 hours now. What do we do? I need to show my site for KYC compliance.
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
"Still crash-looping as of 7:41 PM IST May 20. Your incident report says resolved at 07:58 UTC but our Postgres volume vol_6u4cyts513la3sfw has not recovered. Same error persisting for 17+ hours. Need urgent manual intervention on this specific volume."
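The timestamps in this message can be checked with a little arithmetic. IST is UTC+5:30 with no daylight saving, so 7:41 PM IST is 14:11 UTC. The minimal Python sketch below does that conversion; it assumes the 07:58 UTC resolution time and the 7:41 PM IST report fall on the same UTC day (the thread does not say so explicitly), and the helper names are hypothetical.

```python
IST_OFFSET_MIN = 5 * 60 + 30  # India Standard Time is UTC+05:30, no DST
MINUTES_PER_DAY = 24 * 60


def to_minutes(hhmm: str) -> int:
    """Convert a 24-hour 'HH:MM' string to minutes after midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def ist_to_utc(hhmm: str) -> str:
    """Convert an IST wall-clock time to UTC, wrapping past midnight if needed."""
    total = (to_minutes(hhmm) - IST_OFFSET_MIN) % MINUTES_PER_DAY
    return f"{total // 60:02d}:{total % 60:02d}"


def minutes_between(start: str, end: str) -> int:
    """Minutes from start to end; assumes the same day and end >= start."""
    return to_minutes(end) - to_minutes(start)


report_utc = ist_to_utc("19:41")            # 7:41 PM IST
gap = minutes_between("07:58", report_utc)  # reported resolution time, UTC
print(report_utc, divmod(gap, 60))          # prints: 14:11 (6, 13)
```

Under that same-day assumption, the volume was still crash-looping about 6 hours 13 minutes after the reported resolution time.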
a month ago
chandrika
Hey, your Postgres is back online and your data is intact. It went through automatic recovery and is accepting connections now.
One thing I noticed: your agile-quietude service is still showing offline. You may need to redeploy that one to bring it back. Sorry about the long downtime, especially with the timing around your KYC compliance deadline.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Thank you, I redeployed and it's working again. What should we do if this happens again in the future? When the site is live, especially with time-bound windows, payments being made, and TAT (turnaround time) commitments to meet, we could end up getting sued. I'd like a human response to this.
Also, you can close the ticket after your response and solution.
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
Glad everything is back up. That's a fair question. A few things that would help protect you in the future:
- Volume backups — set up a daily schedule so you always have a restore point
- Point-in-Time Recovery — continuously archives WAL so you can restore to any timestamp
- Health checks — helps Railway detect and recover unhealthy services faster
For workloads where downtime has legal or compliance consequences, these layers make a real difference. We're also taking steps on our end to prevent a repeat of the May 19 incident, which you can read about in the postmortem.
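At its simplest, a health check of the kind mentioned in the list above is an HTTP endpoint that answers 200 only when the app's dependencies are reachable. The Python sketch below uses only the standard library. The `/health` path, the `PGHOST`/`PGPORT`/`PORT` environment variable names, and all function names are assumptions for illustration. The TCP check only proves that something is accepting connections on the Postgres port; it does not run a query.

```python
import os
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def postgres_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds (no SQL query is run)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_status(check: Callable[[], bool]) -> tuple[int, bytes]:
    """Map a dependency check to an HTTP status: 200 if healthy, 503 otherwise."""
    try:
        healthy = check()
    except Exception:
        healthy = False  # a check that raises counts as unhealthy
    return (200, b"ok") if healthy else (503, b"database unreachable")


def make_handler(check: Callable[[], bool]) -> type:
    """Build a request handler class that serves the health check."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health":  # hypothetical path; match your service setting
                self.send_error(404)
                return
            status, body = health_status(check)
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler


def serve() -> None:
    """Serve /health on $PORT, checking $PGHOST:$PGPORT (assumed variable names)."""
    host = os.environ.get("PGHOST", "localhost")
    port = int(os.environ.get("PGPORT", "5432"))
    handler = make_handler(lambda: postgres_port_open(host, port))
    HTTPServer(("0.0.0.0", int(os.environ.get("PORT", "8080"))), handler).serve_forever()
```

Returning 503 instead of crashing keeps the endpoint up while it reports that the database is down, which is the distinction a platform health check needs to see. Call `serve()` from your entry point and set the service's health check path to whatever path you actually expose.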
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Postgres won't start after the May 19 outage. I tried toggling versions (16↔17) but I'm still getting:
unrecognized configuration parameter "autovacuum_worker_slots" in postgresql.conf line 687
Volume mounts fine, data is intact. Need the config file reset/regenerated. No backups available.
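If you can get shell access to the volume's files, one way past this specific error is to comment out the offending line instead of regenerating the whole file. To our knowledge, `autovacuum_worker_slots` is a PostgreSQL 18 setting, so its presence suggests the file was written for a newer major version than 16 or 17. If the data directory itself was initialized by a newer major version, editing the config will not be enough. The Python sketch below is a hypothetical helper, not a Railway tool. The path you pass is wherever `postgresql.conf` lives on your volume, and the helper keeps a `.bak` copy before writing.

```python
import re
from pathlib import Path


def comment_out_param(conf_text: str, param: str) -> str:
    """Return conf_text with every active setting of `param` commented out.

    postgresql.conf accepts both `name = value` and `name value`, and parameter
    names are case-insensitive, so the match allows either form and ignores case.
    Lines that are already comments do not match and are left alone.
    """
    pattern = re.compile(rf"^\s*{re.escape(param)}(\s|=)", re.IGNORECASE)
    lines = conf_text.splitlines(keepends=True)
    return "".join("#" + line if pattern.match(line) else line for line in lines)


def patch_conf_file(path: Path, param: str) -> None:
    """Back up `path` to `<name>.bak`, then write the commented-out version."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)  # keep the original
    path.write_text(comment_out_param(original, param))


# Hypothetical usage (adjust the path to your volume's data directory):
# patch_conf_file(Path("/path/to/postgresql.conf"), "autovacuum_worker_slots")
```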
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
Thank you Chandrika. We've noted all three recommendations — Volume backups, Point-in-Time Recovery, and Health checks — and will be setting these up. We'll review the postmortem as well.
For context, our platform handles photography audit submissions with payment transactions and mentor session bookings, so downtime does carry compliance implications. We appreciate Railway taking steps to prevent a repeat of the May 19 incident.
Everything is stable now. You can close this ticket.
