Issue Postgres crashing
sreelivinglens
HOBBY · OP

a month ago

"My Postgres deployment is crash-looping with 'failed to exec pid1: No such file or directory' errors on volume vol_6u4cyts513la3sfw. The volume appears stuck on a machine affected by the May 19th GCP incident. Please migrate this volume to a healthy host. Project: Lens League Apex."

Awaiting Railway Response

18 Replies

stokry
HOBBY

a month ago

Mine too, wtf is happening??


Railway
BOT

a month ago

Your Postgres service is confirmed in a CRASHED state, and this is related to the [ongoing service disruption](https://status.railway.com/incident/I23M92U0). We are actively auto-redeploying services detected as unhealthy, which remounts volumes onto healthy hosts. If your service hasn't recovered automatically yet, please trigger a redeploy from the dashboard - this will land the volume on a new, healthy machine. If the service remains stuck after a manual redeploy, let us know and we will investigate the volume directly.


Status changed to Awaiting User Response Railway about 1 month ago


stokry
HOBBY

a month ago

I’ve restarted it a couple of times, but it’s still the same.


Status changed to Awaiting Railway Response Railway about 1 month ago


sreelivinglens
HOBBY · OP

a month ago

"Manual redeploy attempted multiple times — same error persists. Volume vol_6u4cyts513la3sfw is still crash-looping with 'failed to exec pid1: No such file or directory'. Automatic recovery has not worked. Please investigate the volume directly as mentioned."


stokry
HOBBY

a month ago

Of course there won’t be any discount or anything…


stokry
HOBBY

a month ago

This happened at the worst possible time, right in the middle of a major project. This is horrible…


sreelivinglens
HOBBY · OP

a month ago

mine too


stokry
HOBBY

a month ago

It’s taking too long to fix this…


brody

a month ago

Thanks for reaching out. We sincerely apologize for the service disruption.

We're seeing recovery in our API, builds, and deployments. If your service is experiencing an issue, please try redeploying it. We'll publish a public postmortem once we're fully recovered.

You can follow updates here: https://status.railway.com


Status changed to Awaiting User Response Railway about 1 month ago


stokry
HOBBY

a month ago

I’ve restarted it a couple of times, but it’s still the same. Issue with Postgres!!!


Status changed to Awaiting Railway Response Railway about 1 month ago


a month ago

Please open your own thread.


Status changed to Awaiting User Response Railway about 1 month ago


sreelivinglens
HOBBY · OP

a month ago

Hi, it's been 11 hrs now. What do we do? I need to show my site for KYC compliance.


Status changed to Awaiting Railway Response Railway about 1 month ago


sreelivinglens
HOBBY · OP

a month ago

"Still crash-looping as of 7:41 PM IST May 20. Your incident report says resolved at 07:58 UTC but our Postgres volume vol_6u4cyts513la3sfw has not recovered. Same error persisting for 17+ hours. Need urgent manual intervention on this specific volume."


chandrika
EMPLOYEE

a month ago

Hey, your Postgres is back online and your data is intact. It went through automatic recovery and is accepting connections now.

One thing I noticed: your agile-quietude service is still showing offline. You may need to redeploy that one to bring it back. Sorry about the long downtime, especially with the timing around your KYC compliance deadline.


Status changed to Awaiting User Response Railway about 1 month ago


sreelivinglens
HOBBY · OP

a month ago

Thank you, I redeployed and it is functional now. What do we do if this happens again in the future? When the site is live, and especially when there are timeline windows, payments made, and TAT to be observed, we could find ourselves getting sued. I'd like a human response to this.

Also, you can close the ticket after your response and solution.


Status changed to Awaiting Railway Response Railway about 1 month ago


chandrika
EMPLOYEE

a month ago

Glad everything is back up. That's a fair question. A few things that would help protect you in the future:

  • Volume backups — set up a daily schedule so you always have a restore point
  • Point-in-Time Recovery — continuously archives WAL so you can restore to any timestamp
  • Health checks — helps Railway detect and recover unhealthy services faster

For workloads where downtime has legal or compliance consequences, these layers make a real difference. We're also taking steps on our end to prevent a repeat of the May 19 incident, which you can read about in the postmortem.
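
For the health check item, here is a rough sketch of what an app-side endpoint could look like. It assumes the platform probes an HTTP path and treats a 200 response as healthy; the `/health` path and the names `postgres_reachable`, `make_handler`, and `serve` are made up for illustration and are not part of any Railway API. The probe only confirms that the database port accepts TCP connections, not that queries succeed.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def postgres_reachable(host: str, port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a shallow probe (assumption: an open port is a good enough signal).
    A stricter check would run a trivial query through a database driver.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def make_handler(check):
    """Build a request handler whose /health route reports the result of check()."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":  # hypothetical path, pick your own
                self.send_response(404)
                self.end_headers()
                return
            healthy = check()
            body = b"ok" if healthy else b"database unreachable"
            # 200 when the dependency is up, 503 when it is not.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of the logs

    return HealthHandler


def serve(host: str, port: int, db_host: str, db_port: int = 5432) -> None:
    """Serve the health endpoint forever, probing db_host:db_port on each request."""
    handler = make_handler(lambda: postgres_reachable(db_host, db_port))
    HTTPServer((host, port), handler).serve_forever()
```

Something like `serve("0.0.0.0", 8080, "my-db-host")` would start it, with the hostname replaced by whatever your Postgres service is reachable at. Returning 503 rather than crashing lets the platform tell "app is up, database is down" apart from "app is down".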


Status changed to Awaiting User Response Railway about 1 month ago


mattheytony
PRO

a month ago

Postgres won't start after May 19 outage. Tried version toggling (16↔17) but still getting:

unrecognized configuration parameter "autovacuum_worker_slots" in postgresql.conf line 687

Volume mounts fine, data is intact. Need the config file reset/regenerated. No backups available.
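
For anyone hitting the same message: as far as I can tell, `autovacuum_worker_slots` is a PostgreSQL 18 setting, so a 16 or 17 server would reject a config file that contains it. If you can get at the file, a rough sketch for commenting the line out is below. The helper name `comment_out_parameter` is made up, it assumes the plain `name = value` layout of `postgresql.conf`, and it does nothing about a data directory that was initialized by a newer major version.

```python
import re


def comment_out_parameter(conf_text: str, name: str) -> tuple[str, list[int]]:
    """Comment out every active line that sets `name`.

    Returns the new file text and the 1-based line numbers that were changed.
    Lines that are already commented out are left alone.
    """
    # Parameter names are case-insensitive and the "=" is optional.
    pattern = re.compile(r"^\s*" + re.escape(name) + r"\s*(=|\s)", re.IGNORECASE)
    out, changed = [], []
    for lineno, line in enumerate(conf_text.splitlines(keepends=True), start=1):
        if pattern.match(line):
            out.append("#" + line)
            changed.append(lineno)
        else:
            out.append(line)
    return "".join(out), changed
```

Keep a copy of the original file before writing the result back, and treat this as a diagnostic step rather than a fix for the underlying version mismatch.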


Status changed to Awaiting Railway Response Railway about 1 month ago


sreelivinglens
HOBBY · OP

a month ago

Thank you Chandrika. We've noted all three recommendations — Volume backups, Point-in-Time Recovery, and Health checks — and will be setting these up. We'll review the postmortem as well.

For context, our platform handles photography audit submissions with payment transactions and mentor session bookings, so downtime does carry compliance implications. We appreciate Railway taking steps to prevent a repeat of the May 19 incident.

Everything is stable now. You can close this ticket.

