PostgreSQL service stuck in crash loop — "failed to exec pid1: No such file or directory" after unexpected shutdown

mengenmetehan

HOBBY OP

a month ago

Hi,

My PostgreSQL service in the sulama-app project has been stuck in a crash loop since May 20, 04:24 UTC. I have not made any deployments or configuration changes — it was running fine until it went down on its own.

The logs show this repeating pattern:

Volume mounts successfully
Immediately followed by: ERROR (catatonit:2): failed to exec pid1: No such file or directory
This cycle repeats indefinitely
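
For anyone wanting to confirm the same loop in their own exported logs, here is a minimal sketch that counts how often the exec failure appears. It assumes the logs are available as plain text lines. The function name `count_exec_failures` and the sample mount message are hypothetical. Only the ERROR text is copied from the actual logs.

```python
import re

# The error text quoted above; matching on it is an assumption about
# how the exported log lines look.
EXEC_ERROR = re.compile(r"failed to exec pid1: No such file or directory")


def count_exec_failures(log_lines):
    """Return how many log lines contain the catatonit exec failure."""
    return sum(1 for line in log_lines if EXEC_ERROR.search(line))


# Hypothetical sample: the mount message wording is a placeholder,
# the ERROR lines are taken verbatim from the report.
sample = [
    "volume mounted",
    "ERROR (catatonit:2): failed to exec pid1: No such file or directory",
    "volume mounted",
    "ERROR (catatonit:2): failed to exec pid1: No such file or directory",
]

print(count_exec_failures(sample))
```

A count that keeps growing across restarts, with a mount message before each failure, would match the pattern described here.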

Before the crash, the backend was logging HikariPool connection failures ("Connection is not available", total=0, active=0, idle=0, waiting=0), which suggests the database became unreachable around 02:19 UTC.

What I've tried:

Restarting the PostgreSQL service multiple times via the dashboard
Checking Settings: no custom start command or pre-deploy command is set

The service keeps crashing on every restart attempt. It appears to be a volume or image-level issue rather than anything in my configuration.

Could you please look into this? The data on the volume is important to me, so I'd prefer not to delete and recreate the service if possible.

Thank you.

Status changed to Awaiting Railway Response Railway • about 1 month ago

sam-a

EMPLOYEE

a month ago

Apologies for this canned message, but in an effort to help all our customers get back up and running, we are sending this bulk message. As you may know, we had a major interruption to our services yesterday. We've published a post-mortem if you'd like more information on the incident. It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact it has had on you.

It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.

You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY

If you are still having other issues that might be related to the incident, you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Feel free to respond if your question has not been addressed.

Status changed to Awaiting User Response Railway • about 1 month ago

mengenmetehan

HOBBY OP

a month ago

Hi,

Thank you for the update. I understand the scale of the incident, but my PostgreSQL service is still stuck in the same crash loop after multiple redeploys.

Project ID: ab99ac93-a328-4eef-b666-bdaff2b569b5

The issue:

Volume mounts successfully
Immediately crashes with: ERROR (catatonit:2): failed to exec pid1: No such file or directory
This has been repeating since May 20, 04:24 UTC
Multiple redeploys via the dashboard have not resolved it

Based on the recovery thread, this looks like the volume needs to be moved to a healthy node. Could someone please take a look?

The data on the volume is important — I'd like to avoid deleting and recreating the service if possible.

Thank you.

Status changed to Awaiting Railway Response Railway • about 1 month ago