Managed PostgreSQL service stuck in crashed state after restart in staging
rafafuenza
HOBBY OP

a month ago

We have a managed PostgreSQL service in our staging environment called Postgres_dev that is currently crashed and cannot be recovered with a normal redeploy.

What we are seeing:

The volume mounts successfully.

Earlier logs showed PostgreSQL starting normally and reaching "database system is ready to accept connections".

After that, the service began failing repeatedly with:

ERROR (catatonit:2): failed to exec pid1: No such file or directory

The deployment now remains in Crashed state.

There are no backups available for this database.

Our application services (api_dev and worker_dev) depend on this database, so our staging environment is blocked.

Additional context:

This is happening while Railway is reporting a platform incident related to slow or paused builds/deployments.

However, this PostgreSQL service appears to be in a genuinely broken state, not just delayed: the container fails to start at all rather than waiting in a queue.

The service image shown is:

ghcr.io/railwayapp-templates/postgres-ssl:18

We need help determining:

Whether this PostgreSQL service/volume can be recovered.

Whether the container/image state is corrupted on Railway’s side.

Whether there is any way to restore access to the existing data, since no backups are available.

If useful, we can provide the full logs and service/environment identifiers.
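
In case it helps with triage, here is a minimal sketch of how the full logs could be scanned to find the point where the service went from healthy to failing. It only matches the two messages quoted above. The function names and the idea of feeding it exported log lines are assumptions for illustration, not a Railway tool or API.

```python
import re
from typing import Iterable, Optional

# The two messages quoted in this post; everything else is ignored.
READY = "database system is ready to accept connections"
EXEC_FAILURE = re.compile(r"ERROR \(catatonit:\d+\): failed to exec pid1")


def first_exec_failure_after_ready(lines: Iterable[str]) -> Optional[int]:
    """Return the 0-based index of the first exec failure that follows
    a 'ready to accept connections' message, or None if there is none."""
    seen_ready = False
    for index, line in enumerate(lines):
        if READY in line:
            seen_ready = True
        elif seen_ready and EXEC_FAILURE.search(line):
            return index
    return None


def count_exec_failures(lines: Iterable[str]) -> int:
    """Count how many log lines report the pid1 exec failure."""
    return sum(1 for line in lines if EXEC_FAILURE.search(line))
```

Running both functions over the exported log lines (for example, a text file split with `splitlines()`) would show whether PostgreSQL was ever healthy before the exec failures began and how many times the failure repeated.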

Solved

2 Replies

Status changed to Awaiting Railway Response Railway about 1 month ago


sam-a
EMPLOYEE

a month ago

Apologies for this canned message, but in an effort to help all our customers get back up and running, we are sending this bulk message. As you may know, we had a major interruption to our services yesterday. We've published a post-mortem if you'd like more information on the incident. It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact it has had on you.

It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.

You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY

If you are still having other issues that might be related to the incident, you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c

Feel free to respond if your question has not been addressed.


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 28 days ago

