25 days ago
Hello Railway Team,
My Postgres database has been stuck in a crash loop since the GCP outage on May 19th. The container mounts the volume but fails to start because of corrupted file permissions on the persistent disk.
Here is the exact recurring error from the logs:
FATAL: private key file "/var/lib/postgresql/data/certs/server.key" must be owned by the database user or root
The logs also confirm that my data is still there and recognized by the system:
PostgreSQL Database directory appears to contain a database; Skipping initialization
What I have tried so far:
- Triggered manual redeploys and restarts multiple times. Each redeploy succeeds, but the container immediately crashes again because the broken permissions are stored on the persistent volume itself.
- I cannot delete the service and recreate it because this database holds active user data for my Django application.
- I tried reaching out via Discord, but the Railway OAuth integration is currently failing for me, likely due to the ongoing rate-limiting issues.
Since I don't have root access to the underlying machine to run a chown command on the volume, could an engineer please manually fix the permissions for server.key on my persistent disk so the database can boot?
Project URL: Click here
Thank you for your hard work in restoring the platform!
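Postgres only loads an SSL private key that is owned by the database user or by root, and the PostgreSQL SSL documentation adds a mode rule: at most 0600 when the key is owned by the database user, or at most 0640 when it is owned by root. The sketch below reproduces those two checks in POSIX shell so you can see which rule a given key file breaks. `check_key_perms` is a hypothetical helper, not part of Postgres or Railway; it assumes GNU `stat` (Linux) and that you pass the numeric UID Postgres runs as.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the ownership and mode rules PostgreSQL
# applies to its SSL private key (ssl_key_file) at startup.
# Assumes GNU coreutils `stat` (Linux).
# Usage: check_key_perms <key-file> <numeric-uid-postgres-runs-as>
check_key_perms() {
    key=$1
    db_uid=$2

    if [ ! -f "$key" ]; then
        echo "not a regular file"
        return 1
    fi

    owner=$(stat -c '%u' "$key")   # numeric owner UID
    mode=$(stat -c '%a' "$key")    # octal permission bits, e.g. 600

    # Rule 1: the key must be owned by the database user or by root.
    if [ "$owner" != "$db_uid" ] && [ "$owner" != 0 ]; then
        echo "must be owned by the database user or root"
        return 1
    fi

    # Rule 2: no group/world access. Owned by the database user: 0600 or
    # less (mask 077). Owned by root: 0640 or less (mask 037).
    if [ "$owner" = "$db_uid" ]; then
        forbidden=$(( 0$mode & 077 ))
    else
        forbidden=$(( 0$mode & 037 ))
    fi
    if [ "$forbidden" -ne 0 ]; then
        echo "has group or world access"
        return 1
    fi

    echo "ok"
}
```

Inside the container, something like `check_key_perms /var/lib/postgresql/data/certs/server.key "$(id -u postgres)"` would print which rule the key currently fails.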
Status changed to Awaiting Railway Response Railway • 25 days ago
sam-a
25 days ago
Apologies for this canned message, but in an effort to help all our customers get back up and running, we are sending this bulk message. As you may know, we had a major interruption to our services yesterday. We've published a post-mortem (https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage) if you'd like more information on the incident. It describes what happened and what we are doing to prevent it in the future. We are deeply sorry for the impact that it has had on you.
It is taking some time to bring everything back up, but we are working on it as fast as we can. In general, a redeployment should fix most service issues. Due to the volume of customers redeploying right now, builds and deploys may take longer than normal to process.
You can track recovery status here: https://status.railway.com/incident/KVZ1Z8GY
If you are still having other issues that might be related to the incident, you can read more here: https://station.railway.com/community/road-to-recovery-post-gcp-outage-builds-d362e48c
Feel free to respond if your question has not been addressed.
Status changed to Awaiting User Response Railway • 25 days ago
24 days ago
Hi Sam. I understand the team is dealing with a massive backlog, but my issue is NOT resolved by a redeploy.
As detailed in my original post, the problem is a corrupted file permission ON THE PERSISTENT VOLUME itself, which survives any container redeploy or restart:
FATAL: private key file "/var/lib/postgresql/data/certs/server.key" must be owned by the database user or root
Redeploying does not clear the volume. I need an infrastructure engineer to manually run a permission fix (chown) on my persistent disk so the database can finally boot up.
Could you please escalate this to someone who can access the volume?
Status changed to Awaiting Railway Response Railway • 24 days ago
chandrika
24 days ago
Hey, sorry about the long wait and the canned responses earlier. Your data is intact on the volume; this is a file permission issue, not data corruption.
Here's how to fix it: set your Postgres service's start command to `sleep infinity` (note your current start command first so you can restore it afterward). Redeploy, then connect via `railway ssh` and run:
```
chown postgres:postgres /var/lib/postgresql/data/certs/server.key
chmod 600 /var/lib/postgresql/data/certs/server.key
```
Once that's done, remove the start command override, restore your original settings, and redeploy. Postgres should start normally with all your data intact.
This may have been related to the May 19 GCP outage. Sorry again for the disruption.
Status changed to Awaiting User Response Railway • 24 days ago
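The two `chown`/`chmod` commands from the staff reply can also be wrapped in a small script that applies the fix and then confirms the mode actually changed, which is handy over a flaky SSH session. This is a sketch, not Railway tooling: `fix_key_perms` and the `KEY_OWNER` variable are hypothetical, and it assumes it runs as root inside the Postgres container (required to `chown` to another user) with GNU `stat` available.

```shell
#!/bin/sh
# Hypothetical helper: applies the chown/chmod fix from the staff reply
# and verifies the resulting mode. chown to another user requires root.
# KEY_OWNER (hypothetical) defaults to postgres:postgres.
# Assumes GNU coreutils `stat` (Linux).
fix_key_perms() {
    key=${1:-/var/lib/postgresql/data/certs/server.key}
    owner=${KEY_OWNER:-postgres:postgres}

    chown "$owner" "$key" || return 1
    chmod 600 "$key" || return 1

    # Confirm the mode Postgres will see; a key owned by the database
    # user must be 0600 or stricter.
    mode=$(stat -c '%a' "$key")
    if [ "$mode" != 600 ]; then
        echo "unexpected mode $mode on $key" >&2
        return 1
    fi

    # Print owner:group and mode, e.g. "postgres:postgres 600".
    stat -c '%U:%G %a' "$key"
}
```

Running `fix_key_perms` with no argument targets the default path from the reply. It still needs a shell inside a running container, so it does not help while the service is crash-looping.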
24 days ago
Thanks for confirming the data is intact! I tried following your instructions, but the workaround didn't work.
I overrode the Start Command with `sleep infinity` (and also tried `bash -c "sleep 36000"`), but the container still crash-loops instantly and never reaches a running state.
Because the container refuses to stay alive, running `railway ssh` fails immediately with:
ServerMessage: "Your application is not running or in a unexpected state"
Since I am completely locked out of SSH and cannot run the `chown`/`chmod` commands myself, could an infrastructure engineer please run this permission fix directly on the volume for me?
Project: peaceful-success
Thanks for the help!
Status changed to Awaiting Railway Response Railway • 24 days ago