Database crash: unable to restart or redeploy

pavletek

PRO · OP

a month ago

2026-05-20 17:30:36.015 UTC [2] FATAL: private key file "/var/lib/postgresql/data/certs/server.key" must be owned by the database user or root

2026-05-20 17:30:36.015 UTC [2] LOG: database system is shut down

This happened after yesterday's outage, and I do not know how to fix it. The service keeps restarting, hits the same private key error every time, and fails.

I tried using the agent to fix the issue, and it recommends wiping the whole volume, data included??

Please help, I do not want to lose the data in here.

Solved · $20 Bounty

7 Replies

Status changed to Awaiting Railway Response Railway • about 1 month ago

mykal

EMPLOYEE

a month ago

We don't provide managed PostgreSQL, so the database configuration, including certificate file ownership, is on the application side. You should be able to fix this by temporarily setting your service's start command to `sleep infinity` (make sure to note your current start command first so you can restore it after). This will keep the container alive so you can connect via [railway ssh](https://docs.railway.com/cli/ssh) and run `chown postgres:postgres /var/lib/postgresql/data/certs/server.key` to correct the file ownership. Once that's done, restore your original start command and redeploy. Your data is intact on the volume, no need to wipe it. We're going to connect you with the community for further help with this.
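Once you are connected over SSH, it can help to record what the key file looks like before changing anything. This is a minimal sketch, assuming GNU `stat` is available in the container (an assumption about your image, not something confirmed in this thread). `describe_key` is a hypothetical helper name, not a Railway or PostgreSQL command:

```shell
# Hypothetical helper: print "owner:group mode" for a file,
# for example the TLS private key that Postgres complains about.
# Relies on GNU stat's -c format option.
describe_key() {
    stat -c '%U:%G %a' "$1"
}
```

Inside the container you could call it as `describe_key /var/lib/postgresql/data/certs/server.key`. If the owner shown is neither the database user nor root, that matches the FATAL message in the logs.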

Status changed to Awaiting User Response Railway • about 1 month ago

Status changed to Open mykal • about 1 month ago

darseen

HOBBY · Top 1% Contributor

a month ago

Alternatively, instead of SSHing into your service, you can set the start command of your Postgres service to `bash -c "chown postgres:postgres /var/lib/postgresql/data/certs/server.key && chmod 600 /var/lib/postgresql/data/certs/server.key"` and redeploy. Once the container has started and run that command, remove the custom start command and redeploy again so Postgres starts normally.
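A one-off start command like this exits as soon as it finishes, so the deploy logs are the only place to confirm it ran. The sketch below wraps the same `chown` and `chmod` in a function that fails loudly if the key is missing and prints the result. `fix_key_perms` is a hypothetical name, and the default owner `postgres:postgres` assumes that user and group exist in your image:

```shell
# Hypothetical helper: set owner and mode on a key file, then report the result.
# Arguments: $1 = key path, $2 = owner spec (defaults to postgres:postgres).
fix_key_perms() {
    key_path=$1
    owner=${2:-postgres:postgres}
    if [ ! -f "$key_path" ]; then
        echo "key file not found: $key_path" >&2
        return 1
    fi
    chown "$owner" "$key_path" || return 1
    chmod 600 "$key_path" || return 1
    # Print numeric owner and mode so the outcome is visible in the logs.
    ls -ln "$key_path"
}
```

In the container this would be called as `fix_key_perms /var/lib/postgresql/data/certs/server.key`. Running it twice is harmless, since both commands simply reapply the same owner and mode.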

85ed

HOBBY

a month ago

This error specifically points to a filesystem ownership/permissions problem on the SSL private key rather than direct database corruption.

PostgreSQL intentionally refuses to start if the private key owner is incorrect or the permissions are considered insecure.

So the current logs suggest the startup is aborting during PostgreSQL security checks, not because the database files themselves are necessarily damaged.

At this stage, there’s no indication that wiping the volume is required.

The Railway team's recommendation to temporarily regain shell access and fix ownership/permissions on /var/lib/postgresql/data/certs/server.key is technically the correct next recovery step.

zporporz

HOBBY

a month ago

Please do not wipe the volume yet. This error does not indicate that the Postgres data itself is corrupted. It means Postgres is refusing to start because the SSL private key file has the wrong owner/permissions:

/var/lib/postgresql/data/certs/server.key

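Postgres requires server.key to be owned either by the database user, with no group or world access (mode 0600 or stricter), or by root, in which case group read access is also allowed (mode 0640 or stricter).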

The fix should be to repair ownership/permissions on the existing volume, not delete it.

I would try this recovery path:

First, create a manual backup/snapshot of the Railway volume if the Backups tab is available.
Do not wipe or recreate the volume.
If you can run the container as root, set this env var temporarily:

RAILWAY_RUN_UID=0

Then redeploy and see if the Postgres image/entrypoint repairs the ownership.

If it still does not start, the volume needs a one-time permission repair on the mounted filesystem:

chown postgres:postgres /var/lib/postgresql/data/certs/server.key
chmod 600 /var/lib/postgresql/data/certs/server.key

or, if Railway’s image expects root-owned certs:

chown root:root /var/lib/postgresql/data/certs/server.key
chmod 640 /var/lib/postgresql/data/certs/server.key

After that, restart the Postgres service.
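Before restarting, the result can be checked against the rule PostgreSQL applies at startup: a key owned by the database user must have no group or world access, and a key owned by root may additionally be group-readable. The function below is a sketch of that check, not PostgreSQL's own code. `key_perms_ok` is a hypothetical name, and it assumes GNU `stat`:

```shell
# Hypothetical check mirroring PostgreSQL's key-file rule.
# Arguments: $1 = key path, $2 = numeric uid that Postgres runs as.
# Returns 0 if acceptable, 1 if not, 2 if the file cannot be inspected.
key_perms_ok() {
    key_path=$1
    db_uid=$2
    owner_uid=$(stat -c '%u' "$key_path") || return 2
    mode=$(stat -c '%a' "$key_path") || return 2
    if [ "$owner_uid" = "$db_uid" ]; then
        # Owned by the database user: no group or other bits allowed.
        [ $(( 0$mode & 0077 )) -eq 0 ]
    elif [ "$owner_uid" = "0" ]; then
        # Owned by root: group read is allowed, nothing else beyond the owner bits.
        [ $(( 0$mode & 0037 )) -eq 0 ]
    else
        return 1
    fi
}
```

Assuming a `postgres` user exists in the container, a call could look like `key_perms_ok /var/lib/postgresql/data/certs/server.key "$(id -u postgres)" && echo OK`.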

If Railway does not give you a way to shell into the failed database container, ask Railway support/staff to run the permission repair on the mounted volume or to provide a safe recovery shell. The important part is that the data volume should be preserved: this should be recoverable without deleting the database data.

pavletek

PRO · OP

a month ago

THANKS!!!!!!!!!

pavletek

PRO · OP

a month ago

Thanks for all the replies helping me with my issue. I connected through SSH, ran the command, and managed to restart the DB.

I still have a few questions: was this related to the outage and the problems with Google Cloud? Was this my fault? How can I prevent this from happening again?

Again, thanks!

chandrika

EMPLOYEE

a month ago

Glad you got it fixed! To answer your questions: this may have been related to the May 19 GCP outage, which caused abrupt shutdowns that could corrupt file permissions on volumes. It wasn't anything you did.

To prevent it in the future, enabling volume backups would give you a restore point, and Point-in-Time Recovery would give you continuous protection.

Status changed to Awaiting User Response Railway • about 1 month ago

Railway

BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway • 27 days ago
