Production Postgres password rotation produced unrecoverable pg_authid mismatch - production app degraded
grantasplund
PRO · OP

a month ago

Hi Railway team,

I need urgent help recovering from a Postgres password rotation gone wrong.

Project ID: 24c2a1d1-ce0c-4ca7-ab77-1a324eeeeae9

Project: VeriPass Application

Environment: production

Postgres service deployment: a71265fb (Active since 7:04 AM PDT today)

Timeline:

- 7:05 AM PDT: Used the Railway agent's "Rotate password" flow to rotate POSTGRES_PASSWORD on the Postgres service

- The rotation reported success and the Postgres container restarted

- However, the new POSTGRES_PASSWORD value does not actually authenticate against pg_authid (the data directory was not re-initialized on container restart, which is expected Docker Postgres behavior)

- Subsequent attempts to manually align pg_authid with the variable store via ALTER USER have failed

- The Connect dialog in the dashboard currently displays a public connection string whose password also fails authentication

- Five distinct passwords have been tested, none authenticate
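The ALTER USER step in the timeline above has to run from a session that can already connect as a superuser (for example, from inside the container), and the new password has to reach the server as a correctly quoted SQL literal. Below is a minimal sketch of building that statement, assuming `standard_conforming_strings = on` (the PostgreSQL default since 9.1). `ALTER USER` is an alias for `ALTER ROLE`. The helper names are hypothetical. psql's `\password` command is an alternative that hashes the password client-side, so the plaintext never appears in the statement text.

```python
def quote_literal(value: str) -> str:
    """Return a SQL string literal, doubling embedded single quotes.

    Assumes standard_conforming_strings = on, so backslashes need no escaping.
    """
    if "\x00" in value:
        raise ValueError("PostgreSQL text values cannot contain NUL bytes")
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Return a double-quoted SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def alter_password_sql(role: str, new_password: str) -> str:
    """Build an ALTER ROLE statement that sets a new password for `role`."""
    return f"ALTER ROLE {quote_ident(role)} WITH PASSWORD {quote_literal(new_password)};"


print(alter_password_sql("postgres", "s3cr'et"))
# ALTER ROLE "postgres" WITH PASSWORD 's3cr''et';
```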

Current state:

- Production VeriPass app (b7b8f7d4-cea2-41e2-8055-09260ca3a069) is serving HTTP from a 15-hour-old connection pool that authenticated before any of this happened

- /health returns "degraded" with database: error - the pool is failing on new connections

- The data is intact - I can see all ~2,669 rows in the certifications table via the dashboard's Database tab

- We have no backups (not on Pro plan)

- Critical: this database serves a customer-facing demo at clcinpdx.github.io and we cannot tolerate data loss
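The "degraded but still serving" state above follows from when PostgreSQL checks passwords: only when a connection is established. Sessions that authenticated before the rotation keep working, and every new connection fails. The toy model below illustrates that. The `FakePostgres` class and the `/health` classification logic are hypothetical, because the VeriPass health endpoint's actual code is not shown in this thread.

```python
class FakePostgres:
    """Toy server: like PostgreSQL, it checks the password only at connect time."""

    def __init__(self, stored_password: str) -> None:
        self.stored_password = stored_password  # stands in for the pg_authid entry

    def connect(self, password: str) -> "FakeConnection":
        if password != self.stored_password:
            raise PermissionError("password authentication failed")
        return FakeConnection()


class FakeConnection:
    """An already-authenticated session; queries never re-check the password."""

    def query(self) -> str:
        return "ok"


def try_connect(server: FakePostgres, password: str) -> bool:
    """Return True if a brand-new connection authenticates."""
    try:
        server.connect(password)
        return True
    except PermissionError:
        return False


def health_status(pooled_ok: bool, fresh_ok: bool) -> dict:
    """Hypothetical /health logic: probe an old pooled session and a new connection."""
    if pooled_ok and fresh_ok:
        return {"status": "ok", "database": "ok"}
    if pooled_ok or fresh_ok:
        return {"status": "degraded", "database": "error"}
    return {"status": "down", "database": "error"}


server = FakePostgres(stored_password="original")
pooled = server.connect("original")       # pool opened before the rotation
server.stored_password = "unknown-value"  # pg_authid no longer matches anything we hold
print(health_status(pooled.query() == "ok", try_connect(server, "rotated")))
# {'status': 'degraded', 'database': 'error'}
```

Once the pool recycles its last pre-rotation session, `pooled_ok` also becomes False in this model, which is the full outage the post is worried about.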

What I need:

- An authoritative reset of the postgres user's password in pg_authid via your container-level access

- The new password value placed into POSTGRES_PASSWORD on the Postgres service

- Confirmation that DATABASE_URL on the VeriPass service will resolve to the new value via the ${{Postgres.POSTGRES_PASSWORD}} reference I configured
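The last request depends on how a reference like `${{Postgres.POSTGRES_PASSWORD}}` is filled in: the value comes from the Postgres service's variable store, and nothing in the substitution consults pg_authid. So DATABASE_URL can only authenticate once the role's password in the database has been set to that same value. The resolver below is a toy model of that substitution under stated assumptions. It is not Railway's implementation, its regex grammar is a simplification, and the host, port, and database name in the template are placeholders.

```python
import re
from typing import Dict

# Matches ${{Service.VARIABLE}}; toy grammar, Railway's real parser may accept more.
REFERENCE = re.compile(r"\$\{\{([A-Za-z0-9_-]+)\.([A-Za-z_][A-Za-z0-9_]*)\}\}")


def resolve_references(template: str, services: Dict[str, Dict[str, str]]) -> str:
    """Replace each ${{Service.VAR}} with that service's current variable value."""

    def substitute(match: "re.Match[str]") -> str:
        service, variable = match.group(1), match.group(2)
        try:
            return services[service][variable]
        except KeyError:
            raise KeyError(f"unresolved reference {match.group(0)}") from None

    return REFERENCE.sub(substitute, template)


# Hypothetical values: the host, port, and database name are placeholders.
services = {"Postgres": {"POSTGRES_PASSWORD": "value-in-variable-store"}}
template = "postgresql://postgres:${{Postgres.POSTGRES_PASSWORD}}@db.example.internal:5432/appdb"
print(resolve_references(template, services))
# postgresql://postgres:value-in-variable-store@db.example.internal:5432/appdb
```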

I am available for a screen share or rapid back-and-forth. The longer the existing app pool runs, the higher the risk it dies and we have full outage. Please escalate as production-impacting.

Thanks,

Grant Asplund

Solved

4 Replies

Status changed to Awaiting Railway Response by Railway about 1 month ago


grantasplund
PRO · OP

a month ago

I just upgraded to Pro. Please re-route this ticket to the Pro support queue and prioritize as production-impacting.


grantasplund
PRO · OP

a month ago

Update: I'm now on the Pro plan (just upgraded). I've also confirmed via the dashboard's Database tab that all production data is intact: the certifications table shows 268 pages of rows.

The issue is purely auth: pg_authid contains a value that doesn't match what POSTGRES_PASSWORD resolves to via reference variables, and I cannot authenticate from outside the container with any password from any Railway-exposed surface (POSTGRES_PASSWORD, PGPASSWORD, the public Connect dialog string). I've verified the dashboard does not expose container shell access on this service, so I can't run an ALTER USER from inside the container myself.

I need a Railway engineer to reset the postgres user password authoritatively from container/host access. The app is running degraded on a 15-hour-old connection pool that will eventually die. Please prioritize.
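Before concluding that every Railway-exposed surface is wrong, it can help to check whether the surfaces even agree with each other. If POSTGRES_PASSWORD, PGPASSWORD, and the password inside the Connect dialog URL are all the same value and that value still fails, the mismatch is between the variable store and pg_authid, as described above. Below is a minimal stdlib sketch. The helper names and example values are hypothetical. Note that `urlsplit(...).password` returns the still percent-encoded text, so it is decoded explicitly, as libpq does for URI components. Values are compared by a short SHA-256 fingerprint so the secrets themselves are never printed.

```python
import hashlib
from typing import Dict, List, Optional
from urllib.parse import unquote, urlsplit


def password_from_url(conn_str: str) -> Optional[str]:
    """Extract and percent-decode the password from a postgresql:// URL."""
    raw = urlsplit(conn_str).password  # still percent-encoded at this point
    return None if raw is None else unquote(raw)


def fingerprint(secret: str) -> str:
    """Short SHA-256 prefix so values can be compared without printing them."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:12]


def group_surfaces(passwords: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct password fingerprint to the surfaces that expose it."""
    groups: Dict[str, List[str]] = {}
    for surface, pw in passwords.items():
        groups.setdefault(fingerprint(pw), []).append(surface)
    return groups


# Hypothetical values standing in for what each surface shows.
surfaces = {
    "POSTGRES_PASSWORD": "abc123",
    "PGPASSWORD": "abc123",
    "Connect dialog": password_from_url("postgresql://postgres:abc123@proxy.example.com:12345/appdb"),
}
for fp, names in group_surfaces(surfaces).items():
    print(fp, names)
```

One group means the surfaces agree with each other; more than one means at least one surface is stale.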


grantasplund
PRO · OP

a month ago

Found Regenerate Password button under Database > Config. About to click it. Will report back.


Hey Grant, did the "Regenerate Password" button in Database > Config resolve the issue? If your app is still showing degraded health, let us know and we can reset the password from our side.


Status changed to Awaiting User Response by Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved by Railway about 1 month ago

