Web service crashing with "password authentication failed" after rotating POSTGRES_PASSWORD
hxseiko-erp
HOBBYOP

16 days ago

Hi Railway team,

My web service has been crashing for ~16 hours with "password authentication failed for user postgres", and I cannot recover it via redeploys.

Project: nurturing-solace

Web service: web (web-production-e96e4.up.railway.app)

Postgres service: Postgres (separate service)

Plan: Hobby

What happened:

1. Yesterday around 13:30, I rotated POSTGRES_PASSWORD via Railway dashboard. Postgres redeployed successfully and web service was working.

2. Around 17:00, I pushed a new commit to web service. Deploy started failing with "password authentication failed for user postgres".

3. I rotated POSTGRES_PASSWORD a second time around 17:20.

4. Postgres service redeployed successfully again (current ACTIVE deployment is from ~16h ago).

5. However, web service still cannot connect — it gets "password authentication failed" at the alembic upgrade step on startup.

Configuration:

- Web service DATABASE_URL is set to ${{Postgres.DATABASE_URL}} (reference, not hardcoded)

- Procfile: web: cd backend && alembic upgrade head && uvicorn main:app --host 0.0.0.0 --port $PORT

- Postgres service DATABASE_URL resolves correctly when viewed in Railway dashboard

- Pre-deploy Command was previously set to "npm run migrate" (a leftover, since cleared)

What I've tried:

- Multiple redeploys of web service: all crash with same error

- Restarted Postgres service via Redeploy: ACTIVE, no errors

- Pushed a fresh commit (10ef53b) to bypass build cache: still crashed with same error

- Verified DATABASE_URL is reference, not hardcoded

- Verified .env files are not committed to git

What I suspect:

There's some kind of stale password being used by web service. Either the reference is not resolving to the current Postgres password, or there's a cached value somewhere in the web container that I don't have visibility into.

Failed deployment IDs (most recent):

- 10ef53b (most recent)

- 388288d

- 2cca4f7e

Could you check what password the web container is actually receiving when it tries to connect? Or guide me on how to fully reset the connection between web and Postgres?

Thanks for your help — this is blocking my production ERP for my factory.

Solved · $10 Bounty

Pinned Solution

Try this:

1. Disable all public networking on the database if you have any, as the following steps will disable user authentication

2. SSH into your database service (right click your service and select `Copy SSH Command`)

3. Run this command: `sed -i 's/host all all all scram-sha-256/host all all ::\/0 trust/' /var/lib/postgresql/data/pgdata/pg_hba.conf` (This will bypass user authentication)

4. Redeploy your database

5. SSH again, and run the command `psql`

6. Run `ALTER USER postgres with password '<PASSWORD>';` where `<PASSWORD>` is the value of the variable `PGPASSWORD` in your Railway dashboard

7. Type `exit`

8. Run `sed -i 's/host all all ::\/0 trust/host all all all scram-sha-256/' /var/lib/postgresql/data/pgdata/pg_hba.conf` (This will re-enable user authentication)

9. Redeploy your database
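
For readers who want to see what the two `sed` commands in steps 3 and 8 actually change, here is a minimal Python sketch of the same substitution applied to text. The function names and the sample file contents are hypothetical, and it assumes the rule appears exactly as written in the steps (single spaces, once per line); it does not touch any real file.

```python
# Rule strings copied from the sed commands in steps 3 and 8.
SCRAM_RULE = "host all all all scram-sha-256"
TRUST_RULE = "host all all ::/0 trust"


def set_trust(hba_text: str) -> str:
    """Mirror step 3: swap the scram-sha-256 rule for a trust rule."""
    return hba_text.replace(SCRAM_RULE, TRUST_RULE)


def restore_scram(hba_text: str) -> str:
    """Mirror step 8: swap the trust rule back to scram-sha-256."""
    return hba_text.replace(TRUST_RULE, SCRAM_RULE)


def remote_trust_enabled(hba_text: str) -> bool:
    """Return True while the temporary trust rule is still present."""
    wanted = TRUST_RULE.split()
    for line in hba_text.splitlines():
        # Ignore trailing comments and compare whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        if fields == wanted:
            return True
    return False


# Hypothetical file contents, for illustration only.
sample = "# sample pg_hba.conf\nhost all all all scram-sha-256\n"
bypassed = set_trust(sample)
print(remote_trust_enabled(bypassed))         # True: authentication is bypassed
print(restore_scram(bypassed) == sample)      # True: step 8 undoes step 3
```

A check like `remote_trust_enabled` is one way to confirm that step 8 really ran before public networking is turned back on.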

8 Replies

Status changed to Open Railway 16 days ago


Can you check if the passwords are the same across the variables PGPASSWORD and POSTGRES_PASSWORD and DATABASE_URL?


Also, just a sanity check, I assume there's nothing in the deployment logs of Postgres displaying any authentication errors?


hxseiko-erp
HOBBYOP

16 days ago

Update with new diagnostic findings:

I added a debug step to Procfile to test direct psycopg2 connection before alembic runs:

Procfile: `web: cd backend && python3 -c "import os, psycopg2; psycopg2.connect(os.environ['DATABASE_URL']); print('PSY_CONNECT_OK')" && alembic upgrade head && uvicorn ...`

Result on production: psycopg2.connect() fails immediately with "password authentication failed for user postgres", and PSY_CONNECT_OK is never printed.

But: when I run `railway run env | grep DATABASE_URL` from my local machine (linked to web service), the password matches the current POSTGRES_PASSWORD I set 30 minutes ago.

This suggests production web container is receiving a different DATABASE_URL value than what railway run resolves locally, i.e., the reference is resolving correctly for railway run but not for the actual deployed container.

Failed deployment: 1b57119 (most recent)
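
One way to compare what `railway run` and the deployed container receive, without printing the password into logs, is to print a short fingerprint of it in both places. The sketch below is illustrative only: `password_fingerprint` and the `DB_URL_FINGERPRINT` marker are hypothetical names, and it assumes `DATABASE_URL` is a standard `postgresql://user:password@host:port/db` URL.

```python
import hashlib
import os
from urllib.parse import unquote, urlsplit


def password_fingerprint(database_url: str) -> str:
    """Summarize a connection URL without revealing the password itself."""
    parts = urlsplit(database_url)
    # The password component may be percent-encoded, so decode it first.
    password = unquote(parts.password or "")
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return (
        f"user={parts.username} host={parts.hostname} "
        f"len={len(password)} sha256={digest[:12]}"
    )


if __name__ == "__main__":
    url = os.environ.get("DATABASE_URL")
    if url:
        # Run this both via `railway run` and inside the deployed container,
        # then compare the two printed lines.
        print("DB_URL_FINGERPRINT", password_fingerprint(url))
```

If the two fingerprints match while the database still rejects the login, the variable is resolving the same way in both places and the mismatch is more likely on the database side.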



hxseiko-erp
HOBBYOP

16 days ago

Hi, thanks for jumping in. Here are the answers to your two questions:

Q1: Are PGPASSWORD / POSTGRES_PASSWORD / DATABASE_URL passwords consistent on the Postgres service?

Yes — I compared the first and last few characters of all three values in the Postgres service Variables tab and they match. The password is hex-only (no special characters that would need URL encoding).
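
The manual comparison described above could also be scripted. This is a rough sketch, assuming the three variables are available in a dict-like environment; `check_password_vars` is a hypothetical helper, not a Railway feature.

```python
from urllib.parse import quote, unquote, urlsplit


def check_password_vars(env: dict) -> dict:
    """Compare PGPASSWORD, POSTGRES_PASSWORD and the DATABASE_URL password."""
    url_password = unquote(urlsplit(env.get("DATABASE_URL", "")).password or "")
    values = {
        "PGPASSWORD": env.get("PGPASSWORD", ""),
        "POSTGRES_PASSWORD": env.get("POSTGRES_PASSWORD", ""),
        "DATABASE_URL": url_password,
    }
    return {
        # All three must be non-empty and identical.
        "consistent": url_password != "" and len(set(values.values())) == 1,
        # True if the raw password contains characters that must be
        # percent-encoded before being placed inside a URL.
        "needs_url_encoding": quote(values["PGPASSWORD"], safe="")
        != values["PGPASSWORD"],
    }


# Hypothetical values, for illustration only.
example = {
    "PGPASSWORD": "deadbeef01",
    "POSTGRES_PASSWORD": "deadbeef01",
    "DATABASE_URL": "postgresql://postgres:deadbeef01@db.internal:5432/railway",
}
print(check_password_vars(example))
```

A hex-only password never needs encoding, whereas a base64 password can contain `+`, `/` or `=`, which do.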

Q2: Are there authentication errors in the Postgres deployment log?

Yes, lots of them. Sample pattern from the current Postgres deployment log (UTC):

```

2026-05-07 01:05:44.954 UTC [39] FATAL: password authentication failed for user "postgres"

2026-05-07 01:05:44.961 UTC [40] FATAL: password authentication failed for user "postgres"

... (20 entries in ~30 seconds)

[8 minute gap — web service crash/restart cycle]

2026-05-07 01:14:02.940 UTC [61] FATAL: password authentication failed for user "postgres"

... (another burst)

```

The pattern is bursts of ~20 failed auth attempts every few seconds, followed by an 8-minute gap (matching the web service's crash → restart loop), then another burst. This has been continuous since the original incident started ~17 hours ago.
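
The burst-and-gap pattern can be pulled out of the log text mechanically. The sketch below assumes log lines shaped like the sample above; `group_bursts` and the one-minute threshold are arbitrary illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Matches lines like:
# 2026-05-07 01:05:44.954 UTC [39] FATAL: password authentication failed for user "postgres"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC \[\d+\] "
    r'FATAL:\s+password authentication failed for user "[^"]+"'
)


def group_bursts(log_text: str, max_gap: timedelta = timedelta(minutes=1)):
    """Group failed-auth timestamps into bursts separated by more than max_gap.

    Returns a list of (first_timestamp, last_timestamp, count) tuples.
    """
    times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    times.sort()

    bursts = []
    for ts in times:
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)   # still inside the current burst
        else:
            bursts.append([ts])     # gap exceeded: start a new burst
    return [(b[0], b[-1], len(b)) for b in bursts]
```

Calling `group_bursts(log_text)` on the exported Postgres log gives one tuple per burst, so the gap between consecutive bursts can be compared with the web service's restart interval.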

What this tells us combined with my earlier evidence:

- Postgres-side config is correct (three vars match, password is valid hex)

- The web service container IS reaching Postgres at the network level

- But it's presenting a wrong password

- railway run env | grep DATABASE_URL from my Mac returns the correct current password

- Production web container psycopg2 connection fails immediately (verified via a Procfile probe before alembic runs)

So railway run and the production container are getting different resolved values for ${{Postgres.DATABASE_URL}}, even though the web service Variables tab shows the reference syntax correctly.

What I've already tried (no change):

- Multiple redeploys of web service

- Restarting Postgres service

- Pushing fresh commits to bypass build cache (10ef53b, 1b57119)

- Rotating the password twice (base64 → hex format)

- Clearing the Pre-deploy Command

The reference variable in the web service does not appear to be re-resolving against the current Postgres value. Could you take a look at how the reference is being resolved for the web service deployment? Happy to provide the project ID, service IDs, or any other diagnostic info you need.

Thanks!





hxseiko-erp
HOBBYOP

16 days ago

Update: Followed your steps and the fix worked. Web service is now ACTIVE and connecting to Postgres successfully.

Diagnosis confirmed: the postgres user's stored password inside the database was out of sync with the PGPASSWORD variable, so the ALTER USER step is what actually resolved it. The earlier password rotations through the dashboard must have updated the variables but not propagated into the database itself.

Thanks for the precise instructions, which saved a lot of debugging.


Status changed to Solved 0x5b62656e5d 16 days ago


hxseiko-erp
HOBBYOP

14 days ago

Update: fix successful, thanks again.

Followed your SOP and it worked perfectly. Web service has been ACTIVE since recovery (~2 days now), no auth errors in Postgres logs.

Confirmed root cause exactly as you suspected: the postgres user's stored password inside the database was out of sync with PGPASSWORD. The ALTER USER step was the real fix: the earlier dashboard rotations updated the variable but never propagated into the database itself.

For anyone finding this thread later: the order matters. The correct way to rotate a password on Railway Postgres is **ALTER USER first, then update POSTGRES_PASSWORD in dashboard**. Doing it the other way around (or only the dashboard side) causes exactly this outage pattern.
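
As a rough illustration of the "ALTER USER first" half of that order, the sketch below changes the role's password inside the database using psycopg2. It assumes psycopg2 is installed and that `admin_dsn` (a hypothetical name) still authenticates with the current password. Updating POSTGRES_PASSWORD in the dashboard afterwards remains a manual step.

```python
import secrets


def new_hex_password(nbytes: int = 16) -> str:
    """Generate a hex-only password, which needs no URL encoding."""
    return secrets.token_hex(nbytes)


def rotate_password(admin_dsn: str, role: str, new_password: str) -> None:
    """Change the role's stored password inside the database itself."""
    # Imported lazily so the helper above works without psycopg2 installed.
    import psycopg2
    from psycopg2 import sql

    conn = psycopg2.connect(admin_dsn)
    try:
        # "with conn" commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            cur.execute(
                sql.SQL("ALTER USER {} WITH PASSWORD {}").format(
                    sql.Identifier(role),
                    sql.Literal(new_password),
                )
            )
    finally:
        conn.close()
```

Usage would look like `pw = new_hex_password()` followed by `rotate_password(current_url, "postgres", pw)`, and only then setting the same value in the dashboard variable.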

Thanks for the precise instructions and for catching the public networking risk before we touched pg_hba.conf. That saved us from a much worse incident.


Status changed to Awaiting Railway Response Railway 14 days ago


Status changed to Solved Railway 14 days ago

