Postgres service in crash loop, restore appears to not be progressing

mdmxmyr

PROOP

2 months ago

I asked Claude Code to help me describe what's going on. After I tried to increase max connections for my database (in an attempt to stop things feeling "stuck all the time"), the database no longer boots. It seems to have gotten stuck after I restarted it to make the setting apply.

What I did

1. Connected to the service via the public proxy psql endpoint and ran:

sql
ALTER SYSTEM SET max_connections = 200;
SELECT pg_reload_conf();

I expected this to take effect on next restart. The reload itself produced this informational log:

parameter "max_connections" cannot be changed without restarting the server

configuration file "/var/lib/postgresql/data/pgdata/postgresql.auto.conf" contains errors; unaffected changes were applied

2. Restarted the Postgres service from the Railway dashboard.

3. Service entered a crash-restart loop. The only logs visible since are repeated Mounting volume on: /var/lib/containers/railwayapp/bind-mounts// entries every ~30-60 seconds. Zero Postgres process logs between mount events — no starting PostgreSQL, no redo, no database system is ready. Looks like Postgres is dying during early startup before any logs flush, presumably because max_connections = 200 requires more shared memory / IPC slots than the container is provisioned for.

What I tried:

Attempt 1: break the loop with a Custom Start Command override

Set Custom Start Command to /bin/sh -c "exec sleep infinity" (per the Custom Start Command docs, which note it overrides ENTRYPOINT in exec form). Goal: keep the container alive without launching Postgres so I could railway ssh and remove the bad line from postgresql.auto.conf.

Result: no effect. Container kept crash-looping with the same "Mounting volume" pattern. Likely the Postgres template image has additional init logic (healthcheck killing the container when port 5432 doesn't respond, or a wrapper that runs postgres regardless of CMD/ENTRYPOINT override).

Attempt 2: railway ssh

Returned "Your application is not running or in a unexpected state". (The container was never stable long enough to attach.)

Attempt 3: restore from most recent daily backup

Cleared the Custom Start Command, then clicked Restore on the most recent Daily Schedule backup (~3 hours old, 1.65 GB). Service is still crash-looping with the same "Mounting volume" pattern. Either the restore reused the same volume (so the bad postgresql.auto.conf is still on disk), or the restore hasn't progressed because Postgres needs to start to accept the dump.

Logs (latest sample, all that's been emitted for ~40 min)

2026-05-04T12:14:54Z [inf] Mounting volume on:

/var/lib/containers/railwayapp/bind-mounts/5a97e1c4-6b81-4411-b508-e864074d19bc/vol_he5u9dr10s8pyv7c

2026-05-04T12:15:27Z [inf] Mounting volume on:

/var/lib/containers/railwayapp/bind-mounts/5a97e1c4-6b81-4411-b508-e864074d19bc/vol_he5u9dr10s8pyv7c

2026-05-04T12:16:00Z [inf] Mounting volume on:

/var/lib/containers/railwayapp/bind-mounts/5a97e1c4-6b81-4411-b508-e864074d19bc/vol_he5u9dr10s8pyv7c

... (repeats every 30-60s indefinitely)
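For reference, the gap between mount events can be measured straight from the log, which helps tell a fixed restart cadence from an irregular failure. This is only a rough sketch: it assumes the timestamp format shown above and that all lines fall on the same UTC day, and `restart_gaps` is a hypothetical helper name, not a Railway tool.

```shell
# Hypothetical helper: read log lines on stdin and print the number of
# seconds between consecutive "Mounting volume on" events.
# Assumes ISO timestamps like 2026-05-04T12:14:54Z, all on the same day.
restart_gaps() {
  grep 'Mounting volume on' | awk '{
    # Characters 12-19 of the first field are HH:MM:SS.
    split(substr($1, 12, 8), t, ":")
    now = t[1] * 3600 + t[2] * 60 + t[3]
    if (NR > 1) print now - prev
    prev = now
  }'
}
```

Fed the three sample lines above, it prints 33 twice, so the loop is restarting on a steady 33-second interval.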

Last legitimate Postgres log before the loop began (during pg_reload_conf()):

2026-05-04 11:52:39 UTC [5] LOG: parameter "max_connections" cannot be changed without restarting the server

2026-05-04 11:52:39 UTC [5] LOG: configuration file "/var/lib/postgresql/data/pgdata/postgresql.auto.conf" contains errors; unaffected changes were applied

$20 Bounty

7 Replies

Railway

BOT

2 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway • 2 months ago

medim

MODERATOR

2 months ago

Hey, it looks like your config file is badly formatted or corrupted. Let's try to SSH into your service and remove that max_connections setting, and get your db back online before trying to increase max connections again.

Set your start command to sleep infinity, see if it redeploys correctly, and SSH into it. Can you share what your postgresql.auto.conf looks like? You can see it by running this command: cat /var/lib/postgresql/data/pgdata/postgresql.auto.conf
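Once a shell is available on the volume, removing the setting could look roughly like the sketch below. It assumes the file path quoted in this thread and that ALTER SYSTEM wrote the setting on a line of its own; the function name `drop_max_connections` is hypothetical. The file is normally managed by ALTER SYSTEM, so hand-editing it is only reasonable here because the server cannot start.

```shell
# Hypothetical helper: remove any max_connections line from a
# postgresql.auto.conf, keeping a backup next to it as <file>.bak.
drop_max_connections() {
  conf="$1"
  cp -p "$conf" "$conf.bak" || return 1
  # Rewrite the file in place (keeps owner and mode) without the setting.
  grep -v -E '^[[:space:]]*max_connections[[:space:]]*=' "$conf.bak" > "$conf"
  # grep exits 1 when no lines remain to print; only codes above 1 are errors.
  [ "$?" -le 1 ]
}

# Example call (path taken from the logs in this thread):
# drop_max_connections /var/lib/postgresql/data/pgdata/postgresql.auto.conf
```

Comparing the file against the `.bak` copy afterwards (for example with diff) is a cheap way to confirm that only the one line changed before switching the start command back.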

I also suggest that instead of raising max connections directly, you use a connection pooler like PgBouncer (there are templates for it on the marketplace).

mdmxmyr

PROOP

2 months ago

Hi, I tried the sleep infinity start command, but that did not help. I was not able to SSH into it and got the following error:

Expected welcome message, received: ServerMessage { type: "error", payload: ServerPayload { data: Empty, message: "Your application is not running or in a unexpected state", code: None } }

medim

MODERATOR

2 months ago

Your service never became active; that's why it's showing that error.

How are the logs looking? Same thing?

mdmxmyr

PROOP

2 months ago

It basically kept logging the following over and over, every couple of minutes:

2026-05-04T12:14:54Z [inf] Mounting volume on:

/var/lib/containers/railwayapp/bind-mounts/5a97e1c4-6b81-4411-b508-e864074d19bc/vol_he5u9dr10s8pyv7c

Note that the deploy would also never finish; it would get stuck in this loop.

medim

MODERATOR

2 months ago

Where's that db deployed? Which region? There's currently a degraded volume performance incident going on in EU West (https://status.railway.com/incident/QK1OHYXB)

mdmxmyr

PROOP

2 months ago

It's deployed in the EU West region, but would that cause this issue?

medim

MODERATOR

2 months ago

There was a volume degradation on EU West, and it seems like your service was failing to mount the volume.

If it is still failing, my last suggestion is to back up that volume, change the source image to Ubuntu, and then try the steps I suggested earlier, but this is very tricky and really a last-resort option.

Remember that backups are linked to volumes so don't delete that volume if you ever wanna restore the backup.
