13 days ago
Subject: Private networking broken between services after Postgres restart
Our PostgreSQL service went down and after bringing it back up, the private network connection between our app service and Postgres service is broken. Our production app is currently down and we are unable to deploy a fix because the build queue is backed up.
Details:
Project ID: 9293b7ee-f9f4-4a66-a22e-ca6e96a32fa4
Environment ID: 61f5a445-b1d4-4ede-acdf-43d62490813d
App service ID: a2aeffdd-13a6-496c-8102-6ef0ae870f8d
• The app cannot reach Postgres at its private network address using the template variable
• The database is healthy and reachable via the public URL
• We have restarted both the app and Postgres services, but private networking still does not recover
• We need to switch DATABASE_URL to the public proxy as a workaround, but the deploy is stuck in the build queue
Could you either restore private networking between these services or prioritise our deploy in the build queue so we can get back online?
5 Replies
13 days ago
Once this issue is resolved, I will need to understand exactly why this happened: why didn't the database reconnect the first time, and why did it cause private networking to fail between these services? Was it something we've misconfigured?
I'm a little concerned about using this platform and will have to have a difficult conversation with stakeholders after this incident, but I hope to continue using it, as it makes my life so much easier.
13 days ago
Update: the build queue has cleared and we're back online with the temporary workaround using the public database URL.
Any help getting to the bottom of this would be appreciated.
13 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 13 days ago
13 days ago
Update: I'm looking at implementing this in our production environment: https://github.com/railwayapp-templates/postgres-ha
Anyone else have any experience?
13 days ago
Thanks so much for your help, really appreciate the thorough explanation. I've gone ahead and implemented a more robust solution: an adaptive connection pool that automatically falls back to the public database URL if the private network is unreachable, and periodically re-probes the private URL to switch back when it recovers. This ensures the app self-heals without needing a redeploy.
However, even after the database has fully restarted and the app service has been restarted, the private network URL still isn't working — the app is consistently falling back to the public URL. The fallback is great to have in place for when this happens, but we don't want to be running through the public proxy long term for obvious reasons (cost and latency).
Does it sound like this specific case needs someone at Railway to take a look? It seems like the private network route between these two services just isn't recovering on its own, even with fresh restarts on both sides.
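For anyone curious about the fallback approach described above, here's a minimal sketch of the URL-selection logic, assuming a cheap TCP probe against the private host. The class name, hostnames, and ports are illustrative, not Railway APIs; a real implementation would probe with a `SELECT 1` through the actual driver:

```python
import socket
import time

class AdaptiveDatabaseUrl:
    """Return the private DATABASE_URL while the private host is reachable,
    otherwise fall back to the public URL; re-probe periodically so the app
    switches back on its own when private networking recovers."""

    def __init__(self, private_url, public_url, private_host, private_port,
                 probe_interval=30.0, probe=None):
        self.private_url = private_url
        self.public_url = public_url
        self.private_host = private_host
        self.private_port = private_port
        self.probe_interval = probe_interval
        self._probe = probe or self._tcp_probe  # injectable for testing
        self._last_probe = 0.0
        self._private_ok = False

    def _tcp_probe(self):
        # Cheap reachability check; a production probe might run `SELECT 1`.
        try:
            with socket.create_connection(
                    (self.private_host, self.private_port), timeout=2):
                return True
        except OSError:
            return False

    def current_url(self):
        now = time.monotonic()
        if now - self._last_probe >= self.probe_interval:
            self._last_probe = now
            self._private_ok = self._probe()
        return self.private_url if self._private_ok else self.public_url
```

The pool would call `current_url()` whenever it opens a new connection, so no redeploy is needed when the private route comes or goes.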
9 days ago
One possibility is stale DNS resolution for the private service hostname after the Postgres container restarted.
On Railway, private networking between services relies on internal DNS. When a service crashes or is redeployed, the underlying container IP can change. If the application runtime or connection pool cached the previous IP, it will keep attempting to connect to the old address even though the database is healthy again.
This would explain why:
• the database works via the public proxy
• restarting services did not immediately restore connectivity
• switching DATABASE_URL to the public endpoint worked
Some runtimes (Node, Java, Go connection pools, etc.) cache DNS results longer than expected, especially inside long-lived connection pools.
You can confirm this by resolving the hostname from inside the app container:
getent hosts <postgres-private-host>
If the IP differs from what the app originally connected to, the issue is likely stale DNS.
Using a pool that periodically refreshes DNS or setting a lower DNS TTL usually prevents this after container restarts.
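If the runtime is the one caching the lookup, one application-level option is to re-resolve the hostname on every reconnect attempt instead of reusing the first cached address. A minimal Python sketch (function name is illustrative):

```python
import socket

def resolve_current_ip(hostname, port=5432):
    """Ask the resolver for the host's current address at call time.

    Long-lived pools that cache the first lookup will keep dialing the old
    container IP after a restart; resolving fresh before each reconnect
    avoids that.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # First result is sufficient for a reachability/debug check.
    return infos[0][4][0]
```

Comparing this value against the address the pool is actually dialing confirms, or rules out, stale DNS as the cause.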
