Railway proxy timeout
thecarnivalalldaybuffet
HOBBYOP

3 months ago

Subject: Intermittent Postgres disconnects via maglev.proxy.rlwy.net:47180 (ECONNRESET → connect timeouts) from external client

Service: Railway Managed Postgres
Host/Port: maglev.proxy.rlwy.net:47180
DB name: railway
Region: US California
Client app: n8n hosted on Render ($7 Starter web service)
Client driver/runtime: Node 18, pg 8.12.0 via pg-pool 3.6.2

Summary

We see stable DB operation for ~10–20 minutes, then a burst of:

  1. `Connection terminated unexpectedly`

  2. `read ECONNRESET`

  3. repeated `timeout exceeded when trying to connect` (bursts lasting 1–10+ minutes)

During those windows, new connections fail and existing ones drop. After we manually restart the DB, it works again until the next burst. Load is very light.

Client connection settings

  • SSL enabled (rejectUnauthorized=false)

  • Pool size: 3

  • Connect timeout: 70,000 ms

  • Idle connection timeout: 70,000 ms

  • App ping/keepalive: lightweight query every 5s

  • Workload: small n8n metadata/executions reads; no long-running queries
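For reference, the settings above map onto a node-postgres (pg 8.x) pool configuration roughly like this. The connection string credentials are placeholders, and the `keepAlive` options are an assumption we are considering for the NAT idle-timeout hypothesis, not our exact production config:

```javascript
// Sketch of the pool settings listed above, for node-postgres (pg 8.x).
// user/pass in the connection string are placeholders, not real credentials.
const poolConfig = {
  connectionString: 'postgresql://user:pass@maglev.proxy.rlwy.net:47180/railway',
  ssl: { rejectUnauthorized: false }, // SSL on, server cert not verified
  max: 3,                             // pool size
  connectionTimeoutMillis: 70000,     // connect timeout
  idleTimeoutMillis: 70000,           // idle connection timeout
  keepAlive: true,                    // enable TCP keepalives on client sockets
  keepAliveInitialDelayMillis: 10000, // first probe after 10 s idle (assumed value)
};

// Usage (requires the `pg` package):
//   const { Pool } = require('pg');
//   const pool = new Pool(poolConfig);
```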

Timeline (UTC) from client logs (2025-09-09)

16:07:35 OK (periodic query)
16:08:35 OK
16:09:19 ERROR Connection terminated unexpectedly
16:09:27 ERROR read ECONNRESET (twice)
16:10:29 ERROR timeout exceeded when trying to connect
16:10:35 ERROR timeout exceeded when trying to connect
16:11:31, 16:12:33, 16:13:35, 16:14:35, 16:15:35 Repeated connect timeouts
16:14:48 ERROR Failed to hard-delete executions (root cause: timeout exceeded when trying to connect)

Railway Postgres logs after our manual restart (same day)

16:25:39 starting PostgreSQL 17.6
16:25:40 database system was interrupted; last known up at 16:18:35
16:25:40 database system was not properly shut down; automatic recovery in progress
16:25:40 redo starts at 0/32F5850
16:25:40 invalid record length at 0/32F5990: expected at least 24, got 0
16:25:40 redo done; checkpoint end-of-recovery
16:25:40 database system is ready to accept connections

Notably, client-side errors begin at 16:09:19, roughly nine minutes before the DB's "last known up" timestamp of 16:18:35, and the restart log reports an unclean shutdown. In other words, the database appears to have still been up while connections were already failing, which suggests proxy/LB issues or backend host/container problems that preceded the final DB interruption.

What we tried

  • Tuned pool size/timeouts and added 5s app-level pings.

  • Restarted DB; problem recurs later.

  • Traffic is minimal; other app calls (HTTP, Google Sheets OAuth refresh) succeed—only DB path fails.
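One mitigation we have not yet deployed is wrapping queries in an exponential-backoff retry so that short ECONNRESET bursts don't fail workflows outright. A minimal sketch (`op` is a hypothetical stand-in for a `pool.query(...)` call, not code from our app):

```javascript
// Retry a flaky async operation with capped exponential backoff.
// `op` stands in for something like () => pool.query('SELECT 1').
async function withRetry(op, { retries = 5, baseDelayMs = 500, maxDelayMs = 8000 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      if (attempt === retries) break;
      // 500 ms, 1 s, 2 s, 4 s, 8 s (capped), ...
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}
```

This would paper over transient resets but not the multi-minute black-hole windows, which exceed any reasonable retry budget.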

Hypotheses

  • Maglev proxy/LB instability in our region causing RSTs, then “black-holed” connects.

  • DB container/host events (OOM/host drain/migration) leading to unclean shutdown.

  • Aggressive TCP/NAT idle timeouts on the proxy path despite app pings.

  • Less likely: per-client limits at very low connection counts.

Requests to Railway

  1. Check proxy/LB logs for maglev.proxy.rlwy.net:47180 between 16:08–16:16 UTC and surrounding minutes for RSTs, health flaps, backend detachments for our DB.

  2. Check DB container/host events around 16:09–16:19 UTC (OOM kills, node drain, maintenance).

  3. Confirm effective TCP/idle timeouts and recommended Postgres/pg client keepalive settings (tcp_keepalives_*), or other best-practice values for external clients.

  4. Advise on any known incidents in this region during that window.

  5. Is there a way to obtain a direct endpoint (bypassing shared proxy) or recommended PgBouncer approach for cross-provider clients?
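On point 5, if a direct endpoint isn't available, a small PgBouncer instance in front of the DB is a common pattern for cross-provider clients. A hypothetical pgbouncer.ini sketch (host, port, paths, and sizes are placeholders, not a recommendation from Railway):

```ini
[databases]
; Placeholder target; would point at the Railway Postgres host
railway = host=maglev.proxy.rlwy.net port=47180 dbname=railway

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction     ; reuse server connections aggressively
default_pool_size = 5
server_idle_timeout = 60    ; recycle idle server connections
tcp_keepalive = 1           ; TCP keepalives toward the server
```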

Context

We may migrate this DB to Render Postgres (private network) to avoid cross-provider hops, but we’d like root cause on the Railway side to decide future usage and to help you triage a potential platform issue.

Happy to provide full client/server logs or run a short reproduction window if needed.

Solved

6 Replies

Railway
BOT

3 months ago

Hey there! We've found the following might help you get unblocked faster:

- [🧵 Postgres slow/timeout](https://station.railway.com/questions/postgres-slow-timeout-dfffea16)
- [🧵 Postgres ECONNRESET / unable to connect using TCP Proxy](https://station.railway.com/questions/postgres-econnreset-unable-to-connect-0aba867d)
- [🧵 Postgres Connection Limit and Timeout](https://station.railway.com/questions/postgres-connection-limit-and-timeout-2173af9b)

If you find the answer from one of these, please let us know by solving the thread!

thecarnivalalldaybuffet
HOBBYOP

3 months ago

No, these did not help.


Railway
BOT

3 months ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but our engineering team will take a look, and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!


jake
EMPLOYEE

3 months ago

The edge proxies will restart at some point. Could you please try using private networking instead?

I've attached this thread to our 'Support hotreloading connections across machines' ticket.


Status changed to Awaiting User Response Railway 3 months ago


thecarnivalalldaybuffet
HOBBYOP

3 months ago

So you are saying I should try private networking between Render and Railway, is that correct?


Status changed to Awaiting Railway Response Railway 3 months ago


david
EMPLOYEE

3 months ago

Private networking between Railway services. Per docs here: https://docs.railway.com/guides/private-networking


Status changed to Awaiting User Response Railway 3 months ago


Railway
BOT

2 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 2 months ago

