PostGRES database unreachable

syllabusadmin
PRO

8 days ago

I am experiencing a persistent issue where the PostgreSQL database service in my project has become unreachable. The database connection spinner never resolves in the Railway dashboard, and the outage is breaking my app.

This issue seems to coincide with the recent incident regarding "higher read/write latency on Railway Volumes," which was marked as resolved on June 26, 2025, at 8:45 AM UTC.

Key Observations and Troubleshooting Steps:

Unreachable Status: The database service shows as unreachable in the Railway UI, and my applications cannot connect to it.

Restart Attempted: I have attempted to restart the database service multiple times via the Railway dashboard, but the issue persists.

Log Gaps: Reviewing the PostgreSQL logs reveals significant gaps where no normal checkpoint operations are logged, indicating periods of unresponsiveness or downtime.

Example Gaps:

From 2025-06-25 05:00:58 UTC to 07:28:53 UTC

From 2025-06-25 07:29:23 UTC to 14:26:08 UTC

Later gaps: 2025-06-25 22:16:16 UTC to 23:46:17 UTC and 2025-06-25 23:51:17 UTC to 2025-06-26 00:56:18 UTC
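
Gaps like these can be found mechanically by comparing consecutive "checkpoint starting" timestamps. The following is a minimal Python sketch, not part of any Railway or PostgreSQL tooling: the function name find_checkpoint_gaps and the 10-minute threshold are assumptions, chosen because the checkpoints quoted in this post arrive five minutes apart (PostgreSQL's default checkpoint_timeout). The regex assumes the log prefix format shown in this thread. Keep in mind that PostgreSQL skips a timed checkpoint when no WAL has been written since the previous one, so a gap may also just mean the database was idle.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp of a "checkpoint starting" line in the log
# format pasted in this thread (assumed log_line_prefix).
CHECKPOINT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ UTC \[\d+\] LOG:\s+checkpoint starting"
)


def find_checkpoint_gaps(lines, max_interval=timedelta(minutes=10)):
    """Return (previous, next) timestamp pairs that are further apart
    than max_interval. The threshold is an illustrative assumption."""
    stamps = []
    for line in lines:
        match = CHECKPOINT_RE.match(line.strip())
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    stamps.sort()
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_interval]
```

Feeding the function the lines of an exported log file returns one pair per suspicious interval; lines that are not checkpoint starts are ignored.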

Connection Errors in Logs: During these periods, the logs show repeated connection errors, indicating issues with processing incoming connections.

Example Log Entries:

2025-06-25 07:28:53.918 UTC [89134] LOG: invalid length of startup packet

2025-06-25 07:28:54.920 UTC [89135] LOG: invalid length of startup packet

... (multiple similar entries)

2025-06-25 07:29:23.552 UTC [89187] LOG: incomplete startup packet

All I see in the logs now is checkpoints:
2025-06-26 05:41:23.276 UTC [29] LOG: checkpoint starting: time

2025-06-26 05:41:23.283 UTC [29] LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.007 s; sync files=4, longest=0.001 s, average=0.001 s; distance=8 kB, estimate=102 kB; lsn=9/94A30A8, redo lsn=9/94A3070

2025-06-26 05:46:23.283 UTC [29] LOG: checkpoint starting: time

2025-06-26 05:46:23.290 UTC [29] LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.008 s; sync files=7, longest=0.001 s, average=0.001 s; distance=24 kB, estimate=94 kB; lsn=9/94A9160, redo lsn=9/94A9128

Could you please investigate?

Solved

8 Replies

syllabusadmin
PRO

8 days ago

also, connecting from bash:
psql: error: connection to server at "monorail.proxy.rlwy.net" (35.212.181.170), port 22092 failed: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

csp@ubi:~$


syllabusadmin
PRO

8 days ago

The port still accepts TCP connections, though:
nc -vz monorail.proxy.rlwy.net 22092

Connection to monorail.proxy.rlwy.net (35.212.181.170) 22092 port [tcp/*] succeeded!
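
The nc check above only shows that something accepts TCP connections on the port; it does not show that PostgreSQL is answering behind it. A rough way to tell the two apart is to send PostgreSQL's SSLRequest message (an 8-byte packet: length 8, request code 80877103) and see whether a single-byte "S" or "N" comes back. The sketch below is illustrative only; the function name probe_postgres and its return labels are made up for this example.

```python
import socket
import struct

# SSLRequest: Int32 length (8) followed by Int32 request code 80877103.
SSL_REQUEST = struct.pack("!II", 8, 80877103)


def probe_postgres(host, port, timeout=5.0):
    """Classify an endpoint as 'tcp-failed', 'no-reply',
    'unexpected-reply', or 'postgres-replied' (labels are hypothetical)."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-failed"
    with sock:
        try:
            sock.sendall(SSL_REQUEST)
            reply = sock.recv(1)
        except OSError:
            # Timeout or reset while waiting for the server's answer.
            return "no-reply"
    if reply in (b"S", b"N"):
        # 'S' = server will negotiate TLS, 'N' = it will not.
        return "postgres-replied"
    if reply == b"":
        # Peer closed the connection without answering.
        return "no-reply"
    return "unexpected-reply"
```

A result of "no-reply" together with a successful nc check would be consistent with what psql reported here: the connection is accepted and then closed before the server responds.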


Hi there,

Your Postgres service runs without issue. It has not yet been migrated to Railway Metal, but your application "fastapi-finddas-clusterer" service has, and the two different regions might cause some latency. I'd therefore recommend that you change the region of your Postgres service to "US West (California, USA)", and you'll see improved performance. If you don't migrate it, it'll be auto-migrated soon.

Furthermore, I'd recommend you enable Private Networking for communication between your services within your project. In addition to being more secure, it also means you won't be charged for network egress for communication between your services. Read about Private Networking here.

Regards,
Christian
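
For the Private Networking suggestion, an application can prefer a private connection string and fall back to the public proxy only when no private one is configured. This is a minimal sketch under stated assumptions: the environment variable names DATABASE_PRIVATE_URL and DATABASE_PUBLIC_URL and the function pick_database_url are hypothetical placeholders, so substitute whatever variables your project actually defines.

```python
import os


def pick_database_url(env=None):
    """Prefer the private-network URL, fall back to the public one.
    Both variable names are hypothetical placeholders."""
    if env is None:
        env = os.environ
    url = env.get("DATABASE_PRIVATE_URL") or env.get("DATABASE_PUBLIC_URL")
    if not url:
        raise RuntimeError("no database URL configured")
    return url
```

Keeping the choice in one function means the application code does not need to change when the service moves from the public proxy to the private network.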


Status changed to Awaiting User Response railway[bot] 8 days ago


syllabusadmin
PRO

8 days ago

Thanks for the reply Christian! Why can't I reach it if it "runs without issue"?


Status changed to Awaiting Railway Response railway[bot] 8 days ago


syllabusadmin
PRO

8 days ago

Also, I tried to migrate over...no dice


8 days ago

Hey there,

I have gone ahead and given your DB the proper kick it needed, then moved it over to Railway Metal. You should be free of connection issues.


Status changed to Awaiting User Response railway[bot] 8 days ago


syllabusadmin
PRO

8 days ago

Awesome, thank you. Can you tell why it was unresponsive? Was it the outage?


Status changed to Awaiting Railway Response railway[bot] 8 days ago


8 days ago

Well, it's for an embarrassing reason: we shut off the GCP machine because we thought there were no workloads on it. I turned the machine back on and then moved your DB over, which puts you in a good spot.


Status changed to Awaiting User Response railway[bot] 8 days ago


Status changed to Solved syllabusadmin 8 days ago


PostGRES database unreachable - Railway Help Station