8 days ago
I am experiencing a persistent issue: the PostgreSQL database service in my project has become unreachable. The database connection spinner in the Railway dashboard never resolves, and this is breaking my app.
This issue appears to coincide with the recent incident regarding "higher read/write latency on Railway Volumes," which was marked as resolved on June 26, 2025, at 8:45 AM UTC.
Key Observations and Troubleshooting Steps:
Unreachable Status: The database service shows as unreachable in the Railway UI, and my applications cannot connect to it.
Restart Attempted: I have attempted to restart the database service multiple times via the Railway dashboard, but the issue persists.
Log Gaps: Reviewing the PostgreSQL logs reveals significant gaps where no normal checkpoint operations are logged, indicating periods of unresponsiveness or downtime (a query for checking the last completed checkpoint is sketched after the example gaps below).
Example Gaps:
From 2025-06-25 05:00:58 UTC to 07:28:53 UTC
From 2025-06-25 07:29:23 UTC to 14:26:08 UTC
Later gaps: 2025-06-25 22:16:16 UTC to 23:46:17 UTC and 2025-06-25 23:51:17 UTC to 2025-06-26 00:56:18 UTC
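A quick way to confirm when the last checkpoint completed, without scanning the logs (a minimal sketch, assuming $DATABASE_URL holds the connection string Railway provides for this service):
psql "$DATABASE_URL" -c "SELECT checkpoint_time, redo_lsn FROM pg_control_checkpoint();"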
Connection Errors in Logs: During these periods, the logs show repeated connection errors, indicating issues with processing incoming connections.
Example Log Entries:
2025-06-25 07:28:53.918 UTC [89134] LOG: invalid length of startup packet
2025-06-25 07:28:54.920 UTC [89135] LOG: invalid length of startup packet
... (multiple similar entries)
2025-06-25 07:29:23.552 UTC [89187] LOG: incomplete startup packet
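As I understand it, these messages usually mean something opened a TCP connection to the Postgres port but never sent a valid startup message (port scanners, load-balancer health checks, or non-Postgres clients), rather than being a crash by themselves. A rough way to bucket them per minute and compare against the gaps above (assuming the logs are downloaded locally to a file, hypothetically named postgres.log):
grep "startup packet" postgres.log | awk '{print $1, substr($2,1,5)}' | sort | uniq -c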
All I see in the logs now is checkpoints:
2025-06-26 05:41:23.276 UTC [29] LOG: checkpoint starting: time
2025-06-26 05:41:23.283 UTC [29] LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.007 s; sync files=4, longest=0.001 s, average=0.001 s; distance=8 kB, estimate=102 kB; lsn=9/94A30A8, redo lsn=9/94A3070
2025-06-26 05:46:23.283 UTC [29] LOG: checkpoint starting: time
2025-06-26 05:46:23.290 UTC [29] LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.008 s; sync files=7, longest=0.001 s, average=0.001 s; distance=24 kB, estimate=94 kB; lsn=9/94A9160, redo lsn=9/94A9128
Could you please investigate?
8 Replies
8 days ago
also, connecting from bash:
psql: error: connection to server at "monorail.proxy.rlwy.net" (35.212.181.170), port 22092 failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
csp@ubi:~$
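For reference, the invocation was roughly of this form (a sketch with placeholders; "postgres" and "railway" are the usual Railway defaults, and the password is redacted):
psql "postgresql://postgres:<password>@monorail.proxy.rlwy.net:22092/railway" -c "SELECT 1;"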
8 days ago
can still reach the port tho:
nc -vz monorail.proxy.rlwy.net 22092
Connection to monorail.proxy.rlwy.net (35.212.181.170) 22092 port [tcp/*] succeeded!
8 days ago
Hi there,
Your Postgres service is running without issue. It has not yet been migrated to Railway Metal, but your "fastapi-finddas-clusterer" application service has, and running the two in different regions can add some latency. I'd therefore recommend changing the region of your Postgres service to "US West (California, USA)", and you'll see improved performance. If you don't migrate it yourself, it will be auto-migrated soon.
I'd also recommend enabling Private Networking for communication between services within your project. In addition to being more secure, it means you won't be charged for network egress on traffic between your services. Read about Private Networking here.
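Over the private network, a connection string would look roughly like this (a sketch only; I'm assuming your Postgres service is named "Postgres" and listens on the default port 5432, so adjust the hostname to match your project):
DATABASE_URL="postgresql://postgres:<password>@postgres.railway.internal:5432/railway"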
Regards,
Christian
Status changed to Awaiting User Response railway[bot] • 8 days ago
8 days ago
Thanks for the reply, Christian! Why can't I reach it if it "runs without issue"?
Attachments
Status changed to Awaiting Railway Response railway[bot] • 8 days ago
8 days ago
Hey there,
I went ahead and gave your DB the kick it needed, then moved it over to Railway Metal. You should now be free of connection issues.
Status changed to Awaiting User Response railway[bot] • 8 days ago
8 days ago
Awesome, thank you. Can you tell why it was unresponsive? Was it the outage?
Status changed to Awaiting Railway Response railway[bot] • 8 days ago
8 days ago
Well, it's for an embarrassing reason: we shut off the GCP machine because we thought there were no workloads on it. I turned the machine back on and then migrated your service, putting you in a good spot.
Status changed to Awaiting User Response railway[bot] • 8 days ago
Status changed to Solved syllabusadmin • 8 days ago