P1001 Can't reach Postgres in same project since 01:52 BRT — coinciding with today's incident
edurcampos86-jpg
PROOP

a month ago

TL;DR: cockpit-onix cannot reach Postgres in the same project. I tested 4 different DATABASE_URL formats (literal host, RAILWAY_PRIVATE_DOMAIN reference, public TCP proxy, 100% Postgres references); all failed with the same P1001. A rollback to an older commit predating this week also crashed with the same P1001, which rules out an application regression. The failures started today at 01:52 BRT after a Railway-side Postgres restart and coincide with your 'Builds are slow to progress' incident.
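For anyone comparing URL formats the same way, a minimal sketch like the one below shows which host and port a given DATABASE_URL actually targets. The function name is hypothetical, and treating a `.railway.internal` suffix as "private" is an assumption based on the hostname in this thread.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Return the host, port, and network path a Postgres URL points at."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Postgres listens on 5432 by default when the URL omits a port.
    port = parts.port or 5432
    # Assumption: private-network hostnames end in .railway.internal.
    network = "private" if host.endswith(".railway.internal") else "public"
    return {"host": host, "port": port, "network": network}
```

Calling it on each variant (for example `describe_database_url("postgresql://user:secret@postgres.railway.internal:5432/railway")`, with placeholder credentials) makes it easy to confirm that the four formats really resolve to different targets before concluding that all paths are failing.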

IDs:

  • Project: b5174f97-cc1b-482c-921d-cb7749456cb8
  • Environment: 9f2a9371-ec66-4fd0-8b9d-0d5bfc8c4596
  • Service cockpit-onix: 0fdad823-5b78-45ff-9301-0d19ffb9d9f7
  • Service Postgres: 8515a07d-cc09-4ab1-b3c0-9cb0ab889a15
  • Region: EU West
  • 502 Request IDs: 7tg0TrwrS_6JNuUyjVra_w, IPn0NZo_TSmwXiiGBT7zVQ, BD2YdbrmT6quZRDNPvyhXg

Error on every restart attempt:

Error: P1001: Can't reach database server at postgres.railway.internal:5432

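P1001 = connection failure before authentication: the client never completed a TCP connection to the server, either because the hostname did not resolve or because the connect attempt timed out or was refused. A credential issue would surface as 'password authentication failed' instead, so getting P1001 means the app isn't reaching Postgres at all.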
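To narrow down which of those pre-authentication failures is happening, a small probe along these lines can be run from inside the app service. It is a sketch, not a Railway tool: the function name and the result labels are hypothetical, and it uses only Python's standard `socket` module.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a connection attempt: dns_failure, refused, timeout, unreachable, or open."""
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"  # the hostname did not resolve at all

    last = "unreachable"
    for family, socktype, proto, _, sockaddr in addrs:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                return "open"  # TCP handshake completed
            except ConnectionRefusedError:
                last = "refused"  # host reachable, nothing listening on the port
            except socket.timeout:
                last = "timeout"  # packets dropped or routed nowhere
            except OSError:
                last = "unreachable"  # any other network-level error
    return last
```

Running `probe("postgres.railway.internal", 5432)` from the failing service would distinguish a name that no longer resolves (`dns_failure`) from a database that resolves but does not accept connections (`refused` or `timeout`).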

Postgres side: the service card shows ACTIVE/Online and the logs show normal checkpoints; the only restart event today was the 01:52 BRT volume remount (not requested by me).

Asking:

  1. Confirm private networking (*.railway.internal) for this project is healthy. If not, escalate as part of today's incident.
  2. Confirm Postgres service is accepting TCP connections internally on port 5432.
  3. If you find a Railway-side issue, restart the network namespace / re-bind the private domain for this project. Services have been down 16+ hours.

No workaround in place. Don't want to roll back to a Postgres snapshot from before 01:52 BRT — would lose yesterday's work. Happy to provide more screenshots, additional Request IDs, or temporary access.

Solved

1 Replies

Railway
BOT

a month ago

We confirmed your app service is crashing with P1001 on every attempt to reach the database over the private network, consistently from 17:52 UTC through at least 19:31 UTC today. Your Postgres service shows SUCCESS status, but its last deployment dates back to April 1, which means the private networking bindings have gone stale. This is a known pattern where a platform-side event (like the volume remount you observed) disrupts the internal DNS registration for a long-running database deployment, even though the database process itself stays up.

To fix this, open your Postgres service, press Cmd+K (or Ctrl+K) to open the command palette, and select "Redeploy source image." This re-pulls the image and re-establishes the private networking binding. Your volume data will not be affected.

Once Postgres is back on the private network, redeploy your app service and it should connect normally.
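One way to check that Postgres is reachable again before redeploying the app is a short retry loop like the sketch below. The function name, attempt count, and delay are illustrative assumptions, and the private hostname should only be expected to resolve from inside the project's private network.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 2.0) -> bool:
    """Retry a TCP connect until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=3.0):
                return True
        except OSError:
            # Covers DNS failures, refusals, and timeouts alike.
            if attempt < attempts:
                time.sleep(delay)
    return False
```

If `wait_for_port("postgres.railway.internal", 5432)` returns `True` from the app's environment, the private networking binding is back and the app redeploy should be able to connect; a `False` result suggests the Postgres redeploy has not taken effect yet.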


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 30 days ago

