
[RESOLVED] Production Postgres unreachable after region reset (volume migration failure)

ak2k2

PRO OP

4 months ago

Project ID: 25d9b7d2-8e29-4d0d-b5bd-54a8709e7891

Region: US East (Virginia)

Stuck Deployment ID: 074d1c7d-62b2-437e-a30f-41f831c6f862

My production Postgres became unreachable after activity showed “reset region due to volume migration failure.”

The Postgres volume (coil-volume) is still attached (~2GB used of 50GB). Logs show normal checkpoints and autovacuum activity, no initdb or cluster recreation. So the data directory appears intact.

However:

* Internal and public connections to the service time out.
* The Backups tab fails with `RouterLegacyService/CreateVolumeInstanceSnapshot UNAVAILABLE`.
* A redeploy is currently stuck at “Creating containers…”
* Duplicating the service works normally (new volume).
* Staging environment works normally.
* Only this specific production Postgres service is affected.
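
For anyone triaging similar symptoms, a rough way to tell a timeout (packets going nowhere, which points at routing) from a refused connection (host reachable, nothing listening) is a plain TCP probe. This is only a sketch: `probe` is a hypothetical helper name, and the host and port you pass in are whatever your own service exposes.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and classify the outcome.

    Returns "open", "timeout", "refused", or "error: <detail>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout: typical of a routing or firewall problem.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

Calling `probe(host, 5432)` against both the internal and the public hostname gives one result per endpoint. If both report "timeout" while a duplicated service reports "open", that is consistent with a routing problem rather than Postgres itself being down, though it does not prove it.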

This looks like a control-plane / volume rebind or routing issue following the failed region migration.

Requesting confirmation that the existing volume is intact and assistance rebinding it to a healthy deployment or restoring networking.

This is a production outage.

Solved

5 Replies

92gc

PRO

4 months ago

Experiencing the same issue. Postgres became unreachable on both internal and public endpoints (Prisma P1001 errors).

Redeploy stuck on "Creating containers" for 15+ minutes. Snapshot/backup service also returning connection refused.

92gc

PRO

4 months ago

working for me now

ak2k2

PRO OP

4 months ago

did you cancel the redeploy or leave it running?

ak2k2

PRO OP

4 months ago

omg it's back up for me as well. LFG. Biggest infra-related emotional rollercoaster I have ever experienced. I'm going to pg_dump these backups immediately lol
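
A minimal sketch of taking that off-platform backup with `pg_dump`, assuming the `pg_dump` client is installed locally and the connection string is available to you. `build_pg_dump_cmd` and `run_backup` are hypothetical helper names, not anything Railway provides.

```python
import subprocess
from pathlib import Path


def build_pg_dump_cmd(database_url: str, out_file: Path) -> list[str]:
    """Build a pg_dump command that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",            # compressed archive, restorable with pg_restore
        "--no-owner",                 # skip ownership commands so it restores under any role
        f"--file={out_file}",
        f"--dbname={database_url}",   # --dbname accepts a full connection string
    ]


def run_backup(database_url: str, out_file: Path) -> None:
    """Run pg_dump and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_pg_dump_cmd(database_url, out_file), check=True)
```

Note that passing the connection string as an argument makes the password visible in the local process list. On a shared machine, the `PGPASSWORD` environment variable or a `.pgpass` file is the safer route.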

ray-chen

EMPLOYEE

4 months ago

Sorry about this, there was an internal transient error that has since been resolved. We're still investigating how it happened and taking steps to prevent it. Going to resolve this thread but feel free to re-open if this surfaces again (it shouldn't)!

Status changed to Awaiting User Response Railway • 4 months ago

Status changed to Solved ray-chen • 4 months ago
