Deploy nodes cannot reach internal registry (us-west2) — service offline
bmetcalf21
FREEOP

2 months ago

Issue: Service is offline. Builds succeed, but every deploy fails at container pull/unpack with a registry timeout: 9 consecutive failures, including a rollback.

Project ID: 97b5665c-37a4-4b87-9939-ec46f408921e

Service ID: 805b27f5-146d-4c82-b92b-7784b0d1c607

Environment: production (59987721-1e2d-4028-b4fc-4c9891303051)

Error (same on all 9):

Container failed to start

/orchestrator.RouterLegacyService/CreateDeployment DEADLINE_EXCEEDED: ctrd: failed to pull/unpack image: failed to resolve reference "production-us-west2.railway-registry.com/...": dial tcp 162.220.232.122:443: i/o timeout

Key findings:

- Last successful deploy: 4e766d4c at 06:52 UTC today. Failures started at 20:18 UTC, with no code or config changes in between.

- The outage began before any redeploy: the running container became unreachable via its public URL, even though /health still returned 200 when checked internally over SSH.

- production-us-west2.railway-registry.com responds to external requests, but the deploy nodes can't reach it, which points to an internal network partition.

- Tried changing the deploy region to us-west1 via the API: builds still push to the production-us-west2 registry, with the same failure.

- Tried us-east4, which returned configErrors: "User does not have access to region us-east4"

- Status page shows no incident for March 29.
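The error above shows that DNS still resolves the registry (it names 162.220.232.122) while the TCP connect to port 443 never completes. A minimal Python sketch for checking that split from any host where you can run Python (a local machine, or the running service over SSH if its image includes Python) could look like this. It is not a Railway tool; the function name `probe`, the 5-second default timeout, and the result format are illustrative assumptions. It resolves the hostname, then attempts a TCP connect to each resolved address and reports whether a failure happened at the resolve stage or the connect stage.

```python
import socket


def probe(host: str, port: int = 443, timeout: float = 5.0) -> list[tuple[str, str, str]]:
    """Resolve host, then try a TCP connect to each resolved address.

    Hypothetical helper for illustration. Returns (address, stage, detail)
    tuples where stage is "resolve" (DNS lookup failed), "dial" (TCP connect
    failed) or "ok" (TCP handshake completed).
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # DNS itself failed, so no address was ever dialed.
        return [(host, "resolve", str(exc))]

    results = []
    seen = set()
    for family, socktype, proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip in seen:  # getaddrinfo can return the same address more than once
            continue
        seen.add(ip)
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
            results.append((ip, "ok", "connected"))
        except socket.timeout:
            # No handshake reply within the timeout: the same symptom as
            # "dial tcp ...: i/o timeout" in the deploy error.
            results.append((ip, "dial", "timed out"))
        except OSError as exc:
            # Other connect failures, e.g. connection refused or network unreachable.
            results.append((ip, "dial", str(exc)))
    return results
```

For example, `probe("production-us-west2.railway-registry.com")` returning `"ok"` from an outside machine while deploys keep reporting `i/o timeout` would be consistent with the partition reading above. The probe can only show what is reachable from where it runs, not from Railway's deploy nodes.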

This is a production outage. Appreciate any help.

$10 Bounty

1 Reply

Status changed to Awaiting Railway Response by Railway about 2 months ago


bmetcalf21
FREEOP

2 months ago

This appears to be the same issue as [link to that thread]: the same DEADLINE_EXCEEDED / dial tcp i/o timeout on registry pull, but with a different registry hostname. That one was resolved server-side by @brody.


Status changed to Open by Railway about 2 months ago

