Outbound network (egress) completely broken for one service — :443 TCP timeouts to all external hosts

Anonymous

HOBBY OP

3 months ago

Hi,

Since 2026-04-16 ~10:00 UTC our render service on project railway.type3.audio has been unable to make any outbound HTTPS request, causing ~100% of customer jobs to fail.

Project ID: d1b29819-2b0d-48bd-a758-50cf03a97b82
Environment: production (846fc2fb-9289-40eb-b5ae-014f046c9b23)
Service ID: 698d9ad9-afc9-472e-a62b-d405521fec9c
Region: us-west2
Latest deploy: bf717273-57e1-4f74-b4d2-c1cbe04a51ed (fresh redeploy)

Symptom

Every outbound TCP :443 connection from the container hangs until timeout, regardless of destination. Inbound traffic is fine — GET /health returns 200 in <5 ms throughout.

Confirmed failing destinations (all unrelated IPs / providers):

example.com via Playwright → page.goto: Timeout 15000ms exceeded
api.scrape.do (163.172.169.229) → connect ETIMEDOUT
in.logtail.com (91.98.x.x, 91.99.x.x) → connect ETIMEDOUT

Representative log line:

 [ERRO] Native crawler and scrapedo fallback both failed for
 https://example.com. nativeError=page.goto: Timeout 15000ms exceeded …
 fallbackReason=connect ETIMEDOUT 163.172.169.229:443
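
For anyone trying to reproduce the symptom above, here is a minimal sketch of a raw TCP probe that bypasses Playwright and HTTP clients entirely. `probe_tcp` is a hypothetical helper written for illustration, not part of any Railway tooling, and the 5-second default timeout is an arbitrary assumption.

```python
import socket
import time


def probe_tcp(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Attempt a single TCP connect and report how it ended.

    Hypothetical helper for illustration only.
    """
    started = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except (socket.timeout, TimeoutError):
        # No SYN-ACK arrived in time: the hang described in this thread.
        outcome = "timeout"
    except ConnectionRefusedError:
        # The peer answered with RST, so packets are getting out and back.
        outcome = "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        outcome = f"error: {exc}"
    return {
        "host": host,
        "port": port,
        "outcome": outcome,
        "seconds": round(time.monotonic() - started, 3),
    }
```

Running something like `probe_tcp("api.scrape.do")` both inside the container and from a machine outside Railway gives a like-for-like comparison: a `timeout` inside and `connected` outside would match what is reported here.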

What we've tried

deploymentRestart via the GraphQL API — no change.
Full redeploy (railway redeploy) — new deployment bf717273… timed out in the build step at 20 min while fetching Ubuntu packages from archive.ubuntu.com. The new image was never built, so the service kept running the old image from last month.
Another railway redeploy (bb49948f…) — same build-timeout behaviour.
Reaching scrape.do and logtail from outside Railway — both work fine, so the destinations are healthy.

So the loss of egress affects both the runtime container and the build environment on this project. That rules out a bad image / corrupted runtime and points even more clearly at project-level network / NAT.
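
The reasoning above (every unrelated destination times out, so the destinations themselves are unlikely to be at fault) can be sketched as a small classifier. This is only an illustration under assumptions: `diagnose` and its return labels are hypothetical names, and the input is assumed to be a mapping of host to an outcome string where `"timeout"` marks a connect that hung.

```python
from typing import Mapping


def diagnose(outcomes: Mapping[str, str]) -> str:
    """Classify per-destination connect outcomes (hypothetical labels)."""
    if not outcomes:
        return "no-data"
    timed_out = [host for host, outcome in outcomes.items() if outcome == "timeout"]
    if not timed_out:
        return "no-timeouts"
    if len(timed_out) < len(outcomes):
        # Only some hosts hang: look at those hosts or the route to them.
        return "destination-or-route-suspected"
    if len(outcomes) == 1:
        # One host timing out cannot separate "their problem" from "ours".
        return "inconclusive"
    # Every one of several unrelated hosts hangs: suspect the local egress path.
    return "local-egress-suspected"
```

With the three destinations listed in this post all timing out, the function returns `"local-egress-suspected"`, which is the same conclusion drawn here by hand.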

Please could you take a look?

Many thanks,

Peter

Solved

5 Replies

Status changed to Awaiting Railway Response Railway • 3 months ago

Anonymous

HOBBY OP

3 months ago

Follow-up: we've confirmed the egress problem is scoped to our old project, not Railway-wide.

We deployed the same code, same GitHub repo (type3audio/railway, branch main), into a brand-new Railway project in the same region (us-west2), with the same env vars. Everything works as expected there:

New project ID: 906fcc7b-1887-445b-841b-435b00b905bf
New service ID: eff81940-1a0c-4ecc-9baa-91d029ef850e
New deploy URL: https://railway-type3-audio-failover-production.up.railway.app

Verification from the new project:
Build phase reached archive.ubuntu.com on the first try and completed normally; apt-get worked fine.
GET /render?url=https://files.type3.audio/test/hello-world.html (with Authorization) → 200 in 3.8 s via the native Playwright path.

So: runtime egress is fine in the new project, and the build environment can reach external mirrors there too. Both of the failure modes we saw on the old project are absent.

The old project is still broken in the same way as described in the original ticket — both runtime and build egress time out for all external destinations.

Any idea what caused this issue on our old project? Can we fix the issue for that project? Is it likely to recur?

Thanks in advance.

brody

EMPLOYEE

3 months ago

For clarity, did you have static IPs enabled?

Status changed to Awaiting User Response Railway • 3 months ago

Anonymous

HOBBY OP

3 months ago

No, I'm on the Hobby plan.

Status changed to Awaiting Railway Response Railway • 3 months ago

Anonymous

HOBBY OP

3 months ago

Hmm, we triggered another redeploy on the old project, and this time it built and is running as expected. Did you change something on your side?

I didn't change anything on our end.

Strange.

(I guess you can see the previous failed deploys from earlier today—those build logs make the "unable to fetch from archive.ubuntu.com during build" issue clear.)

brody

EMPLOYEE

3 months ago

We didn't make any changes on our side. The redeploy likely landed your service on a different host, which cleared the issue. We don't have a clear explanation for why the old host's outbound networking was broken, and since it's resolved now there isn't much we can dig into. If it happens again, please let us know right away and we can investigate while it's still in a broken state.

Status changed to Awaiting User Response Railway • 3 months ago

Railway

BOT

2 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway • 2 months ago
