Outbound network (egress) completely broken for one service — :443 TCP timeouts to all external hosts
Anonymous
HOBBYOP

a month ago

Hi,

Since 2026-04-16 ~10:00 UTC our render service on project railway.type3.audio has been unable to make any outbound HTTPS request, causing ~100% of customer jobs to fail.

  • Project ID: d1b29819-2b0d-48bd-a758-50cf03a97b82
  • Environment: production (846fc2fb-9289-40eb-b5ae-014f046c9b23)
  • Service ID: 698d9ad9-afc9-472e-a62b-d405521fec9c
  • Region: us-west2
  • Latest deploy: bf717273-57e1-4f74-b4d2-c1cbe04a51ed (fresh redeploy)

Symptom

Every outbound TCP :443 connection from the container hangs until timeout, regardless of destination. Inbound traffic is fine — GET /health returns 200 in <5 ms throughout.

Confirmed failing destinations (all unrelated IPs / providers):

  • example.com via Playwright → page.goto: Timeout 15000ms exceeded
  • api.scrape.do (163.172.169.229) → connect ETIMEDOUT
  • in.logtail.com (91.98.x.x, 91.99.x.x) → connect ETIMEDOUT

Representative log line:

 [ERRO] Native crawler and scrapedo fallback both failed for
 https://example.com. nativeError=page.goto: Timeout 15000ms exceeded …
 fallbackReason=connect ETIMEDOUT 163.172.169.229:443
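A minimal way to reproduce this symptom from inside the container, with Playwright and TLS out of the picture, is a raw TCP connect probe. The sketch below assumes a Node.js runtime (suggested by the `page.goto` and `ETIMEDOUT` errors, but still an assumption) and uses only the built-in `node:net` module. `probeTcp` is a hypothetical helper name and the 5 s default timeout is arbitrary. A `timeout` outcome to several unrelated hosts matches the report above. An immediate `error` such as `ECONNREFUSED` or `ENOTFOUND` would point somewhere other than a silently dropped outbound path.

```typescript
import * as net from "node:net";

// Outcome of a single TCP connect attempt.
type ProbeResult = {
  host: string;
  port: number;
  outcome: "connected" | "timeout" | "error";
  detail?: string; // error code such as ECONNREFUSED or ENOTFOUND, if any
  ms: number; // elapsed time until the outcome was decided
};

// Hypothetical helper: attempt a plain TCP connect (no TLS, no HTTP), so a
// failure can only come from DNS, routing, NAT or firewalling, not from TLS
// or application-layer problems.
function probeTcp(host: string, port = 443, timeoutMs = 5000): Promise<ProbeResult> {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = net.connect({ host, port });
    let settled = false;

    // Resolve exactly once, then tear the socket down.
    const finish = (outcome: ProbeResult["outcome"], detail?: string) => {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve({ host, port, outcome, detail, ms: Date.now() - started });
    };

    // The idle timer also covers the connecting phase, so a SYN that is
    // never answered ends up here rather than hanging forever.
    socket.setTimeout(timeoutMs, () => finish("timeout"));
    socket.once("connect", () => finish("connected"));
    socket.once("error", (err: NodeJS.ErrnoException) =>
      finish("error", err.code ?? err.message),
    );
  });
}
```

For example, `await probeTcp("api.scrape.do")` run inside the affected service would be expected to return `outcome: "timeout"` while egress is broken, and `"connected"` once it is restored.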

What we've tried

  1. deploymentRestart via the GraphQL API — no change.
  2. Full redeploy (railway redeploy): the new deployment bf717273… timed out in the build step after 20 minutes while fetching Ubuntu packages from archive.ubuntu.com. The container image was never built, so the service kept serving the old (last month's) image.
  3. Another railway redeploy (bb49948f…) — same build-timeout behaviour.
  4. Reaching scrape.do and logtail from outside Railway — both work fine, so the destinations are healthy.

So the loss of egress affects both the runtime container and the build environment on this project. That rules out a bad image or corrupted runtime and points even more clearly at a project-level network / NAT problem.
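The triage reasoning here (every unrelated destination times out from inside, the same destinations work from outside, and inbound `/health` still answers) can be written down as a small decision function. This is a hypothetical sketch for illustration, not a Railway tool. The `Observation` shape, the verdict names, and the rule that only destinations reachable from outside count as controls are all assumptions.

```typescript
// Result of one TCP connect attempt to a destination.
type Outcome = "connected" | "timeout" | "error";

interface Observation {
  destination: string; // e.g. "api.scrape.do:443"
  fromInside: Outcome; // probe run inside the affected service
  fromOutside: Outcome; // same probe run from a machine outside the platform
}

type Verdict =
  | "no-egress-problem"
  | "destination-specific"
  | "egress-path-suspected"
  | "inconclusive";

// Hypothetical triage helper encoding the reasoning in the post above.
function triage(observations: Observation[], inboundHealthy: boolean): Verdict {
  // A destination only says something about *our* egress if it is known
  // to be reachable from elsewhere.
  const controls = observations.filter((o) => o.fromOutside === "connected");
  if (controls.length === 0) return "inconclusive";

  const failing = controls.filter((o) => o.fromInside !== "connected");
  if (failing.length === 0) return "no-egress-problem";

  // Some healthy destinations work from inside, so egress as a whole is up.
  if (failing.length < controls.length) return "destination-specific";

  // Every healthy destination fails from inside. If all of them hang until
  // timeout while inbound requests still succeed, the container is alive and
  // the shared outbound path (routing / NAT) is the common factor.
  const allTimeouts = failing.every((o) => o.fromInside === "timeout");
  return inboundHealthy && allTimeouts ? "egress-path-suspected" : "inconclusive";
}
```

Feeding in the observations from this report (scrape.do and logtail both time out from inside and are reachable from outside, and `/health` still returns 200) yields `"egress-path-suspected"`. The function requires every control to *time out*, not merely to fail, because a refused connection or a DNS failure has a different cause than packets being silently dropped.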

Please could you take a look?

Many thanks,

Peter

Solved


Status changed to Awaiting Railway Response Railway about 1 month ago


Anonymous
HOBBYOP

a month ago

Follow-up: we've confirmed the egress problem is scoped to our old project, not Railway-wide.

We deployed the same code, same GitHub repo (type3audio/railway, branch main), into a brand-new Railway project in the same region (us-west2), with the same env vars. Everything works as expected there:

  • New project ID: 906fcc7b-1887-445b-841b-435b00b905bf
  • New service ID: eff81940-1a0c-4ecc-9baa-91d029ef850e
  • New deploy URL: https://railway-type3-audio-failover-production.up.railway.app

Verification from the new project:

  • Build phase reached archive.ubuntu.com on the first try and completed normally; apt-get worked fine.
  • GET /render?url=https://files.type3.audio/test/hello-world.html (with Authorization) → 200 in 3.8 s via the native Playwright path.

So: runtime egress is fine in the new project, and the build environment can reach external mirrors there too. Both of the failure modes we saw on the old project are absent.

The old project is still broken in the same way as described in the original ticket — both runtime and build egress time out for all external destinations.

Any idea what caused this issue on our old project? Can we fix the issue for that project? Is it likely to recur?

Thanks in advance.


a month ago

For clarity, did you have static IPs enabled?


Status changed to Awaiting User Response Railway about 1 month ago


Anonymous
HOBBYOP

a month ago

No, I'm on the Hobby plan.


Status changed to Awaiting Railway Response Railway about 1 month ago


Anonymous
HOBBYOP

a month ago

Hmm, we triggered another redeploy on the old project, and this time it built and is running as expected. Did you change something on your side?

I didn't change anything on our end.

Strange.

(I guess you can see the previous failed deploys from earlier today—those build logs make the "unable to fetch from archive.ubuntu.com during build" issue clear.)


a month ago

We didn't make any changes on our side. The redeploy likely landed your service on a different host, which cleared the issue. We don't have a clear explanation for why the old host's outbound networking was broken, and since it's resolved now there isn't much we can dig into. If it happens again, please let us know right away and we can investigate while it's still in a broken state.


Status changed to Awaiting User Response Railway about 1 month ago


Railway
BOT

a month ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 28 days ago

