Private networking returns ECONNREFUSED between two services in the same project despite all documented fixes

nmvalletta77

PRO · OP

2 months ago

Hi Railway team,

Two services in the same project + environment can't talk over private networking despite every documented config being in place. Public networking works fine; only the private hop fails.

Services:

- Service A: Express app listening on :::8080 (IPv6 dual-stack, confirmed at runtime by the log line "listening on :::8080 (family=IPv6)")

- Service B: Next.js sidecar that needs to fetch back to Service A
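For reference, here is a minimal sketch of the dual-stack binding described for Service A. It uses Node's built-in http module so it runs without dependencies (Express's app.listen(port, host) passes the same arguments through to an http server). The formatListenLog helper and its exact message format are assumptions modeled on the log line quoted above, not the project's code.

```typescript
import * as http from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical formatter reproducing the quoted log line,
// e.g. "listening on :::8080 (family=IPv6)".
function formatListenLog(addr: AddressInfo): string {
  return `listening on ${addr.address}:${addr.port} (family=${addr.family})`;
}

// Binding "::" with ipv6Only false accepts IPv6 and, on systems where
// dual-stack sockets are enabled (the Linux default), IPv4 as well.
function startServer(port: number): http.Server {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ ok: true, path: req.url }));
  });
  server.listen({ port, host: "::", ipv6Only: false }, () => {
    console.log(formatListenLog(server.address() as AddressInfo));
  });
  return server;
}
```

Calling startServer(8080) should print the same "listening on :::8080 (family=IPv6)" line the post reports.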

Symptom:

Service B → GET http://service-a.railway.internal:8080/... returns ECONNREFUSED every time. DNS resolves fine (no ENOTFOUND), and 8080 is the port the server is listening on. The reverse direction works: Service A → service-b.railway.internal:8080 via http-proxy succeeds.

What we've tried:

1. Service A binding 0.0.0.0 → ECONNREFUSED

2. Service A binding :: (dual-stack) → ECONNREFUSED

3. ipv6EgressEnabled=true on BOTH services → ECONNREFUSED

4. NODE_OPTIONS=--dns-result-order=ipv4first on the calling service → ECONNREFUSED

5. Short hostname service-a vs FQDN service-a.railway.internal → same ECONNREFUSED either way
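Several of the attempts above (changing the bind address, --dns-result-order) only change which address family gets tried first. One way to take address selection out of the picture is to resolve every address for the hostname and open a bare TCP connection to each one on port 8080. This is a diagnostic sketch assuming Node 18+, not a documented Railway tool: if every address returns ECONNREFUSED, the refusal happens at the TCP layer, before any HTTP or application code runs.

```typescript
import { promises as dns } from "node:dns";
import * as net from "node:net";

interface TcpProbe {
  ok: boolean;
  code?: string; // e.g. "ECONNREFUSED" or "ETIMEDOUT"
  ms: number;
}

// Opens a bare TCP connection (no HTTP) and reports whether the handshake completed.
function probeTcp(host: string, port: number, timeoutMs = 2000): Promise<TcpProbe> {
  const start = Date.now();
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok: boolean, code?: string) => {
      socket.destroy();
      resolve({ ok, code, ms: Date.now() - start }); // repeat calls are no-ops
    };
    socket.setTimeout(timeoutMs, () => finish(false, "ETIMEDOUT"));
    socket.once("connect", () => finish(true));
    socket.once("error", (err) => finish(false, (err as NodeJS.ErrnoException).code));
  });
}

// lookup with all: true returns every A and AAAA address getaddrinfo yields;
// result-order settings only change their order, not which ones come back.
async function probeAllAddresses(host: string, port: number) {
  const addrs = await dns.lookup(host, { all: true });
  const results: Array<{ address: string; family: number } & TcpProbe> = [];
  for (const { address, family } of addrs) {
    results.push({ address, family, ...(await probeTcp(address, port)) });
  }
  return results;
}
```

Run from Service B, probeAllAddresses("service-a.railway.internal", 8080).then(console.table) would list each resolved address with its own result.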

Control that works: setting the sidecar's internal-API URL to the public edge URL (https://...) — the sidecar successfully fetches via the public route, so it's not a Node/code issue.

It's specifically the private-network hop.

Note: this project predates the Oct 2025 IPv4 private-networking rollout (legacy IPv6-only era), which might be relevant.

Ask: can you check the private-network registration for Service A? Either the VRF isn't routing correctly or it's forwarding to the wrong internal port. Happy to share project/service IDs privately once a ticket is assigned, or open a live debug window.

Thanks

Closed · $20 Bounty

2 Replies

Railway

BOT

2 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway • 2 months ago

nmvalletta77

PRO · OP

2 months ago

Bumping with fresh evidence captured today (2026-04-29) — same project, same services. Same ECONNREFUSED on the legacy hostname, plus confirmation that the new hostname doesn't resolve at all:

Test 1: INTERNAL_API_URL=http://byson-server.railway.internal:8080 (legacy name from before the service rename)

[serverMetadata] agent/byson-real-estate fetch failed: fetch failed
cause=[ECONNREFUSED]
(http://byson-server.railway.internal:8080/api/metadata/agent/byson-real-estate)

Test 2: INTERNAL_API_URL=http://tortus-server.railway.internal:8080 (current name)

[serverMetadata] agent/nick-valletta fetch failed: fetch failed
cause=getaddrinfo ENOTFOUND tortus-server.railway.internal[ENOTFOUND]
(http://tortus-server.railway.internal:8080/api/metadata/agent/nick-valletta)
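The two tests differ only in the base URL. A hypothetical helper like the one below (the names metadataUrl and internalApiBase are illustrative, not taken from the project) shows why switching between the private hostname and the public edge URL is purely a configuration change.

```typescript
// Hypothetical helper: builds the metadata URL seen in the logs from whatever
// base URL INTERNAL_API_URL holds (private hostname or public edge URL).
function metadataUrl(base: string, agentSlug: string): string {
  // The leading "/" makes the path absolute, so any path already on base is replaced.
  return new URL(`/api/metadata/agent/${encodeURIComponent(agentSlug)}`, base).toString();
}

// Reads the base URL from the environment and fails loudly if it is missing.
function internalApiBase(): string {
  const base = process.env.INTERNAL_API_URL;
  if (!base) throw new Error("INTERNAL_API_URL is not set");
  return base;
}
```

Because the path is resolved against the base, metadataUrl(internalApiBase(), "nick-valletta") produces the same request path whichever host INTERNAL_API_URL points at.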

Both halves of the same bug:

- Legacy hostname: DNS resolves, but no routing entry → ECONNREFUSED

- Current hostname: not registered in DNS at all → ENOTFOUND
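The two halves can be told apart in code. Node's built-in fetch (undici) rejects with TypeError("fetch failed") and attaches the underlying DNS or socket error as cause, which appears to be where the [ECONNREFUSED] and [ENOTFOUND] codes in the logs come from. The sketch below is one assumed way to classify them; the Diagnosis labels are hypothetical, and the AggregateError branch covers the case where Node tried several addresses and wrapped the individual failures.

```typescript
type Diagnosis =
  | { kind: "dns-missing"; code: string } // name has no DNS record
  | { kind: "refused"; code: string } // name resolved, TCP connection refused
  | { kind: "other"; code?: string };

// Pulls the errno-style code off fetch's cause. If Node tried several
// addresses, the cause may be an AggregateError whose inner errors carry it.
function causeCode(err: unknown): string | undefined {
  const cause = (err as { cause?: unknown } | null)?.cause as
    | { code?: string; errors?: Array<{ code?: string }> }
    | undefined;
  if (!cause) return undefined;
  return cause.code ?? cause.errors?.find((e) => e.code)?.code;
}

function diagnose(err: unknown): Diagnosis {
  const code = causeCode(err);
  if (code === "ENOTFOUND") return { kind: "dns-missing", code };
  if (code === "ECONNREFUSED") return { kind: "refused", code };
  return { kind: "other", code };
}
```

Wrapping the sidecar's fetch in try/catch and logging diagnose(err) would label Test 1 as "refused" and Test 2 as "dns-missing".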

The service was renamed from byson-server to tortus-server at some point. The dashboard label updated, but the internal-networking control plane kept the old name's DNS record without its routing and never registered the new name. This is the stale-registration / VRF-gap I described in the original report.

Ask: can a Railway engineer please:

1. Re-register service cca7ab1e-bb7b-4de1-a32a-e7ccad7133b3 (project b7c836ae-0414-4a42-96b8-7be449b555c4) in the private-networking layer under its current name tortus-server, AND

2. Garbage-collect the stale byson-server.railway.internal DNS entry?

The public-URL workaround is still in place, so production is healthy, but every SSR fetch costs ~50-200ms RTT through the public edge instead of ~5ms internally. We have measured a ~750ms regression on high-fanout routes (a 20-fetch fan-out hits the 1500ms ssrFetchAll cap because of this), and have had to narrow our SSR scope to just 3 routes as a tactical workaround until the private network is restored.
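The post names an ssrFetchAll helper with a 1500ms cap but doesn't show it. As a hypothetical illustration only (fetchAllWithCap, Outcome and the abort-on-deadline behavior are assumptions, not the project's code), a capped fan-out might look like this. In a helper shaped this way, any request still in flight at the cap is reported as a timeout, so slower round trips through the public edge surface as dropped results rather than just slower pages.

```typescript
type Outcome<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown }
  | { status: "timeout" };

const TIMEOUT = Symbol("timeout");

// Hypothetical capped fan-out: every task starts at once; anything still
// pending when capMs elapses is aborted and reported as "timeout".
async function fetchAllWithCap<T>(
  tasks: Array<(signal: AbortSignal) => Promise<T>>,
  capMs: number,
): Promise<Outcome<T>[]> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<typeof TIMEOUT>((resolve) => {
    timer = setTimeout(() => {
      controller.abort();
      resolve(TIMEOUT);
    }, capMs);
  });
  try {
    return await Promise.all(
      tasks.map(async (task): Promise<Outcome<T>> => {
        try {
          const result = await Promise.race([task(controller.signal), deadline]);
          return result === TIMEOUT ? { status: "timeout" } : { status: "fulfilled", value: result as T };
        } catch (reason) {
          // A rejection caused by our own abort counts as a timeout, not a failure.
          return controller.signal.aborted ? { status: "timeout" } : { status: "rejected", reason };
        }
      }),
    );
  } finally {
    clearTimeout(timer);
  }
}
```

Each task receives the shared AbortSignal so it can pass it to fetch(url, { signal }); that is what lets the cap actually cancel the outstanding requests instead of merely ignoring them.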

Happy to share more diagnostic data, open a live debug window, or anything else that helps.

nmvalletta77

PRO · OP

2 months ago

One more datapoint to make this unambiguous: I just ran the control test that darseen suggested. I created a fresh Railway project (private-net-test, project ID 17d508bd-5edd-42b8-80cc-3e5405715bbd) with two new services using the identical pattern: one Express-style HTTP server binding :::8080, and one Node sidecar fetching it via service-a.railway.internal:8080.

Fresh project result:

{
  "dns": { "ok": true, "type": "AAAA", "values": ["fd12:383c:c461:1:8000:11b:91be:28e8"], "ms": 4 },
  "privateFetch": { "ok": true, "status": 200, "ms": 10 },
  "publicFetch": { "ok": true, "status": 200, "ms": 27 }
}

Private-network DNS resolves AAAA in 4ms, fetch round-trip in 10ms. This is exactly what's expected and what our prod project should be doing.
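The probe script behind that result isn't shown. A hypothetical reconstruction that emits the same JSON shape, assuming Node 18+ (global fetch) and caller-supplied hostname, port, path and public URL, could look like this; the function names are illustrative.

```typescript
import { promises as dns } from "node:dns";

interface Timed<T> {
  ok: boolean;
  ms: number;
  value?: T;
  error?: string;
}

// Runs fn, measures wall-clock time, and folds any thrown error into ok: false.
async function timed<T>(fn: () => Promise<T>): Promise<Timed<T>> {
  const start = Date.now();
  try {
    const value = await fn();
    return { ok: true, ms: Date.now() - start, value };
  } catch (err) {
    return { ok: false, ms: Date.now() - start, error: String(err) };
  }
}

// Fetches url, drains the body so the connection can be reused, returns the status.
async function fetchStatus(url: string): Promise<number> {
  const res = await fetch(url);
  await res.arrayBuffer();
  return res.status;
}

// privateHost must be a hostname; a bare IPv6 literal would need [brackets] in the URL.
async function runProbe(privateHost: string, port: number, path: string, publicUrl: string) {
  const dnsRes = await timed(() => dns.resolve6(privateHost));
  const priv = await timed(() => fetchStatus(`http://${privateHost}:${port}${path}`));
  const pub = await timed(() => fetchStatus(publicUrl));
  // Mirrors Response.ok: only 2xx statuses count as success.
  const summarize = (r: Timed<number>) => ({
    ok: r.ok && r.value !== undefined && r.value >= 200 && r.value < 300,
    status: r.value,
    ms: r.ms,
    ...(r.error ? { error: r.error } : {}),
  });
  return {
    dns: { ok: dnsRes.ok, type: "AAAA", values: dnsRes.value ?? [], ms: dnsRes.ms },
    privateFetch: summarize(priv),
    publicFetch: summarize(pub),
  };
}
```

Running the same probe from the sidecar in both projects would put the healthy and broken results side by side in one format.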

So:

- Fresh project = works perfectly, no special config needed.

- Our existing project = ECONNREFUSED on the legacy hostname byson-server.railway.internal, ENOTFOUND on the current hostname tortus-server.railway.internal. Same docs followed, same Node version and same :::8080 binding as the fresh project.

The only material differences between the two projects are age (the existing project predates the Oct 2025 IPv4 private-networking rollout) and the fact that our service was renamed at some point. This rules out a generic Railway bug and confirms the rename / legacy-state registration theory described earlier in this thread.

Please look at the control-plane state for service cca7ab1e-bb7b-4de1-a32a-e7ccad7133b3 in project b7c836ae-0414-4a42-96b8-7be449b555c4. Whatever migration step normally registers a service in the post-Oct-2025 networking layer didn't run for this one (or went stale after the rename). Re-running it should resolve the issue.

Status changed to Closed brody • 2 months ago
