a month ago
Environment
Region: europe-west4
Runtime: V2
Metal Edge: Disabled
Problem
After deploying any service, inter-service communication via *.railway.internal times out. All services are healthy (receiving Railway health checks from 100.64.0.x), but cannot reach each other.
This is NOT DNS caching - DNS returns fresh IPs after redeploys, but connections still timeout.
Setup
service-a → service-a.railway.internal:8000
service-b → service-b.railway.internal:8001
service-c → service-c.railway.internal:8080
Stack: Python 3.12, httpx 0.28.1, FastAPI, Uvicorn
All services bind to 0.0.0.0
How We Make Requests
import httpx

transport = httpx.AsyncHTTPTransport(local_address="0.0.0.0")  # Force IPv4
async with httpx.AsyncClient(timeout=10.0, transport=transport) as client:
    response = await client.get("http://service-b.railway.internal:8001/health")
    # ^^^ Times out after 10 seconds
Evidence
service-b logs (healthy, receiving Railway health checks):
INFO: Uvicorn running on http://0.0.0.0:8001
INFO: 100.64.0.2:44241 - "GET /health HTTP/1.1" 200 OK
service-a logs (healthy, but cannot connect out):
INFO: 100.64.0.2:54803 - "GET /health HTTP/1.1" 200 OK <-- Railway check works
ERROR - service-b health check failed: Connection timeout (DNS: {
'resolved': ['fd12:xxxx:xxxx:x:xxxx:8001', '10.191.97.36:8001'],
'time_ms': 147.49, 'error': None
})
DNS resolves correctly to both IPv6 and IPv4. IPs update after redeploys. Connections still timeout.
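The per-family resolution order can be checked directly with the standard library (a diagnostic sketch; on Railway you would pass service-b.railway.internal instead of the localhost stand-in used here so it resolves anywhere):

```python
import socket

def resolve_families(host: str, port: int) -> list[tuple[str, str]]:
    """Return (address_family, address) pairs in resolver order."""
    return [
        (socket.AddressFamily(family).name, sockaddr[0])
        for family, _, _, _, sockaddr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP
        )
    ]

# On Railway: resolve_families("service-b.railway.internal", 8001)
# An AF_INET6 entry listed first would match the fd12:... address in the logs.
print(resolve_families("localhost", 8001))
```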
What We Tried
Redeployed all services → Still broken
Disabled Metal Edge → Still broken
Verified fresh DNS IPs after redeploy → Still broken
Forced IPv4 binding → Still broken
Retry with exponential backoff → Still broken
Waited 15+ minutes → Still broken
Timeline Pattern
Services work fine
Deploy any service
Inter-service communication breaks immediately
All services show healthy in dashboard
Railway health checks succeed (from 100.64.0.x)
*.railway.internal calls timeout indefinitely
No automatic recovery
Key Observation
Railway's health checks from 100.64.0.x reach all services. Only service-to-service communication fails. This suggests Wireguard mesh routing issue, not service configuration.
Questions
Known issues with private networking in europe-west4?
Way to force-refresh private network routing?
Should we try a different region?
Pinned Solution
a month ago
The issue is likely a conflict between the IPv6 internal DNS and your forced IPv4 binding in httpx. When DNS returns an IPv6 address and httpx attempts to connect to it from a socket explicitly bound to 0.0.0.0 (IPv4-only), the connection fails.
Your logs show the IPv6 address is resolved first (['fd12:xxxx:xxxx:x:xxxx:8001', '10.191.97.36:8001']), so httpx defaults to it.
The fix:
You should stop forcing IPv4. Instead, force or allow IPv6 so httpx can use the IPv6 address returned by DNS: change local_address="0.0.0.0" to local_address="::".
You might also need to update service binding. Binding to :: usually accepts both IPv6 and IPv4 traffic.
It would be something like this:

uvicorn main:app --host :: --port 8001
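The failure mode described above can be reproduced with plain sockets from the standard library (a minimal sketch; ::1 is only a stand-in for the fd12:... internal address in the logs):

```python
import socket

# A socket explicitly bound to 0.0.0.0 is IPv4-only (AF_INET).
# Handing it an IPv6 destination fails, which is what happens when
# httpx picks the AAAA record while local_address="0.0.0.0" is set.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 0))
try:
    s.connect(("::1", 8001))  # IPv6 loopback as a stand-in target
except OSError as exc:
    print("IPv4-bound socket cannot reach an IPv6 address:", exc)
finally:
    s.close()
```

Binding to "::" instead (or simply not setting local_address) lets the client use whichever family DNS returns.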
2 Replies
a month ago
Confirmed, this was the issue.
Removing the forced IPv4 binding and allowing IPv6 resolved the inter-service communication immediately.
Changing local_address to :: and binding Uvicorn to --host :: fixed the timeouts. Everything is working as expected now.
Thanks for the clear explanation and quick help, @darseen 
Status changed to Solved brody • about 1 month ago