Private networking completely broken between services after deployment - connection timeouts despite healthy services
florianpreusner
PRO · OP

a month ago

Environment

  • Region: europe-west4

  • Runtime: V2

  • Metal Edge: Disabled

Problem

After deploying any service, inter-service communication via *.railway.internal times out. All services are healthy (receiving Railway health checks from 100.64.0.x), but cannot reach each other.

This is NOT DNS caching - DNS returns fresh IPs after redeploys, but connections still time out.

Setup

  • service-a (service-a.railway.internal, port 8000)

  • service-b (service-b.railway.internal, port 8001)

  • service-c (service-c.railway.internal, port 8080)

Stack: Python 3.12, httpx 0.28.1, FastAPI, Uvicorn
All services bind to 0.0.0.0.

How We Make Requests

import httpx

transport = httpx.AsyncHTTPTransport(local_address="0.0.0.0")  # Force IPv4

async with httpx.AsyncClient(timeout=10.0, transport=transport) as client:
    response = await client.get("http://service-b.railway.internal:8001/health")
    # ^^^ Times out after 10 seconds

Evidence

service-b logs (healthy, receiving Railway health checks):

INFO: Uvicorn running on http://0.0.0.0:8001

INFO: 100.64.0.2:44241 - "GET /health HTTP/1.1" 200 OK

service-a logs (healthy, but cannot connect out):

INFO: 100.64.0.2:54803 - "GET /health HTTP/1.1" 200 OK <-- Railway check works

ERROR - service-b health check failed: Connection timeout (DNS: {

'resolved': ['fd12:xxxx:xxxx:x:xxxx:8001', '10.191.97.36:8001'],

'time_ms': 147.49, 'error': None

})

DNS resolves correctly to both IPv6 and IPv4, and the IPs update after redeploys, yet connections still time out.
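The family mismatch behind this can be reproduced in isolation with the standard library (a minimal sketch; fd12::1 is a placeholder ULA address, not the real Railway one): a socket explicitly bound to 0.0.0.0 is IPv4-only and cannot address an IPv6 destination at all, no matter what DNS returned.

```python
import socket

# Mirror httpx's local_address="0.0.0.0": this creates an IPv4-only socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 0))
try:
    # fd12::1 is a placeholder IPv6 ULA standing in for the
    # address Railway's internal DNS returns
    s.connect(("fd12::1", 8001))
    result = "connected"
except OSError:
    # An AF_INET socket cannot even represent an IPv6 destination
    result = "failed: IPv4 socket cannot reach an IPv6 address"
finally:
    s.close()
print(result)
```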

What We Tried

  • Redeployed all services → Still broken

  • Disabled Metal Edge → Still broken

  • Verified fresh DNS IPs after redeploy → Still broken

  • Forced IPv4 binding → Still broken

  • Retry with exponential backoff → Still broken

  • Waited 15+ minutes → Still broken

Timeline Pattern

  1. Services work fine

  2. Deploy any service

  3. Inter-service communication breaks immediately

  4. All services show healthy in dashboard

  5. Railway health checks succeed (from 100.64.0.x)

  6. *.railway.internal calls time out indefinitely

  7. No automatic recovery

Key Observation

Railway's health checks from 100.64.0.x reach all services; only service-to-service communication fails. This suggests a WireGuard mesh routing issue, not a service configuration problem.

Questions

  1. Known issues with private networking in europe-west4?

  2. Way to force-refresh private network routing?

  3. Should we try a different region?

Solved · $20 Bounty

Pinned Solution

darseen
HOBBY · Top 1% Contributor

a month ago

The issue is likely a conflict between the IPv6 internal DNS and the forced IPv4 binding in httpx. When DNS returns an IPv6 address and httpx attempts to connect to it using a socket explicitly bound to 0.0.0.0 (an IPv4 address), the connection fails.

Your logs show the IPv6 address is resolved first (['fd12:xxxx:xxxx:x:xxxx:8001', '10.191.97.36:8001']), so httpx defaults to it.

The fix:
Stop forcing IPv4. Instead, force or allow IPv6 so httpx can use the IPv6 address returned by DNS: change local_address="0.0.0.0" to local_address="::".

You might also need to update the service binding, since binding to :: usually accepts both IPv6 and IPv4 traffic. It would be something like this: uvicorn main:app --host :: --port 8001
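To sanity-check the corrected binding outside Railway, here is a self-contained sketch with only the standard library (it assumes the host has an IPv6 loopback, ::1): a client socket bound to the IPv6 wildcard :: can reach an IPv6 listener, which is what local_address="::" lets httpx do against the fd12:... addresses.

```python
import socket
import threading

# Throwaway IPv6 listener standing in for service-b
server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
server.bind(("::1", 0))            # any free port on the IPv6 loopback
server.listen(1)
port = server.getsockname()[1]

def accept_once():
    conn, _ = server.accept()
    conn.close()

threading.Thread(target=accept_once, daemon=True).start()

# Client bound to the IPv6 wildcard, as with local_address="::"
client = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
client.bind(("::", 0))
client.connect(("::1", port))       # succeeds: IPv6 socket, IPv6 destination
print("connected over IPv6")
client.close()
server.close()
```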

2 Replies



florianpreusner
PRO · OP

a month ago

Confirmed, this was the issue.

Removing the forced IPv4 binding and allowing IPv6 resolved the inter-service communication immediately.

Changing local_address to :: and binding Uvicorn to --host :: fixed the timeouts. Everything is working as expected now.

Thanks for the clear explanation and quick help, @darseen 🙏


Status changed to Solved by brody, about 1 month ago

