Private Networking 502 Error - DNS Resolution Failure Between Services
dsinghc
PRO · OP

3 months ago

Hi Railway Support,

I'm experiencing a 502 Bad Gateway error when attempting to route traffic between services using Railway's private networking. This configuration was previously working and has stopped functioning without any changes on my end.

Setup:

  • Environment: Production

  • Service 1: cloudflared (Cloudflare Tunnel connector using cloudflare/cloudflared Docker image)

  • Service 2: internal-x (Next.js 16.1.1 application)

Configuration:

  • Cloudflare Tunnel is configured to route internal-x.abc.com → http://internal-x.railway.internal:8080

  • Private networking is enabled on internal-x (confirmed in dashboard: "Ready to talk privately")

  • Both services are in the same project and environment

Current Behavior:

  • Accessing https://internal-x.abc.com returns a 502 Bad Gateway

  • The public Railway URL (https://internal-x-production.up.railway.app) works correctly

  • Cloudflared logs show successful tunnel registration (4 connections to IAD data centers)

  • Next.js logs show the app starts successfully: ✓ Ready in 739ms on port 8080

Troubleshooting Completed:

  1. Redeployed both services

  2. Regenerated and redeployed the Cloudflare tunnel token

  3. SSH'd into internal-x container — confirmed:

    • PORT=8080 environment variable is set

    • Next.js process is running (next-server v16.1.1)

    • App binds to network interface (logs show Network: http://10.x.x.x:8080)

  4. Attempted SSH into cloudflared container — container has no shell available

  5. Verified private networking is enabled in dashboard

Suspected Issue: The cloudflared container appears unable to resolve or connect to internal-x.railway.internal. Since I cannot get shell access to the cloudflared container to run DNS diagnostics (nslookup/curl), I cannot confirm whether this is a DNS resolution failure or a network connectivity issue.
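
For anyone trying to tell these two failure modes apart, here is a minimal sketch of the check that nslookup/curl would have done. It assumes a container in the same project and environment that has Python available (the cloudflared image does not); `probe` is a hypothetical helper name, not part of any Railway or Cloudflare tooling.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> dict:
    """Resolve `host`, then try a TCP connection to every resolved address."""
    result = {"host": host, "port": port, "addresses": [], "reachable": [], "error": None}

    # Step 1: DNS. A failure here points at name resolution, not connectivity.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Step 2: TCP. The name resolved, so check whether anything is listening.
    for family, socktype, proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address in result["addresses"]:
            continue
        result["addresses"].append(address)
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            result["reachable"].append(address)
        except OSError:
            # Refused, timed out, or address family unavailable in this container.
            pass

    if not result["reachable"]:
        result["error"] = "resolved, but no address accepted a TCP connection"
    return result
```

Calling `probe("internal-x.railway.internal", 8080)` from inside the private network and printing the result should show whether the name resolves at all, and if it does, whether any resolved address accepts a connection on that port.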

Request: Could you please investigate whether there are any issues with private networking DNS resolution in my project? Additionally, if there have been any recent changes to Railway's internal networking that might affect .railway.internal hostname resolution, that information would be helpful.

Solved · $20 Bounty

Pinned Solution

domehane
FREE · Top 5% Contributor

3 months ago

add this env var to your next.js service (internal-x):

HOSTNAME=::

this forces next.js to bind to ipv6 (:: means all ipv6 interfaces). right now it's probably binding to ipv4 only and cloudflared can't reach it over the ipv6-only private network

redeploy both services after adding that var
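
as a rough illustration of why the bind address matters, here's a small sketch that classifies a listen address by family. the function names are made up for this example, and it's written in python just to show the idea (it is not how next.js picks its bind address). the assumption is that the private network only carries ipv6, as described above.

```python
import ipaddress


def bind_host_family(host: str) -> str:
    """Return 'ipv4' or 'ipv6' for a literal address, or 'name' for anything else."""
    try:
        return f"ipv{ipaddress.ip_address(host).version}"
    except ValueError:
        # Not a literal IP (e.g. "localhost"); which family it uses depends on resolution.
        return "name"


def listens_on_ipv6(host: str) -> bool:
    """True only when the bind address is a literal IPv6 address such as '::'."""
    # Conservative: a hostname may or may not resolve to IPv6, so it is treated as False.
    return bind_host_family(host) == "ipv6"
```

with this, `listens_on_ipv6("0.0.0.0")` is False and `listens_on_ipv6("::")` is True, which matches the symptom: the app is up and reachable on its public url, but nothing is listening on the address family the private network uses.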

3 Replies

Status changed to Open brody 3 months ago


dsinghc
PRO · OP

2 months ago

Thanks for the suggestions. After further investigation, I resolved this issue by restructuring my Cloudflare tunnel configuration.

The Problem:

I had a single cloudflared tunnel service with multiple application routes configured (routing to several different Railway services via private networking).

The Solution:

Created separate cloudflared tunnel services, each with its own tunnel token and a single application route. So instead of:

1 tunnel → multiple app routes (internal-x, service-b, service-c)

I now have:

tunnel-x → internal-x.railway.internal:8080

tunnel-service-b → service-b.railway.internal:3000

tunnel-service-c → service-c.railway.internal:4000

Why This Works Better:

1. Fault isolation - If one application is unavailable or experiencing issues, the other tunnels continue functioning independently. With the single-tunnel approach, one unhealthy route could affect connectivity to all other routes.

2. Easier debugging - Each tunnel's logs are specific to one service, making it straightforward to identify issues.

3. Independent deployments - Can redeploy/restart individual tunnel services without affecting others.

The HOSTNAME=:: suggestion for IPv6 binding is still worth noting for others - Railway's private network is IPv6-only, so Next.js apps should bind to :: rather than 0.0.0.0.


domehane
FREE · Top 5% Contributor

2 months ago

nice, the single tunnel → multiple routes setup definitely seems like it was the culprit. splitting them makes way more sense for fault isolation anyway


Status changed to Solved brody 2 months ago
