I appear to be experiencing some kind of internal railway DNS issue

Anonymous

PROOP

7 months ago

I recently experienced a weird issue where one container suddenly stopped talking to another via Railway's internal routing. The service was live and responding directly. Everything was set up correctly, but I was getting 502 errors via the internal route. Restarting, redeploying, rebuilding everything with no cache, prioritising IPv4: nothing helped.

The only thing that worked was changing the subdomain prefix of the destination container/service. Immediately upon deployment of that change, it began to work again.

I've just had it happen again on a different service, with the same 502 errors/timeouts (same setup too: I have nginx proxying so that my frontend can make requests to the API via /api).

This 'scraper service' container is accessible directly on the public URL: https://scraper-production-ee61.up.railway.app/health

But it is not responding via the internal domain.

Logs from the container attempting to access it show:

2025/12/01 21:07:26 [warn] 47#47: *1736 upstream server temporarily disabled while connecting to upstream, client: 100.64.0.7, server: localhost, request: "GET /api/performance/summary HTTP/1.1", upstream: "http://10.221.79.97:8000/api/performance/summary", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"

Any ideas? I'm going to try leaving it in the current state for you to inspect. I suspect renaming the internal domain prefix may work again to resolve it.

$30 Bounty

12 Replies

brody

EMPLOYEE

7 months ago

What you’re running into is a common issue with Nginx and dynamic service IPs on Railway’s private network. By default, Nginx caches DNS lookups for the lifetime of a process, but Railway services receive new internal IPs on each deployment. This means Nginx may keep trying to connect to an outdated IP, resulting in 502 errors until you restart or redeploy.

The best way to handle this is to configure Nginx to resolve DNS more frequently. You’ll find plenty of guides online for setting this up, or you can check the Nginx resolver documentation for more details.
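A minimal sketch of what that could look like, offered as an illustration rather than a verified fix. It assumes the private-network DNS server is reachable at `fd12::10` (check `/etc/resolv.conf` inside the container), and it uses the `scraper-backend.railway.internal` hostname and port 8000 that appear in this thread's logs; substitute your own service name and port.

```nginx
# Sketch only: resolver address, hostname, and port are assumptions.
server {
    listen 80;

    # Cache DNS answers for at most 10 seconds instead of the process lifetime.
    # fd12::10 is assumed to be the private-network DNS server.
    resolver [fd12::10] valid=10s;

    location /api/ {
        # Putting the hostname in a variable makes nginx resolve it at
        # request time (through the resolver above) rather than once at startup.
        set $backend_host scraper-backend.railway.internal;

        # No URI part after the port, so the original request URI
        # (for example /api/performance/summary) is forwarded unchanged.
        proxy_pass http://$backend_host:8000;
    }
}
```

With a plain `proxy_pass http://scraper-backend.railway.internal:8000;` (no variable), nginx resolves the name once when the configuration is loaded, which is the caching behaviour described above.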

Status changed to Awaiting User Response Railway • 7 months ago

Anonymous

PROOP

7 months ago

Hi, thanks for the reply. However, this also appears to have happened to another running frontend service that doesn't use nginx.

That one is proxied with Node.js, which tends to respect DNS TTLs.

Status changed to Awaiting Railway Response Railway • 7 months ago

Anonymous

PROOP

7 months ago

I updated nginx on the service that actually uses it, renamed the internal service subdomain, and redeployed. It's still not working, and I've noticed that even Qdrant is suffering from the issue, and that is a Railway-native container.

Something definitely seems off with the networking here...

2025/12/03 22:45:09 [error] 34#34: *3 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/performance/summary HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"

2025/12/03 22:45:09 [error] 33#33: *1 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/runs/?page=1&page_size=50 HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"

100.64.0.2 - - [03/Dec/2025:22:45:09 +0000] "GET /api/performance/summary HTTP/1.1" 502 559 "https://scraper-frontend-production.up.railway.app/runs" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36" "45.146.8.28"

2025/12/03 22:45:10 [error] 33#33: *1 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/performance/summary HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"

100.64.0.2 - - [03/Dec/2025:22:45:10 +0000] "GET /api/performance/summary HTTP/1.1" 502 559 "https://scraper-frontend-production.up.railway.app/runs" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36" "45.146.8.28"

2025/12/03 22:45:10 [error] 34#34: *3 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/runs/?page=1&page_size=50 HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"

brody

EMPLOYEE

7 months ago

Hello,

I was able to nslookup scraper-backend.railway.internal from within the Scraper Frontend service just fine:

Server:		fd12::10
Address:	[fd12::10]:53

Name:	scraper-backend.railway.internal
Address: fd12:d782:f53c:1:1000:4d:4e66:6cd7

Name:	scraper-backend.railway.internal
Address: 10.230.108.215

So any issues resolving the DNS for the private domain would be within your application and, unfortunately, outside the scope of what we can help with here.

Best,

Brody

Status changed to Awaiting User Response Railway • 7 months ago

Anonymous

PROOP

7 months ago

Something seems messed up, and it must be Railway. The IP resolves just fine from Scraper Frontend for me too, but that one machine can't connect to it, while others on the internal network connect just fine.

I'm not doing any funky network stuff with the container, it's just node:18-alpine.

SSH at Scraper Frontend

/ # nslookup scraper-backend.railway.internal
Server: fd12::10
Address: [fd12::10]:53

Name: scraper-backend.railway.internal
Address: fd12:d782:f53c:1:1000:4d:4e66:6cd7

Name: scraper-backend.railway.internal
Address: 10.230.108.215

/ # ping 10.230.108.215
PING 10.230.108.215 (10.230.108.215): 56 data bytes

SSH at Debugging frontend

/app # nslookup scraper-backend.railway.internal
Server: fd12::10
Address: [fd12::10]:53

Name: scraper-backend.railway.internal
Address: fd12:d782:f53c:1:1000:4d:4e66:6cd7

Name: scraper-backend.railway.internal
Address: 10.230.108.215

/app # ping scraper-backend.railway.internal
PING scraper-backend.railway.internal (10.230.108.215): 56 data bytes
64 bytes from 10.230.108.215: seq=0 ttl=42 time=0.492 ms
64 bytes from 10.230.108.215: seq=1 ttl=42 time=1.244 ms
64 bytes from 10.230.108.215: seq=2 ttl=42 time=0.623 ms

Status changed to Awaiting Railway Response Railway • 7 months ago

noahd

EMPLOYEE

7 months ago

Could you test something for me? In your frontend, can you try referencing the public URL for your backend instead of the private one? If you do, does it resolve and work?

Status changed to Awaiting User Response Railway • 7 months ago

Anonymous

PROOP

7 months ago

On the box via SSH it works:

/ # curl https://scraper-production-ee61.up.railway.app/health

{"status":"healthy","service":"uk-tax-scraper"}/

I'm struggling to get it working via the proxy, though, due to SSL, but I suspect it would work fine if I sorted that out:

2025/12/06 02:03:53 [error] 33#33: *1 SSL_do_handshake() failed (SSL: error:0A000438:SSL routines::tlsv1 alert internal error:SSL alert number 80) while SSL handshaking to upstream, client: 100.64.0.2, server: localhost, request: "GET /api/scrapers/ HTTP/1.1", upstream: "https://66.33.22.77:443/api/scrapers/", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/scrapers"

Status changed to Awaiting Railway Response Railway • 7 months ago

brody

EMPLOYEE

7 months ago

Noah and I have both independently confirmed that there are no issues with DNS or any private networking for your services.

Since this issue is not attributable to a platform or product issue, it would fall outside the scope of the support we offer.

I will go ahead and open up this ticket to the community so they can help you debug any issues with your project or configurations.

Status changed to Awaiting User Response Railway • 7 months ago

brody

EMPLOYEE

7 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open brody • 7 months ago

Anonymous

PROOP

7 months ago

OK, but can you be a bit more helpful, i.e. explain your findings and why the container being unable to reach the other via the internal host/domain is not a Railway issue?

userdeh

PRO

7 months ago

Based on this error: tlsv1 alert internal error

You are connecting to an upstream with the wrong protocol.

Use http for internal routing, and if the upstream server is running behind an SSL certificate, use https as the URL scheme.
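For the temporary public-URL test specifically, here is a sketch of an nginx location block that proxies to an HTTPS upstream. One possible cause of that handshake alert, stated as an assumption rather than a diagnosis, is that nginx does not send SNI to upstreams by default, so a shared TLS edge cannot tell which certificate to present. The hostname is the public URL quoted earlier in the thread; the rest of the layout is hypothetical.

```nginx
# Sketch only: intended for the temporary public-URL test, not for internal routing.
location /api/ {
    # Send the upstream hostname as SNI during the TLS handshake.
    # This is off by default in nginx.
    proxy_ssl_server_name on;

    # Make sure the Host header names the upstream rather than the frontend's
    # own domain (relevant if the config sets `proxy_set_header Host $host`).
    proxy_set_header Host scraper-production-ee61.up.railway.app;

    # https scheme because the public domain terminates TLS; no URI part,
    # so the original /api/... path is forwarded unchanged.
    proxy_pass https://scraper-production-ee61.up.railway.app;
}
```

For the private `.railway.internal` hostname the scheme stays `http`, as described above.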

Anonymous

PROOP

7 months ago

You asked me to replace it with the external URL rather than the internal one, which runs TLS, so obviously it errored. That is not the problem.

Before I changed it as you requested, the internal routing WAS using http. The issue is that this particular container cannot connect to the valid internal IP (no proxies, just a plain ping from the pseudo Railway SSH shell), while all the other containers connect just fine.
