4 months ago
I recently experienced a weird issue where one container suddenly stopped talking to another via Railway's internal routing. The service was live and responding directly. Everything was set up correctly, but I was getting 502 errors via the internal route. Restarting, redeploying, ensuring everything was built with no cache, prioritising IPv4 - nothing helped.
The only thing that worked was changing the subdomain prefix of the destination container/service. Immediately upon deployment of that change, it began to work again.
I've just had it happen again on a different service - same 502 errors/timeouts (same setup too - I have nginx proxying so that my frontend can make requests to the API via /api).
This 'scraper service' container is accessible directly on the public URL: https://scraper-production-ee61.up.railway.app/health
But it is not responding via the internal domain.
Logs from the container attempting to access it show:
2025/12/01 21:07:26 [warn] 47#47: *1736 upstream server temporarily disabled while connecting to upstream, client: 100.64.0.7, server: localhost, request: "GET /api/performance/summary HTTP/1.1", upstream: "http://10.221.79.97:8000/api/performance/summary", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"
Any ideas? I'm going to try leaving it in the current state for you to inspect. I suspect renaming the internal domain prefix may work again to resolve it.
12 Replies
4 months ago
What you’re running into is a common issue with Nginx and dynamic service IPs on Railway’s private network. By default, Nginx caches DNS lookups for the lifetime of a process, but Railway services receive new internal IPs on each deployment. This means Nginx may keep trying to connect to an outdated IP, resulting in 502 errors until you restart or redeploy.
The best way to handle this is to configure Nginx to resolve DNS more frequently. You’ll find plenty of guides online for setting this up, or you can check the Nginx resolver documentation for more details.
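For reference, a minimal sketch of what that could look like; it is an illustration under assumptions, not a verified fix for this project. The hostname `scraper-backend.railway.internal` and port 8000 are taken from log lines in this thread and may not match your service. `fd12::10` is used as the resolver because it is the DNS server reported by `nslookup` inside the containers later in this thread. The 10-second `valid` value is an arbitrary choice.

```nginx
server {
    listen 80;

    # DNS server that can resolve *.railway.internal names.
    # Assumption: fd12::10, as reported by nslookup inside the containers.
    # valid=10s makes nginx re-resolve at most every 10 seconds.
    resolver [fd12::10] valid=10s;

    location /api/ {
        # Putting the upstream in a variable forces nginx to resolve the
        # hostname at request time through the resolver above, instead of
        # once when the configuration is loaded.
        # Assumption: backend hostname and port as seen in the thread's logs.
        set $backend "http://scraper-backend.railway.internal:8000";

        # No URI part after the variable, so the original request URI
        # (for example /api/performance/summary) is passed through unchanged.
        proxy_pass $backend;
    }
}
```

The variable is the important part: with a literal hostname in `proxy_pass`, nginx resolves the name once at startup or reload and keeps using that address. The `resolver` must also point at a DNS server that knows the private domain, otherwise lookups fail with "could not be resolved".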
Status changed to Awaiting User Response Railway • 4 months ago
3 months ago
Hi, thanks for the reply. However, this also appears to have happened to another frontend service that doesn't use nginx.
That one is being proxied with Node.js, which tends to respect DNS TTLs.
Status changed to Awaiting Railway Response Railway • 3 months ago
3 months ago
I updated nginx on the service that actually uses it, renamed the internal service subdomain, and redeployed. But still not working - and I've noticed even Qdrant is suffering from the issue - and this is a Railway-native container
Something definitely seems off with the networking here...
2025/12/03 22:45:09 [error] 34#34: *3 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/performance/summary HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"
2025/12/03 22:45:09 [error] 33#33: *1 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/runs/?page=1&page_size=50 HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"
100.64.0.2 - - [03/Dec/2025:22:45:09 +0000] "GET /api/performance/summary HTTP/1.1" 502 559 "https://scraper-frontend-production.up.railway.app/runs" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36" "45.146.8.28"
2025/12/03 22:45:10 [error] 33#33: *1 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/performance/summary HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"
100.64.0.2 - - [03/Dec/2025:22:45:10 +0000] "GET /api/performance/summary HTTP/1.1" 502 559 "https://scraper-frontend-production.up.railway.app/runs" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36" "45.146.8.28"
2025/12/03 22:45:10 [error] 34#34: *3 scraper-backend.railway.internal could not be resolved (3: Host not found), client: 100.64.0.2, server: localhost, request: "GET /api/runs/?page=1&page_size=50 HTTP/1.1", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/runs"
3 months ago
Hello,
I was able to nslookup scraper-backend.railway.internal from within the Scraper Frontend service just fine -
Server: fd12::10
Address: [fd12::10]:53
Name: scraper-backend.railway.internal
Address: fd12:d782:f53c:1:1000:4d:4e66:6cd7
Name: scraper-backend.railway.internal
Address: 10.230.108.215
So any issues resolving the DNS for the private domain would be within your application and, unfortunately, outside the scope of what we can help with here.
Best,
Brody
Status changed to Awaiting User Response Railway • 3 months ago
3 months ago
Something seems messed up and it must be Railway. I have the IP resolving just fine too from Scraper Frontend - but it can't connect from just this machine - others on the internal network are just fine.
I'm not doing any funky network stuff with the container, it's just node:18-alpine.
SSH at Scraper Frontend
/ # nslookup scraper-backend.railway.internal
Server: fd12::10
Address: [fd12::10]:53
Name: scraper-backend.railway.internal
Address: fd12:d782:f53c:1:1000:4d:4e66:6cd7
Name: scraper-backend.railway.internal
Address: 10.230.108.215
/ # ping 10.230.108.215
PING 10.230.108.215 (10.230.108.215): 56 data bytes
<no ping>
SSH at Debugging frontend
/app # nslookup scraper-backend.railway.internal
Server: fd12::10
Address: [fd12::10]:53
Name: scraper-backend.railway.internal
Address: fd12:d782:f53c:1:1000:4d:4e66:6cd7
Name: scraper-backend.railway.internal
Address: 10.230.108.215
/app # ping scraper-backend.railway.internal
PING scraper-backend.railway.internal (10.230.108.215): 56 data bytes
64 bytes from 10.230.108.215: seq=0 ttl=42 time=0.492 ms
64 bytes from 10.230.108.215: seq=1 ttl=42 time=1.244 ms
64 bytes from 10.230.108.215: seq=2 ttl=42 time=0.623 ms
Status changed to Awaiting Railway Response Railway • 3 months ago
3 months ago
Could you test something for me? In your frontend, can you try referencing the public URL for your backend instead of the private one? If you do, does that work and resolve?
Status changed to Awaiting User Response Railway • 3 months ago
3 months ago
On the box via ssh it works
/ # curl https://scraper-production-ee61.up.railway.app/health
{"status":"healthy","service":"uk-tax-scraper"}/
Struggling to get it working via the proxy though, due to SSL, but I suspect it would work fine if I sorted that
2025/12/06 02:03:53 [error] 33#33: *1 SSL_do_handshake() failed (SSL: error:0A000438:SSL routines::tlsv1 alert internal error:SSL alert number 80) while SSL handshaking to upstream, client: 100.64.0.2, server: localhost, request: "GET /api/scrapers/ HTTP/1.1", upstream: "https://66.33.22.77:443/api/scrapers/", host: "scraper-frontend-production.up.railway.app", referrer: "https://scraper-frontend-production.up.railway.app/scrapers"
Status changed to Awaiting Railway Response Railway • 3 months ago
3 months ago
Noah and I have both independently confirmed that there are no issues with DNS or any private networking for your services.
Since this issue is not attributable to a platform or product issue, it would fall outside the scope of the support we offer.
I will go ahead and open up this ticket to the community so they can help you debug any issues with your project or configurations.
Status changed to Awaiting User Response Railway • 3 months ago
3 months ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open brody • 3 months ago
3 months ago
OK, but can you be a bit more helpful, i.e. explain your findings and why one container not being able to reach another via the internal host/domain is not a Railway issue?
3 months ago
Based on this error: tlsv1 alert internal error
you are connecting to an upstream with the wrong protocol.
Use http for internal routing, and if the upstream server is running behind an SSL certificate, use https as the URL scheme.
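A rough sketch of the two cases, meant to sit inside a `server` block. It assumes the internal hostname and port that appear in the logs in this thread, and uses a hypothetical `/api-public/` path for the public variant so both can coexist. `proxy_ssl_server_name on` is included because nginx does not send SNI to an HTTPS upstream by default; a missing SNI is one possible cause of a handshake alert like the one above, but that is not confirmed for this case.

```nginx
# Case 1: internal routing over the private network uses plain HTTP.
# Assumption: hostname and port as seen in the thread's logs.
location /api/ {
    # No URI part, so /api/... is forwarded unchanged.
    proxy_pass http://scraper-backend.railway.internal:8000;
}

# Case 2: the public URL is served over TLS, so the scheme is https.
# /api-public/ is a hypothetical path used only for this illustration.
location /api-public/ {
    # The trailing /api/ replaces the matched /api-public/ prefix.
    proxy_pass https://scraper-production-ee61.up.railway.app/api/;

    # Send the upstream hostname as SNI during the TLS handshake
    # (off by default in nginx).
    proxy_ssl_server_name on;
}
```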
3 months ago
You asked me to replace it with the external URL rather than the internal one, which runs TLS, so obviously it errored. That is not the problem.
Before I changed it as you requested, the internal routing WAS using http - the issue is that this particular container is not connecting to the valid internal IP (no proxies, just a plain ping from the pseudo Railway SSH shell), but all the other containers are connecting just fine.