Browser hangs on DNS resolving
ricardovanlaarhoven
PROOP

3 months ago

Hi,

I hope I have the title correct. I'm not a DNS expert, but this is what we notice.

Sometimes when we go to test.manage.debitroom.com (this also happens on our other envs), the browser hangs and won't navigate to the URL. When this is happening and you do an MX lookup, this is the result:

image.png

When you refresh the page a few moments later, it works normally and this is the mx result:

Our DNS provider has already checked if there are any issues on their side.

We think this is a Railway issue.

At this moment, I'm trying to recreate the custom domain and add it to the DNS for test.manage.debitroom.com. Hopefully that works, but then I would like to know why it works.

accept.manage.debitroom.com and manage.debitrooom.com are our other envs and have the same issue.

$30 Bounty

28 Replies

echohack
EMPLOYEE

3 months ago

Hi there,

We support CNAME records from your domain provider using CNAME Flattening. If your provider does not support CNAME flattening, that might explain why you're seeing issues.

I took a look at your project, and at present everything looks like it is in a good state; the setup is complete for your test.manage.debitroom.com domain.

If you haven't seen it already, here are the docs on Public Domains on Railway: https://docs.railway.com/guides/public-networking#custom-domains


Status changed to Awaiting User Response Railway 3 months ago


ricardovanlaarhoven
PROOP

3 months ago

As far as our DNS partner can tell, we have CNAME Flattening.

But with a bit of googling, I found that this behaviour could be caused by the browser trying to look up an AAAA record first. And I see that our custom domain's CNAME target does not have an AAAA record.

  1. Could this be an issue?

  2. Why doesn't Railway provide an AAAA record on the CNAME?
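To check the AAAA question from an affected machine, a minimal Python sketch along these lines could list what the system resolver returns per address family. It only relies on the standard library's `socket.getaddrinfo`; the function name `resolve_families` and the injectable `resolver` parameter are illustrative assumptions, not something Railway provides.

```python
import socket


def resolve_families(host, port=443, resolver=socket.getaddrinfo):
    """Return {'A': [...], 'AAAA': [...]} as seen by the system resolver."""
    results = {"A": [], "AAAA": []}
    for label, family in (("A", socket.AF_INET), ("AAAA", socket.AF_INET6)):
        try:
            infos = resolver(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # No usable records for this family (or the lookup failed).
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address as a string.
        results[label] = sorted({info[4][0] for info in infos})
    return results
```

Calling `resolve_families("test.manage.debitroom.com")` would show whether the AAAA list is empty. An empty AAAA list on its own is expected for an IPv4-only edge and does not prove that the hang is caused by it.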



Status changed to Awaiting Railway Response Railway 3 months ago


3 months ago

This shouldn't be an issue, as the fallback is instant, but yes, our edge is currently IPv4-only.


Status changed to Awaiting User Response Railway 3 months ago


ricardovanlaarhoven
PROOP

3 months ago

Since you mention the edge:
I see that our test env is on edge DNS within Railway, and our production env is not.

We think we have seen this issue just once on production, but we're not sure. On the test env we see it all the time.
Could the issue be in this difference?

Is there a way to downgrade our test env to test this?


Status changed to Awaiting Railway Response Railway 3 months ago


3 months ago

Regardless of what the UI might show, we only have one kind of edge network running, and that is our anycast metal edge. There is no downgrading to anything here.


Status changed to Awaiting User Response Railway 3 months ago


ricardovanlaarhoven
PROOP

3 months ago

Hi,
I've tried removing the CNAMEs and generating new ones.
I've also tried removing the CNAMEs and just creating an A record pointing to 66.33.22.166 on our test env, to see if there is something wrong with the CNAMEs.

But we're facing the same problem. The browser keeps showing the page you were on when you typed test.manage.debitroom.com, and it keeps loading indefinitely.
When opening a second tab after a minute or so, it loads in less than a second and shows our platform. But sometimes when opening a new tab again, that tab fails to load as well (as happened to me this morning around 7:57 CET).
At the same time, another service like test.mijn.debitroom.com does load.

How can you be sure this is not a Railway issue?
Am I right to think this is a DNS issue? The page simply won't load, I don't see any HTML, and I don't see anything in our Railway/Caddy log.

We could migrate to Cloudflare, but if you recommend that, what indicates that it would resolve the issue?
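One way to separate a DNS problem from a connection problem is to time each phase on its own. The following is a rough sketch using only the Python standard library; `time_phases` and its parameters are hypothetical names, and it deliberately resolves IPv4 only because the edge is described above as IPv4-only.

```python
import socket
import ssl
import time


def time_phases(host, port=443, timeout=10.0, use_tls=True):
    """Return seconds spent in DNS lookup, TCP connect and (optionally) TLS."""
    timings = {}

    start = time.perf_counter()
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, socket.AF_INET, socket.SOCK_STREAM
    )[0]
    timings["dns"] = time.perf_counter() - start

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        start = time.perf_counter()
        sock.connect(sockaddr)
        timings["tcp"] = time.perf_counter() - start

        if use_tls:
            context = ssl.create_default_context()
            start = time.perf_counter()
            # wrap_socket performs the TLS handshake before returning.
            with context.wrap_socket(sock, server_hostname=host):
                timings["tls"] = time.perf_counter() - start
    finally:
        sock.close()
    return timings
```

If `time_phases("test.manage.debitroom.com")` stays fast while the browser tab hangs at the same moment, that would point away from DNS and towards something later in the chain.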



Status changed to Awaiting Railway Response Railway 3 months ago


3 months ago

test.manage.debitroom.com loads instantly for me in Chrome, so this isn't a Railway issue. I will open this up to the community so they can help you debug.


Status changed to Awaiting User Response Railway 3 months ago


3 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open brody 3 months ago


You should not be using A records when using custom domains.

Additionally, your domain loads in my browser.

Have you tried accessing your URL via incognito browser or a different network to ensure it's not related to firewalls or custom DNS?


ricardovanlaarhoven
PROOP

3 months ago

The A records are just a trial run to check whether, for example, CNAME flattening is the issue. I'll put the CNAMEs back today since that didn't resolve it.

We've tried incognito and multiple networks. It happens on all our envs.

About 5 internal users (coworkers) that I know of have had this problem while working from home.


ricardovanlaarhoven
PROOP

3 months ago

Hi everyone,

Small update on this issue. After analyzing a Chrome NetLog during a "hang", I noticed that the browser spends a significant amount of time in the HOST_RESOLVER_IMPL_JOB phase. It seems to be querying for both A and HTTPS records, which leads to a delay when resolving the CNAME alias szui5roo.up.railway.app.

Since the issue persists even with direct A records and across different networks, I suspect the combination of missing AAAA (IPv6) records on the Railway edge and how certain ISP resolvers handle the HTTPS record type is causing the browser to "stall".
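For anyone who wants to repeat the NetLog analysis, here is a small Python sketch that measures how long each resolver job stays open. It assumes the JSON layout of a chrome://net-export capture (a `constants` section that maps event and phase names to numbers, plus an `events` list with `time` stored as a string of millisecond ticks). The event name is the one quoted above; other Chrome builds may use a different name, so it is a parameter. `load_netlog` and `job_durations` are illustrative names.

```python
import json


def load_netlog(path):
    """Load a NetLog capture exported from chrome://net-export."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def job_durations(netlog, event_name="HOST_RESOLVER_IMPL_JOB"):
    """Return {source_id: duration_ms} for begin/end pairs of one event type."""
    constants = netlog["constants"]
    type_id = constants["logEventTypes"][event_name]
    begin = constants["logEventPhase"]["PHASE_BEGIN"]
    end = constants["logEventPhase"]["PHASE_END"]

    started, durations = {}, {}
    for event in netlog["events"]:
        if event["type"] != type_id:
            continue
        source_id = event["source"]["id"]
        tick = int(event["time"])  # assumed: string of millisecond ticks
        if event["phase"] == begin:
            started[source_id] = tick
        elif event["phase"] == end and source_id in started:
            durations[source_id] = tick - started.pop(source_id)
    return durations
```

Jobs that begin but never end do not appear in the result, so a resolver job that is missing from the output during a hang would itself be a useful data point.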

Next steps: I am going to migrate the DNS management to Cloudflare and later enable the Proxy (Orange Cloud).

This should:

  1. Provide native AAAA and HTTPS record responses directly from the Cloudflare edge.

  2. Eliminate the nested CNAME resolution chain for the browser.

  3. Improve routing via Cloudflare’s anycast network.

I will report back in a few days to let you know if this permanently resolves the intermittent "indefinite loading" issues for our team.


ricardovanlaarhoven
PROOP

2 months ago

Hi Railway team,

After deeper analysis of Chrome NetLogs and testing with a Cloudflare proxy, we have identified a critical failure in how Railway's Edge interacts with modern browsers.

The Issue: The browser hangs indefinitely when resolving the domain. The page never loads unless multiple new tabs are opened, which seems to force something.

  1. Chrome Deadlock: Due to recent updates in Chrome's network stack (Happy Eyeballs v3 logic), the browser wait-queue for this domain becomes blocked. Since the Edge doesn't provide an IPv6 path, and the resolver doesn't fail gracefully, the request stays "active" forever.

  2. The Proof:

    • Direct to Railway (test.manage): Infinite hang.

    • Via Cloudflare Proxy (test-manage): Instant load.

    • Cloudflare solves this because it provides an immediate AAAA response, preventing the browser's DNS queue from stalling.

Why this is a Railway issue: While we can use Cloudflare as a band-aid, Railway's Edge should ideally:

  • Support Dual-Stack (IPv6).

  • OR: Ensure that the Anycast infrastructure correctly triggers an immediate NOERROR (empty) response for AAAA queries so the browser can instantly fall back to IPv4.

This is causing a major impact on our internal and external users. Could someone from the networking team look into the AAAA handling of the Anycast Edge?


ricardovanlaarhoven
PROOP

2 months ago

As an extra, I know that when you create an Apple app, Apple requires IPv6 support to test the app. See: https://developer.apple.com/support/ipv6/

So at this point, you can't host an iOS app with Railway.


ricardovanlaarhoven
PROOP

2 months ago

Hi Railway team,

I have an important update. This morning, we experienced the same "infinite hang" even on the domain proxied through Cloudflare (test-manage).

This changes the diagnostic:

  • Not DNS: Cloudflare now handles the DNS resolution, and previously we had the same issue with another party.

  • Not AAAA: The hang still occurs behind the Cloudflare proxy.

  • Not Application: The request still hasn't reached our origin server logs when the hang occurs.

  • Network/Edge Issue: The problem must lie in the connection phase between the client and the Railway Edge.

I've also tried disabling "Experimental QUIC protocol", with no success.


ricardovanlaarhoven
PROOP

2 months ago

I noticed that the Caddy server sometimes responds with a 206 in combination with Brotli compression, which is fixed in 2.11. I installed this version and the 206s have disappeared.
However, the main issue still persists.


ilyassbreth
FREE

2 months ago

i'm still trying to solve this


ricardovanlaarhoven
PROOP

2 months ago

Hi,

Thank you for your time and effort!

  • How do you know about the known issues? And how would we downgrade? As @Brody commented, downgrading is no longer an option. (You mentioned issues with Railway's edge network, but I see you edited the answer and removed that.)

  • Where do you see these bad glue records? Since we switched to Cloudflare, I don't see these anymore on https://mxtoolbox.com/.

  • Thank you for the acknowledgement, I hope Railway plans to implement this asap.


ricardovanlaarhoven
PROOP

2 months ago

i'm still trying to solve this

@ilyassbreth

Do you see the same issue?
We have multiple computers and multiple networks that have this issue, but there has never been a confirmation from railway that they saw this.
So I think it would be great to have a confirmation from outside our organization.

I can sometimes reproduce it in an incognito tab after an `ipconfig /flushdns`, but for some reason this doesn't work when the developer tools are open.


2 months ago

We see this and have determined that the problem you are facing is not an issue with our platform or product, which is why we have moved this to community support so they can help determine issues with your setup and/or application.


ricardovanlaarhoven
PROOP

2 months ago

I found something new. I tried this at a time when no one else uses the platform, and I found a request that works and is visible in the log:

requestId: "FoFtKFTZSuGUo3O1w9P4nw"
timestamp: "2026-01-07T06:37:54.972833052Z"
method: "GET"
path: "/"
host: "test.manage.debitroom.com"
httpStatus: 200
upstreamProto: "HTTP/1.1"
downstreamProto: "HTTP/2.0"
responseDetails: ""
totalDuration: 12
upstreamAddress: "http://[fd12:7410:7673:0:4000:22:52ff:6de8]:8080"
clientUa: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36"
upstreamRqDuration: 12
txBytes: 375
rxBytes: 801
srcIp: "*******myip"
edgeRegion: "europe-west4-drams3a"
upstreamErrors: ""

And here is a request that fails but is still visible in the logs. This was in an incognito tab after a flushdns:

requestId: "WD27MJm3SqiEBkg3N8N_Fg"
timestamp: "2026-01-07T06:38:16.334245858Z"
method: "GET"
path: "/"
host: "test.manage.debitroom.com"
httpStatus: 200
upstreamProto: "HTTP/1.1"
downstreamProto: "HTTP/2.0"
responseDetails: ""
totalDuration: 4
upstreamAddress: "http://[fd12:7410:7673:0:4000:22:52ff:6de8]:8080"
clientUa: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36"
upstreamRqDuration: 4
txBytes: 375
rxBytes: 801
srcIp: "********myip"
edgeRegion: "europe-west4-drams3a"
upstreamErrors: ""

Even weirder is that the assets are loaded as well. But when I press Ctrl+U (view source) in the browser, I still see the source code of the previous page, the Chrome incognito startup page.

As far as I can tell, they are prefetch;prerender assets. Sometimes I only see some favicons/Chrome icons and nothing more.
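The two edge log records above can be compared field by field to see what actually differs between the working and the hanging request. A minimal Python sketch, assuming each record is pasted as `key: value` lines like the ones in this post (`parse_record` and `differing_fields` are illustrative helper names):

```python
def parse_record(text):
    """Parse 'key: value' lines (value optionally quoted) into a dict of strings."""
    record = {}
    for line in text.strip().splitlines():
        # Split at the first colon only, so timestamps and URLs stay intact.
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip().strip('"')
    return record


def differing_fields(first, second):
    """Return {field: (first_value, second_value)} for fields that differ."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }
```

For the two records posted here, the only fields that differ are the request ID, the timestamp, the two duration fields and the masked source IP placeholder. Status, protocols, byte counts and edge region are identical, so the edge log alone does not distinguish the hanging request from the working one.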



ricardovanlaarhoven
PROOP

2 months ago

Hi Brody,

How do you know?

With all due respect, this is not the first time I’ve had to persist with Railway support regarding a platform issue where I was eventually proven correct (including a previous minor security issue). I want to ensure this is taken seriously and escalated to a networking engineer who can look at the Edge Proxy/Load Balancer logs for "dropped segments" or "incomplete streams," rather than just checking if the domain is reachable from a single location.

@Brody, can you check the logs around that time for my IP? You must have a log on the load balancer or edge proxy.
Perhaps for TCP segments that are dropped, or incomplete streams?


ricardovanlaarhoven
PROOP

2 months ago

Hi @brody and Railway team,

I’m reaching out again because we are at a standstill. I’ve provided logs from Jan 7th (requestId: WD27MJm3SqiEBkg3N8N_Fg) which prove that the request is reaching Railway’s edge and returning a 200 status, yet the browser continues to hang.

I had attached a screenshot of our Caddy logs during one of these 'hangs'. It proves that:

  • Caddy successfully processed the request for / and all associated assets (JS, CSS, SVG) with a 200 OK status.

  • These were served in milliseconds, yet the browser stayed on a blank loading screen.

This rules out a simple 'DNS setup issue' or an application error. It points toward a deeper problem in how the Edge Proxy/Load Balancer handles the stream or connection termination with specific clients. We are focusing on Chrome because it is most reproducible there. We cannot rule out that this affects other browsers as well, but we haven't seen it on Edge or Firefox.

Could this be escalated to a networking engineer to specifically look for:

  1. TCP segment drops or incomplete streams at the Edge for the request IDs provided?

  2. Potential issues with HTTP/2 stream multiplexing or window sizes on the Anycast network that might cause this 'deadlock'?

We are so committed to resolving this that we would like to know if a Railway networking expert could look into this together with us on a paid basis. If there is any chance the issue originates from our specific configuration, we are more than willing to cover the hourly costs for this deep-dive. Could you let us know the rates for such expert assistance?

We really love the Railway developer experience, but without a technical investigation into why Caddy is reporting success while the client is hanging, we are forced to look at alternative hosting providers to ensure it's not a Railway platform issue.

I would appreciate a more in-depth look at the Edge/Proxy behavior for the timestamps mentioned.

Thank you.


ricardovanlaarhoven
PROOP

2 months ago

Hi @brody and Railway team,

We have an important update regarding the 'infinite hang' issue. After testing across three different workstations, we have isolated a specific trigger and a potential path forward.

Key Findings:

  • Chrome Flags: The issue is 100% mitigated by disabling the #prerender2 flag in Google Chrome. When this flag is active (default), the site hangs even though Caddy logs show all assets were successfully served with 200 OK statuses.

  • Isolation: Since our HTML does not contain any explicit prefetch/prerender tags, this confirms that the Railway Edge is struggling with Chrome's internal speculative networking/HTTP2 multiplexing.

Our Next Steps: As a workaround, we are going to modify our build process:

  1. We are disabling pre-compression (Brotli/Gzip) in Vite.

  2. We are moving the encoding responsibility entirely to Caddy (using encode zstd gzip).

Our theory is that the Edge Proxy may be failing to correctly terminate or flush streams when serving static .br files during a Chrome 'speculative load'. By letting Caddy handle the compression dynamically, we hope to provide a more 'proxy-friendly' stream.

Request to Railway: While we are testing this workaround, we still need your team to investigate why the Edge Proxy is incompatible with default Chrome 143 behavior. We shouldn't have to ask users to change browser flags or avoid standard build optimizations to remain stable on the platform.

We remain open to the paid expert consultation if your networking team needs to deep-dive into the Edge behavior with us.

Thank you.

Update: We verified that the encoding is now dynamic Zstd/Gzip served by Caddy, yet the hang persists.
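One way to verify which encoding is actually served is to request the page with an explicit `Accept-Encoding` header and read back `Content-Encoding`. This is only a sketch with the Python standard library; `negotiated_encoding` and the `opener` parameter are hypothetical names, and the result reflects what this client negotiates, which may differ from what Chrome gets.

```python
import urllib.request


def negotiated_encoding(url, accept="zstd, gzip, br", opener=urllib.request.urlopen):
    """Return the Content-Encoding the server chose, or 'identity' if none."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": accept})
    # urlopen does not decompress the body, so only the header is inspected.
    with opener(request, timeout=10) as response:
        return response.headers.get("Content-Encoding", "identity")
```

Passing `accept="br"` versus `accept="zstd, gzip"` would show whether any pre-compressed Brotli responses are still being served.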


ricardovanlaarhoven
PROOP

2 months ago

Hi,

Adding headers like:

header Speculation-Rules "[]"
header No-Vary-Search "params=()"
header Cache-Control "no-store, no-cache, must-revalidate"

doesn't help.

I've confirmed that they are present in hanging requests such as wl9XtFp2SiC58SLcjUJq2g.


2 months ago

I'm sorry but at this time, I cannot see how this is an issue with our platform.

I am unable to reproduce it on stock Chrome 143 on a Mac when visiting your site every time I've come into this thread, and no other customer has reported anything similar in the month this thread has been opened.

But if you have an MRE that causes responses to hang every time on Chrome, please provide one so we have something concrete to go off of here.


ricardovanlaarhoven
PROOP

2 months ago

Hi @brody,

To be honest, the responses so far have been very vague. While you’ve stated that "it works for me," we haven't received any confirmation or feedback on what Railway actually sees in its own infrastructure logs when these hangs occur.

We are seeing this on multiple independent Vue front-ends across multiple different networks. Since this is an intermittent network-level issue, we think that a simple "reproducible code example" isn't possible. However, the data in our Railway Caddy logs is very specific, and we need you to cross-reference it with your edge logs.

Direct questions for the networking team:

  • Visibility: When we see a request for / followed by requests for assets (index.js, etc.) all logged as 200 OK in our Caddy logs, what does Railway see at the Edge/Proxy layer? Does your infrastructure show these packets being fully acknowledged (ACK) by the client?

  • Edge Errors: Have you checked the logs of the Anycast nodes in the europe-west4 region for TCP retransmissions, window stalls, or HTTP/2 stream resets for our domain during these timestamps?

  • The Deadlock: How is it possible for asset requests to reach our Caddy logs if the initial HTML delivery (according to the browser) never completed? This points to a failure in the transport layer that only your team can see.

Regarding the support process: We feel that our technical observations are being bypassed. This is not the first time I’ve had to push for a deeper investigation at Railway only to be proven correct later (including a previous security issue). We aren't looking for a "it works for me" confirmation; we are looking for an engineer to explain why your Edge reports a "Success" for data that is clearly not reaching the browser.

Please provide some technical transparency on what your internal monitoring shows for these specific Request IDs. Furthermore, we have repeatedly asked whether a paid consultation is possible, so that if the problem is indeed on our end, you can prove it and your time is compensated, but we have never received a response to that offer.

Because we do not feel taken seriously in our relationship as a paying customer, I have felt the need to escalate this issue to your sales department to evaluate our partnership.


ricardovanlaarhoven
PROOP

2 months ago

Hi Railway,

Do you have any news on my questions?


2 months ago

Hello,

I'm sorry, but there have been no other reports of this in the month that this thread has been open. Unfortunately, we don't have the cycles to look into one-off, non-reproducible reports.


ricardovanlaarhoven
PROOP

4 days ago

Update: Fact-based observations on the 'Infinite Hang'

We have spent more time debugging and want to share what we know for certain to help isolate this.

What we know for sure:

  1. The Hang is Browser-Level: In chrome://discards, the tab shows as visible (the user is looking at it) but the status remains unloaded. The browser hasn't even begun to render the first frame of the page, even though the URL is correct.

  2. Requests do reach the Origin: Our Caddy logs show that the request for index.html is received and served with a 200 OK. This means the initial connection from the browser, through the Railway Edge, to our container is successful.

  3. Inconsistent Reproduction: The issue is extremely elusive. Sometimes the site loads perfectly 20 times in a row. Other times, it hangs 5 times out of 10. Flushing DNS or sockets doesn't consistently trigger it, but it appears most often in "fresh" contexts like Incognito or new browser windows.

  4. Not related to App-Logic or Security Headers:

    • We have disabled Speculation Rules (both via HTML and Headers), but the hang persists.

    • We have tested with and without strict Cross-Origin Isolation headers (COOP/COEP/CORP); the behavior remains identical.

    • Since the browser stays in unloaded state, our frontend JavaScript hasn't even started executing yet.

Conclusion so far: The "handshake" between the browser receiving the data and actually starting to render it is failing. Since Caddy logs a successful send, but the browser stays "unloaded", something is preventing the data from being "finalized" or processed in the browser's rendering engine.
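Since the observations above suggest the hang is browser-level, a non-browser client can serve as a control: if a plain HTTP client never stalls over many attempts while Chrome does, that supports the conclusion. A rough Python sketch, where `fetch_once` and `hang_rate` are illustrative names and a "hang" is approximated as a timeout or any non-200 result:

```python
import http.client
import time
import urllib.request


def fetch_once(url, timeout=10.0):
    """Return (ok, seconds); ok is False on a timeout, error or non-200 status."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            ok = response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.perf_counter() - start


def hang_rate(url, attempts=20, fetch=fetch_once):
    """Return the fraction of attempts that failed or timed out."""
    results = [fetch(url) for _ in range(attempts)]
    failures = sum(1 for ok, _ in results if not ok)
    return failures / attempts
```

This reuses connections no differently from a fresh client each time, so it is closest to the "fresh context" case described in point 3. It cannot reproduce Chrome's prerender or speculative loading behaviour.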

