5 days ago
Starting my own thread because I do not wish to hijack another (tho it seems resolved), but see here for the initial context: https://discord.com/channels/713503345364697088/1513708831941726318/1514060548374003773
To elaborate, we're seeing really inflated request times vs what our application (and Railway's "upstreamRqDuration") report.
A few examples from our production environment (this one does have Cloudflare Proxy in the loop, but we're seeing the same thing on dev which does not use this):
(Request ID | upstreamRqDuration | Browser TTFB | Approx. location)
# Endpoint 1 - fastest (generally consistent response size, ~1.5kB)
c45-3s78R-mn7gGyg4a9AQ | 2ms | 205ms | Edinburgh, Scotland
Q8eUoFp_Q2OhUka5o3UVLg | 2ms | 356ms | Orlando, Florida
# Endpoint 2 - medium (slightly varying response size, ~3-4kB)
AAVjl-aKSTK9BJIvO8poTA | 336ms | 464ms | Edinburgh, Scotland
Urq5uX1HQg2qJAN1BT7zVQ | 74ms | 278ms | Orlando, Florida
# Endpoint 3 - slowest (response size varies per user and is usually large, 84kB)
pKFK7rTJR_q6Y8euO8poTA | 1136ms | 1.28s | Edinburgh, Scotland <-- internal app state cache miss, but the duration mismatch is still there
rut1V0ryTn-LHXXpH4GxDA | 118ms | 364ms | Orlando, Florida
While I only have a limited sample of locations, all of our users are reporting that our application feels slow, at least for the last couple of weeks, but potentially since we migrated onto Railway (we did initially have a perf decrease coming from dedicated hosting, but we expected this and have optimized accordingly).
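To put a number on the mismatch, here is a rough Python sketch that subtracts upstreamRqDuration from the browser TTFB for each of the samples above. The helper names (`to_ms`, `overheads`) are made up for illustration, and the result is only the unexplained gap, not a measurement of where the time actually goes.

```python
import re

# Sample rows copied from the list above:
# Request ID | upstreamRqDuration | Browser TTFB | Approx. location
SAMPLES = """\
c45-3s78R-mn7gGyg4a9AQ | 2ms | 205ms | Edinburgh, Scotland
Q8eUoFp_Q2OhUka5o3UVLg | 2ms | 356ms | Orlando, Florida
AAVjl-aKSTK9BJIvO8poTA | 336ms | 464ms | Edinburgh, Scotland
Urq5uX1HQg2qJAN1BT7zVQ | 74ms | 278ms | Orlando, Florida
pKFK7rTJR_q6Y8euO8poTA | 1136ms | 1.28s | Edinburgh, Scotland
rut1V0ryTn-LHXXpH4GxDA | 118ms | 364ms | Orlando, Florida
"""


def to_ms(text: str) -> float:
    """Convert a duration such as '205ms' or '1.28s' to milliseconds."""
    match = re.fullmatch(r"([\d.]+)\s*(ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value * 1000 if unit == "s" else value


def overheads(samples: str) -> list[tuple[str, str, float]]:
    """Return (request_id, location, ttfb_minus_upstream_ms) per row."""
    rows = []
    for line in samples.strip().splitlines():
        req_id, upstream, ttfb, location = [part.strip() for part in line.split("|")]
        rows.append((req_id, location, to_ms(ttfb) - to_ms(upstream)))
    return rows


for req_id, location, gap in overheads(SAMPLES):
    print(f"{req_id}  {location:<20}  {gap:6.0f} ms outside the app")
```

The gap is roughly 130-350ms on every row, regardless of how long the application itself took.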
Our stack is PHP, running on a tuned shinsenter/php:8.5-nginx-fpm instance.
I don't want to jump the gun and say that this is definitely on Railway's end, but running requests in the container itself shows that our application is fast - and Railway's own HTTP Metrics support this, as they are around what we would expect. The issue seems to lie somewhere downstream (upstream?) of that, i.e. Proxy or Edge.
Would appreciate further help in diagnosing this, and hopefully reaching a resolution.
106 Replies
5 days ago
I'm in Edinburgh, Scotland (UK). https://railway.com/.railway/cdn-trace shows:
v=hikari:0.1.2:1305a1583dc517b7e2603638b5710b21
r=production-hikari-dpber-1
pop=ber1
node=bpxb
ts=1781067039462
hint=Railway is coming to a city near you
5 days ago
Hi @Thaumanovic, could you provide the information I asked from you in the previous thread? 🙂
5 days ago
See above. 🙂 Deployment is us-east
5 days ago
I have a user in Newark, close to Ashburn. They report:
v=hikari:0.1.2:1305a1583dc517b7e2603638b5710b21
r=production-hikari-dpjfk-1
pop=jfk1
node=57w5
ts=1781067324746
hint=Railway is coming to a city near you
5 days ago
They also experience this latency.
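The cdn-trace output pasted above is plain `key=value` text, so it is easy to collect from several users and compare. Below is a minimal Python sketch; it assumes `ts` is a Unix timestamp in milliseconds (the thread does not say so), and the function names are illustrative only.

```python
from datetime import datetime, timezone

# Trace text as pasted earlier in the thread (Edinburgh user).
TRACE = """\
v=hikari:0.1.2:1305a1583dc517b7e2603638b5710b21
r=production-hikari-dpber-1
pop=ber1
node=bpxb
ts=1781067039462
hint=Railway is coming to a city near you
"""


def parse_cdn_trace(text: str) -> dict[str, str]:
    """Split each 'key=value' line into a dict entry; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def trace_time(fields: dict[str, str]) -> datetime:
    """Interpret 'ts' as epoch milliseconds (an assumption) and return UTC time."""
    return datetime.fromtimestamp(int(fields["ts"]) / 1000, tz=timezone.utc)


fields = parse_cdn_trace(TRACE)
print(fields["pop"], fields["node"], trace_time(fields).isoformat())
```

Only `pop` and `node` matter for the routing question here; the rest is kept so the raw trace can be passed along unchanged.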
5 days ago
Are you also experiencing the latency personally?
5 days ago
Yes, all of our users appear to be
5 days ago
Approximately when did this start? I’m expecting this is an application issue, but it could be a side effect of new HTTP/2 behavior on our edge.
5 days ago
It's hard to say. Our app is old, so we expected significant performance impact when we migrated ~3-4 weeks ago. We've done a heavy optimization pass since then, and our own profiling shows good improvement, but those gains don't seem to be realized because of this issue (wherever it may be).
5 days ago
Understood, well we only moved applications to our new edge starting mid-last week, so I don’t think that’s related. Unfortunately I believe this is an application issue. upstreamRqDuration is the time it takes for your application to answer our origin proxies.
5 days ago
That's what I understand it to be, but these figures:
(upstreamRqDuration | Browser TTFB)
2ms | 205ms
2ms | 356ms
Are rather far apart.
Obviously I'd expect some latency from distance and overheads, but I would not expect that much, nor would I expect it to be as inconsistent as it is.
5 days ago
I have profiled that endpoint, and it shows a near-consistent ~1.5ms request time (it just renders a custom SVG). The fact that upstreamRqDuration is similar rules out that it's an nginx<->fpm issue (we'd maybe see that under heavy load), so I am not convinced it's an application issue. 🙁
5 days ago
Do you also experience high latency with this endpoint? For the user in Orlando, it’d be ideal to get the Railway CDN trace information from them.
5 days ago
Let me hit it just now for a fresh figure
5 days ago
(Orlando user is showing as away, probably asleep now - but I can follow up on that)
5 days ago
Attachments
5 days ago
Attachments
5 days ago
CHg5SuUHSQKgPMGzg4a9AQ
5 days ago
(Not sure the req ID is any use to you, but there it is anyway)
5 days ago
That was prod, so it has CF proxy, but even on dev the figures are only marginally smaller.
5 days ago
Does upstreamRqDuration measure a complete response, or is it similar to TTFB?
5 days ago
If the latter, then the issue could lie in our setup (requests not terminating properly, maybe?), but if the former then I am certain it's not our application.
5 days ago
We do use gzip, so it's chunked transfers without a Content-Length header, but this is quite a common thing so I'd be surprised if that were somehow confusing the proxy.
5 days ago
Where are you based? (Country)
5 days ago
Scotland
5 days ago
Which pop do you hit on https://railway.com/.railway/cdn-trace ?
5 days ago
pop=ber1
node=mmj8
5 days ago
Hmm Berlin, ideally you should hit the London pop. Could you provide the name of your ISP and a traceroute to 69.46.46.1 if possible?
5 days ago
> Your ISP is
> Vodafone Limited
tracert 69.46.46.1
Tracing route to 69.46.46.1 over a maximum of 30 hops
1 <1 ms <1 ms <1 ms 192.168.0.1
2 9 ms 8 ms 8 ms 90.247.128.1
3 19 ms 18 ms 18 ms 63.130.172.45
4 20 ms 20 ms 20 ms ae1-0.lon10.core-backbone.com [195.66.224.238]
5 37 ms 37 ms 37 ms ae5-2089.ber10.core-backbone.com [81.95.9.229]
6 41 ms 41 ms 41 ms coreb-ber.cdn77.com [138.199.1.12]
7 42 ms 41 ms 41 ms vl221.ber-ipb-dist-1.cdn77.com [79.127.195.227]
8 41 ms 41 ms 41 ms 69.46.46.1
5 days ago
Cool thank you, I’ll look into fixing this routing today. When did you last deploy the nginx server btw? We made some further improvements to internal routing earlier this week but it requires a redeploy.
5 days ago
10 hours ago
5 days ago
May I also ask why you’re using nginx on top of your application?
5 days ago
PHP usually requires something in front of it (nginx, apache, etc) to dish out requests. It's all a single container: https://github.com/shinsenter/php (nginx-fpm flavour)
5 days ago
Ideally we'd go for something like RoadRunner, but our app architecture does not support that at this time.
5 days ago
Would it be possible to get the URL to the SVG you’re talking about that has high latency so I can verify end to end?
5 days ago
Err, 1 sec. Might be auth-walled, let me check.
5 days ago
It is not. Can I DM the URL to you?
5 days ago
Of course 🙂
5 days ago
Hi, could you check https://railway.com/.railway/cdn-trace again?
5 days ago
pop=cdg1
node=e9jw
5 days ago
So it's shifted, just not to the right place 🥲
5 days ago
cdg = france?
5 days ago
Could you traceroute again?
5 days ago
Yeah
5 days ago
tracert 69.46.46.1
Tracing route to 69.46.46.1 over a maximum of 30 hops
1 <1 ms <1 ms <1 ms 192.168.0.1
2 7 ms 7 ms 7 ms 90.247.128.1
3 * 15 ms 15 ms 63.130.172.35
4 15 ms 14 ms 14 ms ae5-100-xcr1.man.cw.net [195.89.96.113]
5 20 ms 20 ms 20 ms ae31-xcr1.lns.cw.net [195.2.9.97]
6 20 ms 20 ms 20 ms 81.52.179.111
7 27 ms 27 ms 27 ms 193.251.128.71
8 27 ms 26 ms 26 ms 81.52.186.230
9 27 ms 26 ms 26 ms vl221.par-itx5-dist-1.cdn77.com [79.127.195.35]
10 27 ms 27 ms 26 ms 69.46.46.1
Trace complete.
5 days ago
interesting, not sure why vodafone's doing that.. our London POP has a PNI (interconnect) directly with their network in London
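To see at a glance where a trace leaves the expected path, here is a small Python sketch that parses Windows `tracert` output and reports the hop with the largest latency increase over the previous answering hop. It uses the first traceroute from this thread as input. The helper names are made up, and per-hop times are only a rough guide, since routers often deprioritise the ICMP replies that traceroute relies on.

```python
# First traceroute from this thread (Edinburgh -> 69.46.46.1 via Berlin).
TRACE = """\
1 <1 ms <1 ms <1 ms 192.168.0.1
2 9 ms 8 ms 8 ms 90.247.128.1
3 19 ms 18 ms 18 ms 63.130.172.45
4 20 ms 20 ms 20 ms ae1-0.lon10.core-backbone.com [195.66.224.238]
5 37 ms 37 ms 37 ms ae5-2089.ber10.core-backbone.com [81.95.9.229]
6 41 ms 41 ms 41 ms coreb-ber.cdn77.com [138.199.1.12]
7 42 ms 41 ms 41 ms vl221.ber-ipb-dist-1.cdn77.com [79.127.195.227]
8 41 ms 41 ms 41 ms 69.46.46.1
"""


def parse_hop(line: str):
    """Return (hop, [rtt_ms or None] * 3, host) or None for non-hop lines."""
    tokens = line.split()
    if not tokens or not tokens[0].isdigit():
        return None
    hop, i, probes = int(tokens[0]), 1, []
    while len(probes) < 3 and i < len(tokens):
        if tokens[i] == "*":  # probe timed out
            probes.append(None)
            i += 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "ms":
            probes.append(float(tokens[i].lstrip("<")))  # '<1' is treated as 1
            i += 2
        else:
            break
    return hop, probes, " ".join(tokens[i:])


def largest_jump(trace: str):
    """Return (hop, added_ms, host) for the biggest rise in best-case RTT."""
    best, prev = None, None
    for line in trace.splitlines():
        parsed = parse_hop(line)
        if parsed is None:
            continue
        hop, probes, host = parsed
        answered = [p for p in probes if p is not None]
        if not answered:  # skip hops where every probe timed out
            continue
        rtt = min(answered)
        if prev is not None and (best is None or rtt - prev > best[1]):
            best = (hop, rtt - prev, host)
        prev = rtt
    return best


print(largest_jump(TRACE))
```

For this trace the largest step is at hop 5, where the path moves from the `lon10` router to the `ber10` router.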
5 days ago
Well
5 days ago
The answer to that is probably that it's Vodafone
5 days ago
I'd be on an 8 gig connection with YouFibre if I could, but surprisingly for a building built in the last 10 years, we have no fibre. Good old ADSL.
5 days ago
Do you speak to someone at Vodafone, or is it the CDN provider, or can I harass their engineers given I am unable to unsubscribe from their marketing SMS and emails?
5 days ago
Our Orlando user sees pop=atl1 (Verizon)
5 days ago
Not sure what's up with this one, but hitting my service via CF still gives me the Berlin POP - directly is CDG.
Just adding a note here - I am based in Seattle, and my CF traffic seems to route through Berlin as well. (direct is sjc1)
quMwwRdrQOSXz9KNU79b0g | ingress=ber1 (Berlin) | ttfb=341ms
By2JgMnuRjiNRzALBhdwDg | ingress=ber1 (Berlin) | ttfb=340ms
otdQeHxTRe-AXN0QyCLmYg | ingress=ber1 (Berlin) | ttfb=341ms
4 days ago
Should be fixed
4 days ago
I contacted Vodafone but can't make any promises. I'll let you know if they get it resolved.
4 days ago
CDG isn't too far from London in the grand scheme of things. The latency is still great.
3 days ago
I'm experiencing a similar thing, and this isn't an application issue. There's high latency that sometimes spikes to 10+ seconds. Everything was normal on May 23-27, but this has been ongoing since May 28.
Attachments
3 days ago
Red shows where the service went unresponsive. The spaces without red are from May 23-27, when everything was fine.
Attachments
3 days ago
May I ask what you use the Cloudflare proxy for? It'd be ideal for us if users could disable the CF proxy, as they keep messing with routing to our edge.
3 days ago
Other than CDN, bot prevention and redirect rules in our case.
3 days ago
Hmm ok, we have CDN + bot prevention now. We'll work on "page-rules"-esque features soon.
3 days ago
CDN:
WAF/bot protection:
Analytics, DDoS protection and custom security rules (even if the server is not "under attack", some bot or ASN can have a real impact on the billing)
3 days ago
We can maybe move this to our application itself to be honest. Was just more convenient to punt it to Cloudflare so we didn't have to handle the hits.
3 days ago
We did it with 301 redirects so I'm hoping all of the crawlers and stuff will stop hitting the old endpoints
Could you tell me whether someone is already looking into this issue, or if I should open a new thread?
To be honest, I’m not sure whether this is a Railway issue or a Cloudflare issue. A few months ago, similar routing problems showed an X-Railway-Edge header from another region, but this time:
- the region seems correct (europe-west4)
- I’m getting an average response time of around 500 ms from Paris
- Railway’s metrics show an unexpectedly low response time (p90 used to be ~600 ms and is now ~30 ms)
Attachments
3 days ago
The issue is that Cloudflare in Paris is routing to our POP in Tokyo.
3 days ago
It's not necessarily a Railway issue, as we do not operate the Cloudflare network or proxy and would generally advise customers to turn it off to resolve the issue.
3 days ago
We would of course still like to support Cloudflare's network, so we will work to resolve this issue; however, they have become increasingly hard to support recently.
Alright, could you clarify what you mean by “we will work on it”?
Is the issue currently being addressed, or should I open a ticket on Cloudflare’s side?
Let me know if there’s anything I can do on my end to help speed up the fix.
3 days ago
What about cases where we are not running CF Proxy? Our dev environment has this disabled (tho still uses CF DNS) and we still see a large mismatch between our actual response times and TTFB.
3 days ago
I would need more information. Which POP are you being routed to? If it's a web URL, what is the breakdown to TTFB in your browser DevTools? What is the discrepancy between the two?
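If DevTools is not handy, a similar breakdown can be approximated from a script. The Python sketch below times DNS, TCP connect, TLS handshake, the wait for the first response byte, and the download, using only the standard library. It is a rough stand-in for the browser's Timing tab rather than an equivalent: it speaks HTTP/1.1 only and opens a fresh connection every time. The function names and the placeholder host in the usage comment are illustrative.

```python
import socket
import ssl
import time


def phase_durations(t0, t_dns, t_tcp, t_tls, t_first, t_done) -> dict:
    """Turn six monotonic timestamps (seconds) into per-phase milliseconds."""
    def ms(start, end):
        return round((end - start) * 1000, 1)

    return {
        "dns_ms": ms(t0, t_dns),
        "tcp_ms": ms(t_dns, t_tcp),
        "tls_ms": ms(t_tcp, t_tls),
        "wait_ms": ms(t_tls, t_first),      # request sent -> first byte
        "download_ms": ms(t_first, t_done),
        "ttfb_ms": ms(t0, t_first),         # everything up to the first byte
    }


def ttfb_breakdown(host, port=443, path="/", use_tls=True, timeout=10.0) -> dict:
    """Issue one GET on a fresh connection and time each phase."""
    t0 = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    t_dns = time.perf_counter()

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
        t_tcp = time.perf_counter()

        if use_tls:
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
        t_tls = time.perf_counter()

        request = (
            f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
            "Accept-Encoding: gzip\r\nConnection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        if not sock.recv(1):
            raise ConnectionError("server closed the connection without replying")
        t_first = time.perf_counter()

        while sock.recv(65536):  # drain until the server closes the connection
            pass
        t_done = time.perf_counter()
    finally:
        sock.close()
    return phase_durations(t0, t_dns, t_tcp, t_tls, t_first, t_done)


# Example (placeholder host, replace with the service under test):
#   print(ttfb_breakdown("your-service.up.railway.app", path="/"))
```

Comparing `wait_ms` against upstreamRqDuration for the same request shows how much of the TTFB is spent outside the application once connection setup is excluded.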
3 days ago
This would be helpful.
Attachments
3 days ago
Can sort that out just now, gimmie 10
3 days ago
I generally try to fix these routes every couple of days, but they keep breaking. So I can't make any promises. If you could make a ticket with Cloudflare referencing the traceroute, that would be ideal and might escalate it on their side too.
3 days ago
That being said, I think it should be fixed in most of their colos now.
3 days ago
Attachments
Thank you @Phineas and good luck with these bad boys, I hope it'll get better
3 days ago
No worries and thank you!
3 days ago
Traceroute:
Tracing route to nejrnq6f.up.railway.app [69.46.46.60]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 192.168.0.1
2 8 ms 9 ms 8 ms 90.247.128.1
3 * * * Request timed out.
4 17 ms 15 ms 16 ms ae5-100-xcr1.man.cw.net [195.89.96.113]
5 21 ms 20 ms 20 ms ae31-xcr1.lns.cw.net [195.2.9.97]
6 * * * Request timed out.
7 27 ms 28 ms 28 ms 193.251.128.71
8 27 ms 27 ms 27 ms 81.52.186.230
9 28 ms 27 ms 27 ms vl221.par-itx5-dist-2.cdn77.com [79.127.195.36]
10 27 ms 27 ms 27 ms 69.46.46.60
3 days ago
Browser timing:
Attachments
3 days ago
Real request time in our app:
Attachments
3 days ago
Thanks. Could I see the response headers?
3 days ago
cache-control
no-store, no-cache, must-revalidate
content-encoding
gzip
content-type
text/html; charset=UTF-8
date
Fri, 12 Jun 2026 14:59:13 GMT
expires
Thu, 19 Nov 1981 08:52:00 GMT
p3p
CP="NOI DSP CURa ADMa DEVa TAIa OUR BUS IND PHY UNI COM NAV DEM"
permissions-policy
interest-cohort=()
pragma
no-cache
referrer-policy
no-referrer-when-downgrade
server
railway-hikari
vary
Accept-Encoding
x-content-type-options
nosniff
x-hikari-trace
cdg1.e9jw
x-railway-edge
railway/us-east4-eqdc4a
x-railway-request-id
8rjP5QazTCOoIfwJo3UVLg
x-xss-protection
1; mode=block
3 days ago
Sorry for the formatting, DevTools butchers it
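Since DevTools copies headers as alternating name and value lines, here is a short Python sketch that pairs them back up and pulls out the routing-related fields. Reading `x-hikari-trace` as `pop.node` is an inference from the `pop=cdg1` / `node=e9jw` cdn-trace output earlier in the thread, not a documented format, and the function names are made up.

```python
# A subset of the header dump above, in the same name/value-per-line layout.
HEADER_LINES = """\
server
railway-hikari
x-hikari-trace
cdg1.e9jw
x-railway-edge
railway/us-east4-eqdc4a
x-railway-request-id
8rjP5QazTCOoIfwJo3UVLg
""".splitlines()


def pair_up(lines: list[str]) -> dict[str, str]:
    """Rebuild {name: value} from alternating name/value lines."""
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) % 2:
        raise ValueError("expected an even number of non-empty lines")
    return {name.lower(): value for name, value in zip(cleaned[0::2], cleaned[1::2])}


def routing_summary(headers: dict[str, str]) -> dict[str, str]:
    """Extract the ingress POP/node and the edge region serving the request."""
    # Assumption: x-hikari-trace has the form '<pop>.<node>'.
    pop, _, node = headers.get("x-hikari-trace", "").partition(".")
    edge = headers.get("x-railway-edge", "")
    region = edge.split("/", 1)[1] if "/" in edge else edge
    return {
        "pop": pop,
        "node": node,
        "edge_region": region,
        "request_id": headers.get("x-railway-request-id", ""),
    }


print(routing_summary(pair_up(HEADER_LINES)))
```

For the dump above this gives an ingress POP of `cdg1` and an edge region of `us-east4-eqdc4a`, which matches the us-east deployment mentioned earlier.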
3 days ago
Thanks, and sorry which Railway service was this for? (You can link me to it on the dashboard)
3 days ago
3 days ago
(that do?)
3 days ago
Environment ID: e70b963a-417f-4e0b-8289-cdd95800254c
3 days ago
Could you visit https://us-east4-eqdc4a-production.up.railway.app/ (refresh twice) and then show me the Timing stats?
3 days ago
Attachments
3 days ago
Timing from that last one:
Attachments
3 days ago
ReqId: fyWowW6HTxyHG7pj8u2xcg
3 days ago
Seeing: cdg1.8vsn
3 days ago
Thanks. That seems generally correct for your geography (considering the cdg reroute)
3 days ago
The extra 100ms you see is likely coming from your application. What is the actual response from that route?
3 days ago
On our app?
3 days ago
Yeah, what does this (loader.php) actually respond with - what's the content?
3 days ago
Just a bunch of HTML, that particular one was 1.6kB
the issue is back
Attachments
3 days ago
Yeah I brought the issue back so that their team can debug, apologies. https://x.com/phineyes/status/2065546050719604848?s=46
2 days ago
Thanks for the update - appreciate all your efforts on this! ❤️