23 days ago
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: runner (1383d914-93f1-498e-a306-43d7192cd956)
Domain: test.runner-x.com
Symptom:
https://test.runner-x.com/ → 502 (x-railway-fallback: true)
Service URL https://runner-production-1f2f.up.railway.app/ → 200 ✅
targetPort: 3000 (matches Puma)
Cert status via GraphQL:
certificateStatus: VALID
verified: true
CN=test.runner-x.com served (Let's Encrypt R13)
But dnsRecords shows currentValue: "" / status: REQUIRES_UPDATE,
even though dig +short CNAME test.runner-x.com returns
uxhlq3l8.up.railway.app from both 1.1.1.1 and 8.8.8.8.
Cloudflare proxy is OFF (grey cloud); it was briefly ON earlier, so
Railway's DNS checker may still be holding stale state from then. Tried
redeploy and customDomainUpdate, with no effect. Any way to force a DNS
re-check on your side?
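For anyone reproducing the resolver check above, here is a minimal sketch that compares the CNAME answer from several public resolvers against the target Railway asks for. It assumes dig is on PATH; the helper names (normalize, query_cname, mismatches) are made up for illustration and are not part of any Railway tooling.

```python
import subprocess


def normalize(target: str) -> str:
    """Lower-case a DNS name and drop whitespace and the trailing dot dig prints."""
    return target.strip().rstrip(".").lower()


def query_cname(name: str, resolver: str) -> str:
    """Ask one resolver for the CNAME of `name` via dig (assumes dig is installed)."""
    out = subprocess.run(
        ["dig", "+short", "CNAME", name, f"@{resolver}"],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    return normalize(out)


def mismatches(answers: dict[str, str], expected: str) -> dict[str, str]:
    """Return only the resolvers whose answer differs from the expected target."""
    want = normalize(expected)
    return {r: a for r, a in answers.items() if normalize(a) != want}
```

Calling query_cname("test.runner-x.com", r) for "1.1.1.1" and "8.8.8.8" and passing the results to mismatches with the required value from the Railway dashboard should give an empty dict when public DNS is correct. If it is empty while Railway still reports REQUIRES_UPDATE, the stale state is on the checker side rather than in DNS.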
15 Replies
23 days ago
I'd try removing all associated records from Railway and Cloudflare, waiting for ~10-15 min, then re-add them back.
Thanks @pepper, I'll give that a try. I'm just concerned about hitting my LE rate limit, as Railway has already issued 3 certs for test.runner-x.com while I was debugging this issue. Is there a way to reset the LE rate limit if that happens?
23 days ago
From what I understand, IIRC it's 5 certificates per week.
Update after full teardown + SSL mode change:
Followed the teardown steps. Domain recreated, records re-added, CF SSL
now Full (strict), proxy off (grey cloud). Railway control plane is all
green:
certificateStatus: VALID
verified: true
dnsRecords[0].currentValue: tyxzlf5y.up.railway.app (status: PROPAGATED)
targetPort: 3000
But test.runner-x.com still returns 502 with
x-railway-fallback: true:
HTTP/2 502
server: railway-edge
x-railway-edge: railway/us-west2
x-railway-fallback: true
Same service on runner-production-1f2f.up.railway.app returns 200 fine,
so the service is healthy, and the edge just doesn't have a backend
registered for the custom hostname.
Looks like an edge/Fastly routing sync issue on your side; nothing
left to try from mine. Can someone flush the hostname → origin map
for test.runner-x.com?
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: 1383d914-93f1-498e-a306-43d7192cd956
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74
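To make the header check above repeatable, here is a small sketch that parses the output of curl -sI and flags the combination seen here (HTTP 502 plus x-railway-fallback: true). Reading that combination as "answered by the edge, not by the app" is the interpretation used in this thread, not something confirmed by Railway; the function names are illustrative only.

```python
def parse_head(raw: str) -> tuple[int, dict[str, str]]:
    """Split `curl -sI` output into (status code, lower-cased header dict)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])  # "HTTP/2 502" -> 502
    headers: dict[str, str] = {}
    for ln in lines[1:]:
        key, _, value = ln.partition(":")
        headers[key.strip().lower()] = value.strip()
    return status, headers


def is_edge_fallback(status: int, headers: dict[str, str]) -> bool:
    """True when the response matches the 502 + fallback-header pattern."""
    return status == 502 and headers.get("x-railway-fallback", "").lower() == "true"


# Response headers quoted earlier in this post.
sample = """HTTP/2 502
server: railway-edge
x-railway-edge: railway/us-west2
x-railway-fallback: true
"""
status, headers = parse_head(sample)
print(status, is_edge_fallback(status, headers))
```

Running the same check against the generated Railway URL and the custom domain side by side makes it easy to show that only the custom hostname hits the fallback.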
Update on custom domain issue (continuation of Discord thread on test.runner-x.com).
Short version: app.runner-x.com on a brand-new production environment works end-to-end. test.runner-x.com on the
original environment (which was renamed from "production" to "staging" after the fact) still returns 502 with
x-railway-fallback: true.
Everything Railway-side is green:
certificateStatus: VALID
verified: true
dnsRecords[0].status: PROPAGATED (currentValue matches requiredValue)
targetPort: 3000
syncStatus: ACTIVE
Service itself is healthy: the staging Railway URL (runner-production-1f2f.up.railway.app) returns 200, so the runner
service on port 3000 works. Only the custom hostname's edge routing falls through.
DNS side is clean: dig confirms the CNAME and TXT records from multiple resolvers. Cloudflare proxy is off (grey
cloud). CF zone SSL is Full (strict). No CAA restrictions.
Things tried with no effect:
customDomainUpdate (touched targetPort to same value)
serviceInstanceRedeploy on runner
Full teardown of DNS + domain + 15 min wait + recreate (per earlier guidance in this thread)
Hypothesis: the edge binding for test.runner-x.com is stale after the environment rename, and nothing client-side
forces a resync. The fact that a freshly-created app.runner-x.com on a never-renamed env works cleanly points at
something specific to this domain's edge state.
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: 1383d914-93f1-498e-a306-43d7192cd956
Env: 2d6b0b21-2835-49c4-9e13-751e1275cd55 (renamed production -> staging)
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)
Request ID: OWzKQvHpSAG08J1W0_TJvA (502 response just now)
Can someone force a resync of the hostname → origin map on your edge, or inspect what's wedged?
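The triage logic behind this report can be sketched roughly as follows: first rule out anything the control plane flags, and only then blame edge routing. The field names mirror the ones quoted above (certificateStatus, verified, dnsRecords[].status, syncStatus); the flat dict shape and the function names are assumptions for illustration, not the actual GraphQL response format.

```python
# Values this thread treats as "green" for a custom domain.
EXPECTED = {
    "certificateStatus": "VALID",
    "verified": True,
    "syncStatus": "ACTIVE",
}


def control_plane_issues(domain: dict) -> list[str]:
    """List every field that differs from the green state."""
    issues = [
        f"{key}={domain.get(key)!r}"
        for key, want in EXPECTED.items()
        if domain.get(key) != want
    ]
    for i, rec in enumerate(domain.get("dnsRecords", [])):
        if rec.get("status") != "PROPAGATED":
            issues.append(f"dnsRecords[{i}].status={rec.get('status')!r}")
    return issues


def classify(domain: dict, http_status: int, fallback: bool) -> str:
    """Rough triage: control-plane problem, edge routing, or something else."""
    issues = control_plane_issues(domain)
    if issues:
        return "control-plane: " + ", ".join(issues)
    if http_status == 502 and fallback:
        return "edge-routing"
    return "ok" if http_status == 200 else f"app-or-other: HTTP {http_status}"
```

With the state reported in this post (all fields green, 502, fallback header present), classify returns "edge-routing". With the state from the first post (dnsRecords status REQUIRES_UPDATE), it points at the control plane instead.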
22 days ago
Set your Cloudflare SSL to Full (not strict).
22 days ago
Also, enable Universal SSL in Cloudflare if you haven't yet.
22 days ago
(steps 5 and 6)
22 days ago
See if that works.
@pepper Tried enabling proxy + Full (not strict) + Universal SSL. Same result:
502 with x-railway-fallback: true, served by Railway's edge
(x-railway-cdn-edge: fastly/…, x-railway-edge: railway/us-west2).
Request ID: 0WjNQyxeQ72tfObPxtoGcA
So the 502 reproduces regardless of whether CF is in the path.
Isolating signal: app.runner-x.com, on a different environment but the
same service (same Dockerfile/runner image, same CF zone, same setup
steps), works end-to-end with cert valid and HTTP 200. Only
test.runner-x.com is stuck. test.runner-x.com has never served a 200 —
including after a full teardown-and-recreate per earlier guidance in
this thread. On your API, everything reads as healthy for this domain:
cert VALID, verified=true, DNS record match, targetPort 3000,
syncStatus ACTIVE. But the edge falls back.
Looks like the Fastly hostname→service map was never correctly set for
test.runner-x.com specifically. Can someone inspect or flush the edge
binding for this domain?
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)
21 days ago
Have you checked the documentation?
Resolved — turned out the fix was renaming the auto-generated service
domain (runner-production-1f2f → runner-test). That flushed the edge
binding, and both the renamed URL and test.runner-x.com started working
directly afterward. May be worth adding to internal troubleshooting
runbooks if other customers hit the same x-railway-fallback: true state
with otherwise-healthy API responses.
Thanks for the help!
21 days ago
There is no such thing as an edge DNS binding, these words don't mean anything in reality.
21 days ago
You had your target port set incorrectly.
Status changed to Solved brody • 20 days ago