a month ago
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: runner (1383d914-93f1-498e-a306-43d7192cd956)
Domain: test.runner-x.com
Symptom:
- https://test.runner-x.com/ → 502 (x-railway-fallback: true)
- Service URL https://runner-production-1f2f.up.railway.app/ → 200 ✅
- targetPort: 3000 (matches Puma)
Cert status via GraphQL:
- certificateStatus: VALID
- verified: true
- CN=test.runner-x.com served (Let's Encrypt R13)
But dnsRecords shows currentValue: "" / status: REQUIRES_UPDATE,
even though dig +short CNAME test.runner-x.com returns
uxhlq3l8.up.railway.app from 1.1.1.1 and 8.8.8.8.
Cloudflare proxy is OFF (grey cloud); was briefly ON earlier — seems
Railway's DNS checker may still be holding stale state from then. Tried
redeploy and customDomainUpdate — no effect. Any way to force a DNS
re-check on your side?
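For anyone reproducing the resolver check, here is a minimal Python sketch of the comparison described above. The helper names (`normalize`, `compare_cname`, `dig_cname`) are made up for illustration, and `dig_cname` assumes `dig` is installed and the network is reachable; only the hostname, the resolvers, and the CNAME target come from this post.

```python
import subprocess


def normalize(name: str) -> str:
    """Lower-case a DNS name and drop the trailing dot that dig prints."""
    return name.strip().rstrip(".").lower()


def compare_cname(answers: dict[str, str], expected: str) -> dict[str, bool]:
    """Map each resolver to whether its CNAME answer matches the expected target."""
    want = normalize(expected)
    return {resolver: normalize(answer) == want for resolver, answer in answers.items()}


def dig_cname(hostname: str, resolver: str) -> str:
    """Ask one resolver for the CNAME via `dig +short` (hypothetical helper; needs dig and network)."""
    result = subprocess.run(
        ["dig", "+short", "CNAME", hostname, f"@{resolver}"],
        capture_output=True, text=True, timeout=10, check=False,
    )
    return result.stdout.strip()


# Example (not executed here because it needs network access):
# answers = {r: dig_cname("test.runner-x.com", r) for r in ("1.1.1.1", "8.8.8.8")}
# print(compare_cname(answers, "uxhlq3l8.up.railway.app"))
```

If every resolver maps to True while the dashboard still shows an empty currentValue, the mismatch is on the checker's side rather than in public DNS.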
15 Replies
a month ago
I'd try removing all associated records from Railway and Cloudflare, waiting ~10-15 min, then re-adding them.
Thanks @pepper, I'll give that a try. I'm just concerned about hitting the Let's Encrypt (LE) rate limit, as Railway has already issued 3 certs for test.runner-x.com while I was debugging this issue. Is there a way to reset the LE rate limit if that happens?
a month ago
IIRC, the Let's Encrypt limit is 5 certificates per week for the same set of hostnames.
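A rough way to see how much headroom is left, taking the 5-per-week figure in this reply at face value. This is a simplified sliding-window count, not the CA's actual accounting, and the function name and defaults are illustrative assumptions.

```python
from datetime import datetime, timedelta


def remaining_issuances(issued_at, now, limit=5, window=timedelta(days=7)):
    """Count certificates issued inside the trailing window and return how many remain.

    `limit` and `window` default to the figure quoted in the thread (assumption).
    """
    recent = [t for t in issued_at if now - window < t <= now]
    return max(0, limit - len(recent))
```

With the 3 certificates already issued this week, this gives 2 remaining before the limit would apply.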
Update after full teardown + SSL mode change:
Followed the teardown steps. Domain recreated, records re-added, CF SSL
now Full (strict), proxy off (grey cloud). Railway control plane is all
green:
- certificateStatus: VALID
- verified: true
- dnsRecords[0].currentValue: tyxzlf5y.up.railway.app (status: PROPAGATED)
- targetPort: 3000
But test.runner-x.com still returns 502 with x-railway-fallback: true:
HTTP/2 502
server: railway-edge
x-railway-edge: railway/us-west2
x-railway-fallback: true
Same service on runner-production-1f2f.up.railway.app returns 200 fine,
so the service is healthy, and the edge just doesn't have a backend
registered for the custom hostname.
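To make the symptom easy to check from a script, here is a small Python sketch that parses `curl -sI`-style output and flags the fallback response. The function names are hypothetical, and the 502 plus `x-railway-fallback: true` combination is treated purely as the symptom seen in this thread, not as a documented contract.

```python
def parse_header_lines(raw: str) -> tuple[int, dict[str, str]]:
    """Parse a status line plus `name: value` header lines into (status, headers)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])  # e.g. "HTTP/2 502" -> 502
    headers = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def is_edge_fallback(status: int, headers: dict[str, str]) -> bool:
    """True for the response seen in this thread: a 502 carrying x-railway-fallback: true."""
    lowered = {k.lower(): v.strip().lower() for k, v in headers.items()}
    return status == 502 and lowered.get("x-railway-fallback") == "true"
```

Feeding it the four header lines above returns True, while a plain 200 from the service URL returns False.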
Looks like an edge/Fastly routing sync issue on your side; nothing
left to try from mine. Can someone flush the hostname → origin map
for test.runner-x.com?
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: 1383d914-93f1-498e-a306-43d7192cd956
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74
Update on custom domain issue (continuation of Discord thread on test.runner-x.com).
Short version: app.runner-x.com on a brand-new production environment works end-to-end. test.runner-x.com on the
original environment (which was renamed from "production" to "staging" after the fact) still returns 502 with
x-railway-fallback: true.
Everything Railway-side is green:
- certificateStatus: VALID
- verified: true
- dnsRecords[0].status: PROPAGATED (currentValue matches requiredValue)
- targetPort: 3000
- syncStatus: ACTIVE
Service itself is healthy: the staging Railway URL (runner-production-1f2f.up.railway.app) returns 200, so the runner
service on port 3000 works. Only the custom hostname's edge routing falls through.
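The reasoning here (API fields all green, service URL healthy, custom hostname still falling back) can be written down as a small Python sketch. The dict layout is an assumption modelled on the field names quoted in this thread, not a confirmed API response shape, and both function names are illustrative.

```python
def control_plane_issues(domain: dict) -> list[str]:
    """Return the fields that are not in the 'green' state listed in this post."""
    issues = []
    if domain.get("certificateStatus") != "VALID":
        issues.append("certificateStatus")
    if domain.get("verified") is not True:
        issues.append("verified")
    if domain.get("syncStatus") != "ACTIVE":
        issues.append("syncStatus")
    for i, record in enumerate(domain.get("dnsRecords", [])):
        propagated = record.get("status") == "PROPAGATED"
        matches = record.get("currentValue") == record.get("requiredValue")
        if not (propagated and matches):
            issues.append(f"dnsRecords[{i}]")
    return issues


def diagnose(domain: dict, service_url_status: int,
             custom_domain_status: int, fallback_header: bool) -> str:
    """Combine API state with the two HTTP checks into a one-line summary."""
    issues = control_plane_issues(domain)
    if issues:
        return "control plane not green: " + ", ".join(issues)
    if service_url_status != 200:
        return "service unhealthy on its Railway URL"
    if custom_domain_status == 502 and fallback_header:
        return "service and control plane look healthy; only the custom hostname falls back"
    return "no mismatch detected"
```

This only narrows down where the mismatch sits. It says nothing about the cause, which still needs someone with access to the edge (or a second look at settings such as targetPort).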
DNS side is clean: dig confirms the CNAME and TXT records from multiple resolvers. Cloudflare proxy is off (grey
cloud). CF zone SSL is Full (strict). No CAA restrictions.
Things tried with no effect:
- customDomainUpdate (touched targetPort to same value)
- serviceInstanceRedeploy on runner
- Full teardown of DNS + domain + 15 min wait + recreate (per earlier guidance in this thread)
Hypothesis: the edge binding for test.runner-x.com is stale after the environment rename, and nothing client-side
forces a resync. The fact that a freshly-created app.runner-x.com on a never-renamed env works cleanly points at
something specific to this domain's edge state.
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: 1383d914-93f1-498e-a306-43d7192cd956
Env: 2d6b0b21-2835-49c4-9e13-751e1275cd55 (renamed production -> staging)
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)
Request ID: OWzKQvHpSAG08J1W0_TJvA (502 response just now)
Can someone force a resync of the hostname → origin map on your edge, or inspect what's wedged?
a month ago
Set your Cloudflare SSL to Full (not strict).
a month ago
Also, enable Universal SSL in Cloudflare if you haven't yet.
a month ago
(steps 5 and 6)
a month ago
See if that works.
@pepper Tried enabling proxy + Full (not strict) + Universal SSL. Same result:
502 with x-railway-fallback: true, served by Railway's edge
(x-railway-cdn-edge: fastly/..., x-railway-edge: railway/us-west2).
Request ID: 0WjNQyxeQ72tfObPxtoGcA
So the 502 reproduces regardless of whether CF is in the path.
Isolating signal: app.runner-x.com, on a different environment but the
same service (same Dockerfile/runner image, same CF zone, same setup
steps), works end-to-end with cert valid and HTTP 200. Only
test.runner-x.com is stuck. test.runner-x.com has never served a 200 —
including after a full teardown-and-recreate per earlier guidance in
this thread. On your API, everything reads as healthy for this domain:
cert VALID, verified=true, DNS record match, targetPort 3000,
syncStatus ACTIVE. But the edge falls back.
Looks like the Fastly hostname→service map was never correctly set for
test.runner-x.com specifically. Can someone inspect or flush the edge
binding for this domain?
Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)
a month ago
Have you checked the documentation?
Resolved — turned out the fix was renaming the auto-generated service
domain (runner-production-1f2f → runner-test). That flushed the edge
binding, and both the renamed URL and test.runner-x.com started working
directly afterward. May be worth adding to internal troubleshooting
runbooks if other customers hit the same x-railway-fallback: true state
with otherwise-healthy API responses.
Thanks for the help!
a month ago
There is no such thing as an edge DNS binding; these words don't mean anything in reality.
a month ago
You had your target port set incorrectly.
Status changed to Solved brody • 29 days ago