Custom domain stuck at 502 despite cert being VALID.
zack-eth
PROOP

23 days ago

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: runner (1383d914-93f1-498e-a306-43d7192cd956)
Domain: test.runner-x.com

Symptom:

  • https://test.runner-x.com/ → 502 (x-railway-fallback: true)

  • Service URL https://runner-production-1f2f.up.railway.app/ → 200 ✅

  • targetPort: 3000 (matches Puma)

Cert status via GraphQL:

  • certificateStatus: VALID

  • verified: true

  • CN=test.runner-x.com served (Let's Encrypt R13)

But dnsRecords shows currentValue: "" / status: REQUIRES_UPDATE,
even though dig +short CNAME test.runner-x.com returns
uxhlq3l8.up.railway.app from 1.1.1.1 and 8.8.8.8.

Cloudflare proxy is OFF (grey cloud). It was briefly ON earlier, so it seems
Railway's DNS checker may still be holding stale state from then. Tried
redeploy and customDomainUpdate, with no effect. Any way to force a DNS
re-check on your side?
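For anyone who wants to script the dig comparison described above, here is a minimal sketch. It is not Railway's DNS checker, only an illustration of comparing resolver answers against the required CNAME target. The helper names are hypothetical, and the answers are hard-coded from the values quoted in this post rather than resolved live.

```python
def normalize_host(name: str) -> str:
    """Lower-case a DNS name and drop the trailing root dot, if any."""
    return name.strip().rstrip(".").lower()


def cname_matches(observed: str, required: str) -> bool:
    """True when a resolver's CNAME answer equals the required target."""
    return normalize_host(observed) == normalize_host(required)


def resolvers_agree(answers: dict[str, str], required: str) -> bool:
    """True when at least one resolver answered and all match the target."""
    return bool(answers) and all(
        cname_matches(answer, required) for answer in answers.values()
    )


# Answers as quoted in the post (dig usually prints a trailing dot).
answers = {
    "1.1.1.1": "uxhlq3l8.up.railway.app.",
    "8.8.8.8": "uxhlq3l8.up.railway.app.",
}
print(resolvers_agree(answers, "uxhlq3l8.up.railway.app"))
```

An empty currentValue such as the one the API reported would fail this comparison, which is consistent with the REQUIRES_UPDATE status even though public resolvers already return the right target.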


Solved

15 Replies

I'd try removing all associated records from both Railway and Cloudflare, waiting ~10-15 min, then re-adding them.


zack-eth
PROOP

23 days ago

Thanks @pepper, I'll give that a try. I'm just concerned about hitting my LE rate limit, as Railway has already issued 3 certs for test.runner-x.com while I've been debugging this issue. Is there a way to reset the LE rate limit if that happens?


IIRC, it's 5 certificates per week.
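To make the arithmetic concrete, here is a rough sketch of a rolling-window count. It assumes the figure quoted above (5 issuances per 7 days for the hostname), which should be checked against Let's Encrypt's current rate-limit documentation. The timestamps are hypothetical and not taken from this thread.

```python
from datetime import datetime, timedelta


def remaining_issuances(issued_at, now, limit=5, window=timedelta(days=7)):
    """Count issuances inside the rolling window and return what is left.

    `limit` and `window` are assumptions taken from the reply above.
    """
    recent = [t for t in issued_at if now - window < t <= now]
    return max(0, limit - len(recent))


# Hypothetical issuance times for three certs during debugging.
now = datetime(2025, 1, 10, 12, 0)
issued = [
    datetime(2025, 1, 8, 9, 0),
    datetime(2025, 1, 9, 14, 30),
    datetime(2025, 1, 10, 8, 15),
]
print(remaining_issuances(issued, now))
```

With three certs already issued inside the window, two more teardown-and-recreate attempts would fit under the assumed limit.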


zack-eth
PROOP

23 days ago

Update after full teardown + SSL mode change:

Followed the teardown steps. Domain recreated, records re-added, CF SSL
now Full (strict), proxy off (grey cloud). Railway control plane is all
green:

  • certificateStatus: VALID

  • verified: true

  • dnsRecords[0].currentValue: tyxzlf5y.up.railway.app (status: PROPAGATED)

  • targetPort: 3000

But test.runner-x.com still returns 502 with x-railway-fallback: true:

    HTTP/2 502
    server: railway-edge
    x-railway-edge: railway/us-west2
    x-railway-fallback: true

Same service on runner-production-1f2f.up.railway.app returns 200 fine,
so the service is healthy, and the edge just doesn't have a backend
registered for the custom hostname.

Looks like an edge/Fastly routing sync issue on your side; nothing
left to try from mine. Can someone flush the hostname → origin map
for test.runner-x.com?

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: 1383d914-93f1-498e-a306-43d7192cd956
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74
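If it helps anyone triaging the same symptom, here is a small sketch that parses a response head like the one pasted above and flags the fallback case. The function names are hypothetical. Reading x-railway-fallback: true as "the edge has no backend for this hostname" is the interpretation given in this post, not documented Railway behavior.

```python
def parse_response_head(raw: str):
    """Split a pasted status line plus headers into (status, headers)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])  # e.g. "HTTP/2 502" -> 502
    headers = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def is_edge_fallback(status: int, headers: dict) -> bool:
    """True for a 502 that carries x-railway-fallback: true."""
    return status == 502 and headers.get("x-railway-fallback", "").lower() == "true"


# Response head copied from the post above.
RAW = """
HTTP/2 502
server: railway-edge
x-railway-edge: railway/us-west2
x-railway-fallback: true
"""

status, headers = parse_response_head(RAW)
print(is_edge_fallback(status, headers))
```

A 502 without the fallback header would not be flagged, so an error coming from the app itself stays distinguishable from this case.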


zack-eth
PROOP

22 days ago

Update on custom domain issue (continuation of Discord thread on test.runner-x.com).

Short version: app.runner-x.com on a brand-new production environment works end-to-end. test.runner-x.com on the
original environment (which was renamed from "production" to "staging" after the fact) still returns 502 with
x-railway-fallback: true.

Everything Railway-side is green:

  • certificateStatus: VALID

  • verified: true

  • dnsRecords[0].status: PROPAGATED (currentValue matches requiredValue)

  • targetPort: 3000

  • syncStatus: ACTIVE

Service itself is healthy: the staging Railway URL (runner-production-1f2f.up.railway.app) returns 200, so the runner
service on port 3000 works. Only the custom hostname's edge routing falls through.

DNS side is clean: dig confirms the CNAME and TXT records from multiple resolvers. Cloudflare proxy is off (grey
cloud). CF zone SSL is Full (strict). No CAA restrictions.

Things tried with no effect:

  • customDomainUpdate (touched targetPort to same value)

  • serviceInstanceRedeploy on runner

  • Full teardown of DNS + domain + 15 min wait + recreate (per earlier guidance in this thread)

Hypothesis: the edge binding for test.runner-x.com is stale after the environment rename, and nothing client-side
forces a resync. The fact that a freshly-created app.runner-x.com on a never-renamed env works cleanly points at
something specific to this domain's edge state.

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Service: 1383d914-93f1-498e-a306-43d7192cd956
Env: 2d6b0b21-2835-49c4-9e13-751e1275cd55 (renamed production -> staging)
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)
Request ID: OWzKQvHpSAG08J1W0_TJvA (502 response just now)

Can someone force a resync of the hostname → origin map on your edge, or inspect what's wedged?
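The reasoning above (every control-plane field is green, yet the edge returns a fallback 502) can be sketched as a small triage function. The flat dictionary is a hypothetical shape chosen for illustration, with "dnsStatus" standing in for dnsRecords[0].status. It is not Railway's GraphQL schema.

```python
# Hypothetical flattened view of the fields quoted in this post.
EXPECTED = {
    "certificateStatus": "VALID",
    "verified": True,
    "dnsStatus": "PROPAGATED",
    "targetPort": 3000,
    "syncStatus": "ACTIVE",
}


def non_green_fields(state: dict) -> dict:
    """Return the fields whose value differs from the expected one."""
    return {k: state.get(k) for k, want in EXPECTED.items() if state.get(k) != want}


def diagnose(state: dict, http_status: int, fallback: bool) -> str:
    """Point at the control plane first, then at edge routing."""
    bad = non_green_fields(state)
    if bad:
        return "control-plane: " + ", ".join(sorted(bad))
    if http_status == 502 and fallback:
        return "edge-routing"
    return "ok" if http_status == 200 else "other"


# State as reported in this post: all green, but the edge falls back.
state = dict(EXPECTED)
print(diagnose(state, 502, True))
```

Checking the control plane first keeps a misconfigured field, such as a wrong targetPort, from being misread as an edge problem.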


zack-eth
PROOP

22 days ago

@pepper ^ 🙏


Set your Cloudflare SSL to Full (not strict).


Also, enable Universal SSL in Cloudflare if you haven't yet.


(steps 5 and 6)


See if that works.


zack-eth
PROOP

21 days ago

@pepper Tried enabling proxy + Full (not strict) + Universal SSL. Same result:
502 with x-railway-fallback: true, served by Railway's edge
(x-railway-cdn-edge: fastly/…, x-railway-edge: railway/us-west2).
Request ID: 0WjNQyxeQ72tfObPxtoGcA

So the 502 reproduces regardless of whether CF is in the path.

Isolating signal: app.runner-x.com, on a different environment but the
same service (same Dockerfile/runner image, same CF zone, same setup
steps), works end-to-end with cert valid and HTTP 200. Only
test.runner-x.com is stuck. test.runner-x.com has never served a 200 —
including after a full teardown-and-recreate per earlier guidance in
this thread. On your API, everything reads as healthy for this domain:
cert VALID, verified=true, DNS record match, targetPort 3000,
syncStatus ACTIVE. But the edge falls back.

Looks like the Fastly hostname→service map was never correctly set for
test.runner-x.com specifically. Can someone inspect or flush the edge
binding for this domain?

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13
Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)


21 days ago

Have you checked the documentation?


zack-eth
PROOP

21 days ago

Resolved — turned out the fix was renaming the auto-generated service
domain (runner-production-1f2f → runner-test). That flushed the edge
binding, and both the renamed URL and test.runner-x.com started working
directly afterward. May be worth adding to internal troubleshooting
runbooks if other customers hit the same x-railway-fallback: true state
with otherwise-healthy API responses.

Thanks for the help!


21 days ago

There is no such thing as an edge DNS binding; these words don't mean anything in reality.


21 days ago

You had your target port set incorrectly.


Status changed to Solved by brody, 20 days ago

