Custom domain stuck at 502 despite cert being VALID.
zack-eth
PROOP

a month ago

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13

Service: runner (1383d914-93f1-498e-a306-43d7192cd956)

Domain: test.runner-x.com

Symptom:

Cert status via GraphQL:

  • certificateStatus: VALID
  • verified: true
  • CN=test.runner-x.com served (Let's Encrypt R13)

But dnsRecords shows currentValue: "" / status: REQUIRES_UPDATE, even though dig +short CNAME test.runner-x.com returns uxhlq3l8.up.railway.app from 1.1.1.1 and 8.8.8.8.

Cloudflare proxy is OFF (grey cloud); it was briefly ON earlier, so Railway's DNS checker may still be holding stale state from then. Tried redeploy and customDomainUpdate, with no effect. Is there any way to force a DNS re-check on your side?
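For anyone wanting to reproduce the comparison described above, here is a minimal sketch that checks what each public resolver returns against the target Railway asks for. It assumes `dig` is on the PATH; the helper names are hypothetical, and treating a trailing dot or different letter case as a match is my own assumption.

```python
import subprocess

RESOLVERS = ["1.1.1.1", "8.8.8.8"]


def normalize(name: str) -> str:
    """Lower-case a DNS name and drop surrounding whitespace and the root dot."""
    return name.strip().rstrip(".").lower()


def dig_cname(host: str, resolver: str) -> str:
    """Run `dig +short CNAME <host> @<resolver>` and return its raw output."""
    result = subprocess.run(
        ["dig", "+short", "CNAME", host, f"@{resolver}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def mismatched_resolvers(answers: dict[str, str], required: str) -> list[str]:
    """Return the resolvers whose answer differs from the required target."""
    want = normalize(required)
    return [r for r, out in answers.items() if normalize(out) != want]
```

Calling `dig_cname("test.runner-x.com", r)` for each entry in `RESOLVERS` and passing the results to `mismatched_resolvers` gives an empty list when public DNS already agrees with the required value, which is the situation this post describes.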


Solved

15 Replies

I'd try removing all associated records from Railway and Cloudflare, waiting ~10-15 min, then re-adding them.


zack-eth
PROOP

a month ago

Thanks @pepper, I'll give that a try. I'm just concerned about hitting my LE rate limit, as Railway has already issued 3 certs for test.runner-x.com while I was debugging this issue. Is there a way to reset the LE rate limit if that happens?


IIRC it's 5 certificates per week.
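To see how much headroom that would leave, here is a rough sketch that counts issuances inside a rolling 7-day window. The limit of 5 is taken from the reply above and is not verified here, and the timestamps are made up for illustration; Let's Encrypt's rate limit documentation has the current numbers.

```python
from datetime import datetime, timedelta, timezone

ASSUMED_LIMIT = 5              # figure quoted in this thread, not verified
WINDOW = timedelta(days=7)     # assumed rolling window


def remaining_issuances(issued_at: list[datetime], now: datetime,
                        limit: int = ASSUMED_LIMIT) -> int:
    """Return how many more certs fit in the window ending at `now`."""
    recent = [t for t in issued_at if now - WINDOW < t <= now]
    return max(limit - len(recent), 0)


# Hypothetical timestamps: three certs issued over the last three days.
now = datetime(2025, 1, 8, 12, 0, tzinfo=timezone.utc)
issued = [now - timedelta(days=d) for d in (1, 2, 3)]
print(remaining_issuances(issued, now))  # prints 2
```

Under those assumptions, three issuances leave room for two more before the window would need to roll forward.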


zack-eth
PROOP

a month ago

Update after full teardown + SSL mode change:

Followed the teardown steps. Domain recreated, records re-added, CF SSL now Full (strict), proxy off (grey cloud). Railway control plane is all green:

  • certificateStatus: VALID
  • verified: true
  • dnsRecords[0].currentValue: tyxzlf5y.up.railway.app (status: PROPAGATED)
  • targetPort: 3000

But test.runner-x.com still returns 502 with x-railway-fallback: true:

HTTP/2 502
server: railway-edge
x-railway-edge: railway/us-west2
x-railway-fallback: true

Same service on runner-production-1f2f.up.railway.app returns 200 fine, so the service is healthy, and the edge just doesn't have a backend registered for the custom hostname.
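A small sketch of the check described above: parse a `curl -I` style response head and flag the case where the edge itself answers 502 and marks a fallback. The function names are hypothetical, and reading `x-railway-fallback: true` as "no backend registered" is this post's interpretation, not something confirmed by Railway.

```python
def parse_head(raw: str) -> tuple[int, dict[str, str]]:
    """Parse a raw response head into (status code, lower-cased headers)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])      # "HTTP/2 502" -> 502
    headers = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def is_edge_fallback(raw: str) -> bool:
    """True when the response is a 502 carrying x-railway-fallback: true."""
    status, headers = parse_head(raw)
    return status == 502 and headers.get("x-railway-fallback") == "true"


SAMPLE = """\
HTTP/2 502
server: railway-edge
x-railway-edge: railway/us-west2
x-railway-fallback: true
"""
print(is_edge_fallback(SAMPLE))  # prints True
```

Running the same check against the generated up.railway.app hostname and the custom hostname makes the difference between the two easy to show in a support request.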

Looks like an edge/Fastly routing sync issue on your side; nothing left to try from mine. Can someone flush the hostname → origin map for test.runner-x.com?

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13

Service: 1383d914-93f1-498e-a306-43d7192cd956

Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74


zack-eth
PROOP

a month ago

Update on custom domain issue (continuation of Discord thread on test.runner-x.com).

Short version: app.runner-x.com on a brand-new production environment works end-to-end. test.runner-x.com on the original environment (which was renamed from "production" to "staging" after the fact) still returns 502 with x-railway-fallback: true.

Everything Railway-side is green:

  • certificateStatus: VALID
  • verified: true
  • dnsRecords[0].status: PROPAGATED (currentValue matches requiredValue)
  • targetPort: 3000
  • syncStatus: ACTIVE
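The "everything is green" claim above can be expressed as a small check over the domain object. This is only a sketch: the field names come from this post, but the dict layout is my assumption and may not match the real GraphQL response shape.

```python
EXPECTED = {
    "certificateStatus": "VALID",
    "verified": True,
    "syncStatus": "ACTIVE",
}


def failing_checks(domain: dict) -> list[str]:
    """List the fields that are not in the 'green' state described above."""
    bad = [key for key, want in EXPECTED.items() if domain.get(key) != want]
    for i, rec in enumerate(domain.get("dnsRecords", [])):
        if rec.get("status") != "PROPAGATED":
            bad.append(f"dnsRecords[{i}].status")
        if rec.get("currentValue") != rec.get("requiredValue"):
            bad.append(f"dnsRecords[{i}].currentValue")
    return bad


# Hypothetical payload built from the values reported in this thread.
domain = {
    "certificateStatus": "VALID",
    "verified": True,
    "syncStatus": "ACTIVE",
    "targetPort": 3000,
    "dnsRecords": [{
        "status": "PROPAGATED",
        "currentValue": "tyxzlf5y.up.railway.app",
        "requiredValue": "tyxzlf5y.up.railway.app",
    }],
}
print(failing_checks(domain))  # prints []
```

An empty list alongside a 502 from the edge is exactly the contradiction being reported here: the control plane reads healthy while the custom hostname still falls through.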

Service itself is healthy: the staging Railway URL (runner-production-1f2f.up.railway.app) returns 200, so the runner service on port 3000 works. Only the custom hostname's edge routing falls through.

DNS side is clean: dig confirms the CNAME and TXT records from multiple resolvers. Cloudflare proxy is off (grey cloud). CF zone SSL is Full (strict). No CAA restrictions.

Things tried with no effect:

  • customDomainUpdate (touched targetPort to same value)
  • serviceInstanceRedeploy on runner
  • Full teardown of DNS + domain + 15 min wait + recreate (per earlier guidance in this thread)

Hypothesis: the edge binding for test.runner-x.com is stale after the environment rename, and nothing client-side forces a resync. The fact that a freshly-created app.runner-x.com on a never-renamed env works cleanly points at something specific to this domain's edge state.

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13

Service: 1383d914-93f1-498e-a306-43d7192cd956

Env: 2d6b0b21-2835-49c4-9e13-751e1275cd55 (renamed production -> staging)

Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)

Request ID: OWzKQvHpSAG08J1W0_TJvA (502 response just now)

Can someone force a resync of the hostname → origin map on your edge, or inspect what's wedged?


zack-eth
PROOP

a month ago

@pepper ^ 🙏


Set your Cloudflare SSL to Full (not strict).


Also, enable Universal SSL in Cloudflare if you haven't yet.


(steps 5 and 6)


See if that works.


zack-eth
PROOP

a month ago

@pepper Tried enabling proxy + Full (not strict) + Universal SSL. Same result: 502 with x-railway-fallback: true, served by Railway's edge (x-railway-cdn-edge: fastly/..., x-railway-edge: railway/us-west2).

Request ID: 0WjNQyxeQ72tfObPxtoGcA

So the 502 reproduces regardless of whether CF is in the path.

Isolating signal: app.runner-x.com, on a different environment but the same service (same Dockerfile/runner image, same CF zone, same setup steps), works end-to-end with cert valid and HTTP 200. Only test.runner-x.com is stuck. test.runner-x.com has never served a 200, including after a full teardown-and-recreate per earlier guidance in this thread. On your API, everything reads as healthy for this domain: cert VALID, verified=true, DNS record match, targetPort 3000, syncStatus ACTIVE. But the edge falls back.

Looks like the Fastly hostname→service map was never correctly set for test.runner-x.com specifically. Can someone inspect or flush the edge binding for this domain?

Project: 9b60ef7b-cf2f-4cfa-a59d-57add229ab13

Domain: 3aa53157-4913-4675-a2cb-b3973a3cce74 (test.runner-x.com)


a month ago

Have you checked the documentation?


zack-eth
PROOP

a month ago

Resolved: it turned out the fix was renaming the auto-generated service domain (runner-production-1f2f → runner-test). That flushed the edge binding, and both the renamed URL and test.runner-x.com started working directly afterward. May be worth adding to internal troubleshooting runbooks if other customers hit the same x-railway-fallback: true state with otherwise-healthy API responses.

Thanks for the help!


a month ago

There is no such thing as an edge DNS binding; these words don't mean anything in reality.


a month ago

You had your target port set incorrectly.


Status changed to Solved brody 29 days ago

