certified.one apex TLS broken — wildcard cert served, SAN mismatch blocking all apex traffic

aspiers

PRO OP

a day ago

Hi Railway team,

This is an urgent request regarding our most critical production infrastructure, which had been working perfectly for weeks and then suddenly stopped.

The apex custom domain certified.one on our production environment is serving the wrong certificate, causing SAN mismatch errors for all clients hitting https://certified.one/.

Symptoms:

openssl s_client -connect certified.one:443 -servername certified.one returns cert with SAN *.certified.one only (fingerprint 8d:44:7d:bb:63:a1:87:94:bd:b6:dd:e7:89:c8:a3:2f:a2:40:63:ad:6d:23:0e:87:8b:0f:f4:5c:bc:51:d4:78)

Browsers/curl reject with SSL: no alternative certificate subject name matches target hostname 'certified.one'
Wildcard and other subdomains (e.g. auth.certified.one) work fine
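The mismatch follows from how wildcard names are matched: the `*` stands in for exactly one left-most label, so `*.certified.one` covers `auth.certified.one` but not the apex itself. A minimal Python sketch of that rule (simplified; `san_matches` and `cert_covers` are hypothetical helper names, not part of any library):

```python
def san_matches(hostname: str, san: str) -> bool:
    """Return True if a single DNS SAN entry covers the hostname.

    Simplified rule: "*" is only honoured as the entire left-most label
    and stands in for exactly one label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    san_labels = san.lower().rstrip(".").split(".")
    # A wildcard never adds or removes labels, so the counts must agree.
    if len(host_labels) != len(san_labels):
        return False
    if san_labels[0] == "*":
        return host_labels[1:] == san_labels[1:]
    return host_labels == san_labels


def cert_covers(hostname: str, sans: list[str]) -> bool:
    """Return True if any SAN entry on the certificate covers the hostname."""
    return any(san_matches(hostname, san) for san in sans)


if __name__ == "__main__":
    sans = ["*.certified.one"]  # SAN list reported by openssl above
    print(cert_covers("certified.one", sans))       # False
    print(cert_covers("auth.certified.one", sans))  # True
```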

Railway state (via GraphQL):

Project: 17980f2b-0913-439f-a53e-472969130b6d (ePDS)
Environment: 2a602bb6-48d5-4a7f-8a1e-ffc184abf406 (production)
Service: 16bd3666-68ae-4537-95fb-ac516e60e9c8 (pds-core)
Apex custom domain id: bcf6a4bd-49a8-4950-93bc-145434641ed9 (certified.one)
Apex certs provisioned (RSA fingerprint e229d5abce4265665a3ec3f12a980eda31affa6946cbbc27fe989f092c8ae11b, ECDSA 8269b24c32bba58a7c6b4141d3c34a68e2b894dc8d8174b1a2044c20f161c058) — but neither is being served at the edge

DNS status for apex shows DNS_RECORD_STATUS_REQUIRES_UPDATE with currentValue: ""
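Since openssl prints fingerprints colon-separated while the values above are plain hex, comparing them by eye is error-prone. A small sketch that normalizes both forms before comparing, using the values quoted in this post. It assumes both sources report the same digest over the same certificate encoding (both are 64 hex characters, which is consistent with SHA-256), and `normalize_fingerprint` and `matching_cert` are hypothetical names:

```python
from typing import Dict, Optional


def normalize_fingerprint(fp: str) -> str:
    """Lower-case a hex fingerprint and drop colon or space separators."""
    return fp.replace(":", "").replace(" ", "").lower()


# Fingerprint of the certificate actually served at the edge (from openssl).
SERVED = (
    "8d:44:7d:bb:63:a1:87:94:bd:b6:dd:e7:89:c8:a3:2f:"
    "a2:40:63:ad:6d:23:0e:87:8b:0f:f4:5c:bc:51:d4:78"
)

# Fingerprints of the apex certificates reported as provisioned.
PROVISIONED = {
    "rsa": "e229d5abce4265665a3ec3f12a980eda31affa6946cbbc27fe989f092c8ae11b",
    "ecdsa": "8269b24c32bba58a7c6b4141d3c34a68e2b894dc8d8174b1a2044c20f161c058",
}


def matching_cert(served: str, provisioned: Dict[str, str]) -> Optional[str]:
    """Return the label of the provisioned cert being served, or None."""
    served_norm = normalize_fingerprint(served)
    for label, fp in provisioned.items():
        if normalize_fingerprint(fp) == served_norm:
            return label
    return None


if __name__ == "__main__":
    print(matching_cert(SERVED, PROVISIONED))  # None
```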

DNS side:

Apex certified.one is a CNAME → t0m9zgd1.up.railway.app configured at Cloudflare (DNS-only / grey cloud)
Cloudflare's CNAME flattening returns A 69.46.46.60 at apex (which matches the resolution of t0m9zgd1.up.railway.app)
Railway docs state CNAME flattening is supported for apex, so this should be a valid setup
We have made no DNS or Railway domain config changes recently; this previously worked
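The flattening claim above can be re-checked at any time by resolving both names and comparing the results. A sketch using only the standard library; it goes through the system resolver, so it shows what a client sees rather than what Railway's validator sees, and `resolve_ipv4` and `flattening_consistent` are hypothetical names:

```python
import socket
from typing import Callable, Set


def resolve_ipv4(name: str) -> Set[str]:
    """Resolve a hostname to its set of IPv4 addresses via the system resolver."""
    infos = socket.getaddrinfo(
        name, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def flattening_consistent(
    apex: str,
    target: str,
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> bool:
    """True if every address returned for the apex is also returned for the target."""
    apex_ips = resolver(apex)
    return bool(apex_ips) and apex_ips <= resolver(target)
```

Calling `flattening_consistent("certified.one", "t0m9zgd1.up.railway.app")` performs two live lookups. The `resolver` parameter exists only so the comparison can be exercised without network access.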

Suspected cause:

Railway's external DNS validator can't see the CNAME (Cloudflare flattens it to A), so the apex domain stays in REQUIRES_UPDATE state and the provisioned apex cert is never bound to the ingress. Requests to apex land on the wildcard ingress (which also resolves to 69.46.46.60) and get the wildcard cert.

We've found a similar report: https://station.railway.com/questions/stale-dns-cache-prevents-custom-domain-v-dfd694d4

Request:

Please could you manually mark the apex domain as validated / force-bind the existing apex cert to the edge? We'd strongly prefer not to remove/re-add the domain (production traffic).

Thanks,

Adam

$20 Bounty

8 Replies

Railway

BOT

a day ago

This thread has been opened as a public bounty so the community can help solve it. The thread and any further activity are now visible to everyone.

Status changed to Open Railway • about 24 hours ago

0x5b62656e5d

MODERATOR

a day ago

Try removing the domain from Railway and adding it back after ~10-15 mins. Update DNS records as necessary.

Also, Cloudflare currently has an incident related to certificates so it may or may not be causing this issue. (https://new.cloudflarestatus.com/incidents/j17t8xz91xs0)

aspiers

PRO OP

8 hours ago

We did that yesterday and it fixed it, but the problem has already come back again so this is clearly not sustainable. Especially given the limit of 5 new certs per week (if I remember correctly).

aspiers

PRO OP

8 hours ago

And this time it didn't work. Why would we have to wait ~10-15 mins before adding?

darseen

HOBBY Top 1% Contributor

5 hours ago

It's less about a specific amount of time, and more about TTL expiration and Let's Encrypt rate limits. Waiting a few minutes is better to avoid any caching issues.

aspiers

PRO OP

4 hours ago

This time it also affected an entirely different service not on the apex domain, so that disproves the previous theory about flattening of the apex CNAME to an A record being the cause. We've switched to Cloudflare proxying and that works around the broken certificate, but it's very concerning that these things are just randomly failing out of the blue after weeks or months without issue.

dev-charles254

PRO Top 5% Contributor

4 hours ago

The immediate fix is to temporarily disable Cloudflare proxying for your apex domain certified.one, wait 5–15 minutes for Railway to re‑validate the DNS, then re‑enable proxying. Avoid removing and re‑adding the domain repeatedly—that burns through Let's Encrypt rate limits. Once it’s working again, set up a monitor to watch for future validation failures and consider keeping a fallback A record pointed at Railway’s edge IP.

aspiers

PRO OP

2 hours ago

The problem only manifests when Cloudflare proxying is disabled. In that case certified.one incorrectly uses the certificate for *.certified.one which causes service outages due to TLS validation failures. (We have to have both of these as custom domains for the same service for it to function correctly, due to the nature of the workload.)

Unfortunately we can no longer rely on Railway TLS, given that:

this happened three times within 18 hours (including on non-apex domains)
re-adding the domain didn't even work the second time
Let's Encrypt rate limits will be reached very quickly at this rate
there is apparently no issue acknowledged from Railway's side

From the link shared above, it seems that even relying on Cloudflare is questionable. So we are looking into running our own TLS proxy where we have full control of the certificates. Pretty disappointing, and super strange considering we had months without any issues and didn't change anything recently.

BTW we already had instatus monitoring set up but it inexplicably failed to spot the TLS certificate issue - maybe their monitors cache certificates. I have raised a support request with them separately.
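For cross-checking an external monitor, one option is a probe that performs a new, fully verified TLS handshake on every run, so that no cached certificate or resumed session can hide a mismatch. A minimal sketch using only the Python standard library; `dns_sans` and `check_tls` are hypothetical names, and nothing here reflects how instatus itself works:

```python
import socket
import ssl
from typing import List, Tuple


def dns_sans(cert: dict) -> List[str]:
    """Extract DNS entries from the dict returned by SSLSocket.getpeercert()."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def check_tls(hostname: str, port: int = 443, timeout: float = 10.0) -> Tuple[bool, str]:
    """Open a new connection and report whether the certificate verifies."""
    # The default context verifies both the chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                sans = dns_sans(tls.getpeercert())
                return True, "ok, SANs: " + ", ".join(sans)
    except ssl.SSLCertVerificationError as exc:
        # Raised for hostname mismatches as well as chain or expiry problems.
        return False, f"certificate rejected: {exc.verify_message}"
    except OSError as exc:
        return False, f"connection failed: {exc}"
```

Running `check_tls("certified.one")` on a schedule and alerting whenever the first element is `False` would flag a hostname mismatch like the one described in this thread, because verification is left enabled rather than bypassed.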

dev-charles254

PRO Top 5% Contributor

an hour ago

If the problem only appears when Cloudflare proxying is disabled, then Railway’s edge is indeed serving the wrong certificate for the apex. In that case, running your own TLS proxy, where you have full control of the certificates, is the better option.
