11 days ago
Hi Railway team,
I can't select the correct project from the dropdown, so I've chosen N/A. I know this is incorrect, but that's the only selection available to me.
Four custom domains across two of my services have been stuck on "Certificate Authority is validating challenges" since 2026-05-11 17:01. All four were re-issued at roughly the same time and none have moved off that state. They all sit under the same registered domain wwp-group.com).
Affected hostnames (project 9902e623-ac35-4c19-867f-a868be7dd43b/wwp-henry backend):
| Hostname | Service | Railway CNAME target |
|---|---|---|
| henry-api-staging.wwp-group.com | backend (staging env) | 9ed73vzc.up.railway.app |
| henry-mcp-staging.wwp-group.com | backend (staging env) | abrrk5hr.up.railway.app |
| henry-api.wwp-group.com | backend (production env) | l87rfzkd.up.railway.app |
| henry-mcp.wwp-group.com | backend (production env) | j3rvv0so.up.railway.app |
What I verified externally before opening this ticket (all clean):
1. DNS — every hostname resolves correctly to its CNAME target and on to Railway edge IPs 66.33.22.x). The Railway-required TXT records are in place with the same value as the CNAME target.
2. CAA records — none set on wwp-group.com, so Let's Encrypt is not blocked.
3. DNSSEC — not enabled (no DS records at the parent), so there is no DNSSEC misconfiguration.
4. No Cloudflare proxy / no upstream CDN — nameservers are ns-{b,c,d}.galea.at. Curl returns Server: railway-edge directly with no cf-ray header.
5. Not rate-limited by Let's Encrypt — crt.sh shows only 2 certs ever issued for the registered domain, both from January 2026, and zero issuance activity in the last 30 days. The 50-certs-per-week and 5-duplicates-per-week limits do not apply.
6. HTTP-01 reachability — http://<hostname>/.well-known/acme-challenge/probe reaches Railway edge (HTTP 200/404 from the app, no network-level rejection).
So none of the user-side causes listed in docs.railway.com/networking/troubleshooting/ssl apply.
What I've tried:
- First issuance attempt failed (no detailed error surfaced in the dashboard beyond the generic failure state).
- Re-issued once. Since then all four have been stuck on "Certificate Authority is validating challenges" with no further progress.
- In the meantime, all four have failed.
- I have not deleted-and-re-added the domains (per Railway's own guidance to avoid burning the LE rate limit).
What I'm asking:
Could you check the LE issuance pipeline for these four hostnames on your side? The simultaneous stickiness across four independent hostnames under different Railway services suggests something internal to Railway's challenge worker rather than anything DNS- or origin-side. If a manual re-queue or pipeline reset is needed, please trigger that — I'd rather not keep clicking re-issue and risk hitting LE rate limits.
Happy to provide additional info (project/service IDs, exact retry timestamps) if useful.
Thanks!
Peter
3 Replies
11 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 11 days ago
11 days ago
Following up. With DNS staleness now ruled out, here is what Railway's own dashboard reports for the staging hostnames (pulled from the network response on the domain panel):
Project IDs:
- Project: 9902e623-ac35-4c19-867f-a868be7dd43b
- Service: 0884d850-cb80-45de-b3f6-bf4afa584a8c
- Edge: edge-500b32d4e9aaa4d3ff91d6ebb54285cd
- Staging environment: c0074b63-809d-4e50-8e08-f627ba15f79c
*henry-mcp-staging.wwp-group.com** (custom domain id 3376184f-bded-45fa-8093-f623e1e5af18):
```
certificateStatus: CERTIFICATE_STATUS_TYPE_ISSUE_FAILED
certificateStatusDetailed: CERTIFICATE_STATUS_TYPE_DETAILED_FAILED
certificateErrorType: CERTIFICATE_ERROR_TYPE_INTERNAL
certificateErrorMessage: "An internal error occurred. Please retry or contact support."
certificateRetryable: true
dnsRecords[0].status: DNS_RECORD_STATUS_PROPAGATED
verified: true
```
*henry-api-staging.wwp-group.com** (custom domain id eb7525b4-f992-4fdd-91a0-b82dee02a90e):
```
certificateStatus: CERTIFICATE_STATUS_TYPE_ISSUING
certificateStatusDetailed: CERTIFICATE_STATUS_TYPE_DETAILED_POLLING_AUTHORIZATIONS
certificateErrorMessage: null
dnsRecords[0].status: DNS_RECORD_STATUS_PROPAGATED
verified: true
```
The POLLING_AUTHORIZATIONS state on henry-api-staging has not advanced over multiple polls; effectively also stuck.
Public DNS sanity check (re-run after your previous message): all four hostnames return the Railway CNAME + 66.33.22.x from 1.1.1.1, 1.0.0.1, 8.8.8.8, 9.9.9.9, 9.9.9.10, 149.112.112.112, 149.112.112.10, and the OpenDNS pair. Authoritative answer from ns-{b,c,d}.galea.at shows only the CNAME (no stray A record). Resolution is converged across every vantage we can test.
The two production-environment hostnames henry-api.wwp-group.com, henry-mcp.wwp-group.com) on the same service are in equivalent stuck states.
So on Railway's own status fields: DNS propagated, ownership verified, no rate-limit indication, and the failure type your API surfaces is literally CERTIFICATE_ERROR_TYPE_INTERNAL with the message "Please retry or contact support". I've only clicked Try Again once on the api-staging hostname and have not touched anything since.
Could you advance / inspect the internal cert worker for these four domains from your side? I'd rather not keep clicking retry and risk LE rate-limit. Happy to hold.
Thanks!
11 days ago
Noticed incident 1QD5978Z had the same symptom — "Failed to issue TLS certificate" on custom domains, caused by an upstream LE issue, resolved by Railway-side retry. Could this be related, or is our project's pipeline stuck in a similar state that needs a manual nudge? CERTIFICATE_ERROR_TYPE_INTERNAL on all four hostnames after clean retries.
11 days ago
Noticed incident 1QD5978Z had the same symptom — "Failed to issue TLS certificate" on custom domains, caused by an upstream LE issue, resolved by Railway-side retry. Could this be related, or is our project's pipeline stuck in a similar state that needs a manual nudge? CERTIFICATE_ERROR_TYPE_INTERNAL on all four hostnames after clean retries.