23 days ago
Summary
All three custom domains on a newly created staging environment are stuck at CERTIFICATE_STATUS_TYPE_VALIDATING_OWNERSHIP and return HTTP 404 "Application not found" from the Railway edge, despite correct service-side configuration. The issue appears to be an edge ingress config sync failure for this specific environment.
- Affected Services: api, worker, web (plus guardrail-proxy, rspamd, Postgres-RTuS, Redis-OAIR)
Affected Custom Domains
- api-staging.replylayer.ai → service api (ID: 4b1e2ea0-b861-4f2e-9c05-1b40fd092886)
  - Custom domain ID: 5d1d2be1-349e-4277-9db0-e30db1a3aea5
  - Proxy target: 2t5mwcot.up.railway.app
- hooks-staging.replylayer.ai → service worker (ID: 987a0b5d-0b1c-4bf9-af76-f59f3850fdfb)
  - Custom domain ID: 5583cfa5-d631-49cd-befe-8805fbdcb92c
  - Proxy target: e30wajt1.up.railway.app
- app-staging.replylayer.ai → service web (ID: 39348200-0541-4adc-b0c8-1e1b2f2f6584)
  - Custom domain ID: 8351cdc6-919b-40b9-a510-3a997b846d89
  - Proxy target: hpmg6vkv.up.railway.app
What Works (Service-Side State is Correct)
- All three services show status=SUCCESS with active deployments (timestamps 2026-04-23T14:34:06Z)
- Native Railway domains respond with HTTP 200 and healthy payloads
- GraphQL customDomain query confirms each domain is correctly bound:
  - serviceId, environmentId, and targetPort all populated
  - deletedAt: null (not soft-deleted)
- Service networking config includes custom domains in serviceDomains:
  - api: {"api-staging-f5fa.up.railway.app": {}, "api-staging.replylayer.ai": {}}
  - worker: {"worker-staging-54e6.up.railway.app": {}, "hooks-staging.replylayer.ai": {}}
  - web: {"web-staging-3e6a.up.railway.app": {}, "app-staging.replylayer.ai": {}}
- Cloudflare CNAMEs are in place, unproxied (orange-cloud off), and fully propagated
  - DNS status: DNS_RECORD_STATUS_PROPAGATED per Railway API
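For anyone reproducing the CNAME check from outside Railway, here is a minimal sketch. The function name cname_matches is hypothetical, and it only compares strings: the second argument is assumed to be the output of a resolver query such as dig +short CNAME.

```shell
# Succeeds when the CNAME a resolver returned matches the expected proxy target.
# $1: proxy target Railway assigned, e.g. 2t5mwcot.up.railway.app
# $2: resolver output, e.g. from: dig +short CNAME api-staging.replylayer.ai
cname_matches() {
  expected=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # dig prints the target with a trailing dot; keep the first line and strip the dot.
  actual=$(printf '%s\n' "$2" | sed -n '1p' | sed 's/\.$//' | tr '[:upper:]' '[:lower:]')
  # An empty answer (no CNAME visible to the resolver) counts as a mismatch.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A possible invocation is cname_matches 2t5mwcot.up.railway.app "$(dig +short CNAME api-staging.replylayer.ai @1.1.1.1)". Querying a public resolver directly shows what the outside world sees, independent of the status the Railway API reports.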
What Doesn't Work (Edge-Side Failure)
- All three custom domains return HTTP 404 from server: railway-edge:
  {"status":"error","code":404,"message":"Application not found","request_id":"..."}
- The proxy targets also return 404:
  curl -sS https://2t5mwcot.up.railway.app/v1/health
  # → HTTP 404 Application not found
- Certificate validation is stuck at CERTIFICATE_STATUS_TYPE_VALIDATING_OWNERSHIP (45+ minutes, no progress)
  - certificateErrorMessage: null (no error signal from API)
  - certificateRetryable: null
  - This is expected: Let's Encrypt's HTTP-01 challenge at /.well-known/acme-challenge/* cannot complete because the edge returns 404 for the domain
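To tell this edge-level 404 apart from a 404 produced by anything else, a small classifier along these lines may help. It is only a sketch: the function name classify_response and the labels it prints are hypothetical, and it assumes the edge response always carries the server header and message text quoted above.

```shell
# Print "edge-404" when a response looks like the edge-level 404 described above,
# otherwise print "other-<status>".
# $1: HTTP status code, $2: value of the "server" header, $3: response body
classify_response() {
  case "$1:$2" in
    404:railway-edge)
      case "$3" in
        # Match on the message text reported in the edge's JSON error body.
        *'"message":"Application not found"'*) echo "edge-404"; return 0 ;;
      esac
      ;;
  esac
  echo "other-$1"
}
```

The three inputs can be captured in one request with curl's -o (body), -D (headers), and -w '%{http_code}' (status) options. Header values read from a file may end in a carriage return, which would need stripping before the comparison.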
Request IDs for Edge Tracing
Latest attempts (after service restarts):
- api-staging.replylayer.ai: NHjuGC42RyGuzK1Ic9o55Q
- hooks-staging.replylayer.ai: zAu4bF7CRpq1TYBlBhdwDg
Earlier attempts (before restarts):
- api-staging.replylayer.ai: 07No5ulSQsiPLyF7woOzXw
- hooks-staging.replylayer.ai: Pq1X8piWTISWhDgLaP71AA
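For collecting further request IDs, the request_id field can be pulled out of the 404 body with something like the sketch below. The function name extract_request_id is hypothetical, and the sed pattern assumes the flat, single-line JSON body shown in this post.

```shell
# Print the request_id field from a JSON error body, or nothing if it is absent.
# $1: response body, e.g. {"status":"error","code":404,...,"request_id":"..."}
extract_request_id() {
  # Capture the quoted value that follows "request_id": and print only that.
  printf '%s\n' "$1" | sed -n 's/.*"request_id":"\([^"]*\)".*/\1/p'
}
```

Where jq is installed, jq -r .request_id is a more robust way to read the same field.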
Troubleshooting Steps Already Taken
- Waited for DNS propagation: 30+ minutes after CNAME publish; status unchanged
- Verified Cloudflare config: CNAMEs are unproxied (orange-cloud off), so Cloudflare is not in the TLS path
- Tested HTTP-01 challenge path: curl http://api-staging.replylayer.ai/.well-known/acme-challenge/test returns 404 from railway-edge (expected, since the domain route doesn't exist on the edge)
- Recreated custom domain: Deleted and recreated api-staging.replylayer.ai, got a new proxy target (2t5mwcot.up.railway.app), updated CF CNAME. Same 404 and VALIDATING_OWNERSHIP stall.
- Set explicit targetPort: Updated api-staging custom domain to targetPort=3000 (service listens on 3000 via API_PORT env var). No change.
- Tested with Host header: Sent request with matching Host header against proxy target; still 404.
- Updated service networking config: Added custom domains to serviceDomains via Railway API. Deployed services.
- Restarted services: Triggered container restarts to force edge re-sync. Services came back up cleanly with fresh deployments. Edge still returns 404.
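The two edge probes in the steps above (the HTTP-01 path and the Host header test) can be written down for reuse roughly as follows. This is a sketch only: print_probe_commands is a hypothetical helper that prints the curl commands instead of running them, and the exact flags used in the original tests are not recorded in this post.

```shell
# Print the curl commands for the two edge probes without running them.
# $1: custom domain, $2: Railway proxy target for that domain,
# $3: path to request on the proxy target (defaults to /)
print_probe_commands() {
  # HTTP-01 path over plain HTTP, as a certificate authority would request it.
  printf 'curl -sS -i http://%s/.well-known/acme-challenge/test\n' "$1"
  # Proxy target addressed directly, with the custom domain sent as the Host header.
  printf 'curl -sS -i -H "Host: %s" https://%s%s\n' "$1" "$2" "${3:-/}"
}
```

Printing rather than executing keeps the commands reviewable before they are run; piping the output to sh would run them.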
Root Cause Analysis
The service-side state is correct and complete. The edge is returning 404, which means:
- The ingress config for this environment is either stale or missing the custom domain routes
- The config propagation path from service serviceDomains → edge ingress is not syncing for this environment
- This is not a DNS issue, TLS issue, or service availability issue
What We Need
A manual investigation and/or resync of the edge ingress config for environment 35ee243a-1691-4eee-b73f-178f3ba75d15. The request IDs above should pinpoint the exact edge nodes serving the 404s and help identify whether the config is stale or missing.
Additional Context
- This is a fresh environment created today; no prior custom domain history
- All other services in the environment (databases, proxies, etc.) are functioning normally
- The issue is isolated to custom domain routing; native Railway domains work perfectly
- No errors or warnings in service logs or deployment output
1 Replies
23 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 23 days ago
23 days ago
I could not find a CNAME or a TXT record attached to any of the 3 domains you have listed above.
Make sure you added the CNAME and TXT records as instructed when adding a custom domain to your service(s).