a month ago
Hi team —
We're attaching custom domains to a service via the API:
- updateServiceTool adds the custom domain (e.g. app.my-cc.io) to the service's serviceDomains with port 3000.
- getServiceConfigTool confirms the domain persists in the service config immediately after the update.
- deployServiceTool completes with status SUCCESS.
But HTTP requests to the custom domain still return
x-railway-fallback: true with HTTP 502 ("Application failed to
respond"), indicating the edge proxy has no routing rule for that
hostname.
Meanwhile, the TLS certificate for the custom domain IS issued and
valid (CN=app.my-cc.io, valid May 18 – Aug 16 2026), proving the
domain registry recognizes the hostname. The bug is isolated to the
inbound HTTP routing layer.
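For anyone scripting the same check, here is a minimal Python sketch of how the two cases can be told apart. It assumes, as this thread does, that x-railway-fallback: true means the edge answered instead of the container; the function name and the labels it returns are made up for illustration.

```python
from typing import Mapping


def classify_response(status: int, headers: Mapping[str, str]) -> str:
    """Label a response as edge fallback, healthy app, or app-level error."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v.strip().lower() for k, v in headers.items()}
    # Assumption from the thread: this header marks a reply generated by the
    # edge proxy rather than by the application container.
    if lowered.get("x-railway-fallback") == "true":
        return "edge-fallback"
    if 200 <= status < 400:
        return "app-ok"
    return "app-error"
```

Feeding it the status code and headers from a request to the custom domain and to the direct .up.railway.app URL gives a side-by-side view of which layer is answering.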
Reproduced consistently across 5+ deployment cycles and 3+
re-attachment attempts today on:
- service 986d281c-5fc7-45a7-9ee8-cbff20ab5d34 (dashboard) → app.my-cc.io
- service 2caa3252-4618-4f00-af4b-b74a7e9d9957 (gov-server) → api.my-cc.io (note: api.my-cc.io eventually self-healed, possibly due to different deploy timing; only app.my-cc.io is currently broken)
Direct service URL dashboard-production-b9d8.up.railway.app
returns HTTP 200 cleanly, so the container is fine.
Please push the staged custom-domain attachment to your edge proxy,
or tell me the correct API verb to do it myself.
Project ID: d248b5d7-865b-4cc8-be42-d7023292be53
Environment: production (10df2f22-6bb9-4add-a9ae-4b2e8b8c7584)
Recent stuck deploy: 2084ce12-5bcf-429a-912b-193f83de0418
Thanks.
7 Replies
a month ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • about 1 month ago
a month ago
The core issue here is that Railway's edge proxy routing table isn't being updated when you attach the custom domain via API — the TLS cert being issued proves the domain is registered at the certificate layer, but the HTTP routing rule (which maps the hostname to your upstream service) is a separate propagation step that's clearly not completing. This is a known gap in Railway's API flow: updateServiceTool + deployServiceTool updates the service config and triggers a container redeploy, but it doesn't always force the edge proxy to reload its routing rules for newly attached custom domains, especially when the domain attachment and deploy happen in close succession. The fix Railway should apply on their end is flushing/reloading the edge routing table for your project (ID: d248b5d7-865b-4cc8-be42-d7023292be53), but what you can try yourself right now is: detach the custom domain app.my-cc.io from service 986d281c entirely via the API, wait ~60 seconds, re-attach it, and then trigger a fresh deploy — don't attach and deploy in the same call or back-to-back without that pause, since api.my-cc.io self-healing on different deploy timing strongly suggests a race condition between domain registration and proxy rule propagation. If that doesn't resolve it, ask Railway support specifically to "force edge routing table sync for custom domain app.my-cc.io on environment 10df2f22" — that's the exact internal operation needed, and citing the stuck deploy hash 2084ce12 gives them the anchor to find it in their infra logs.
cheers
a month ago
Thanks for the detailed diagnosis — that race-condition theory matched what we were seeing exactly.
We ran the exact sequence you described:
- Detached app.my-cc.io from service 986d281c entirely (confirmed via getServiceConfigTool — gone from serviceDomains)
- Paused ~75 seconds
- Re-attached fresh on port 3000 (confirmed back in serviceDomains with {"app.my-cc.io":{"port":3000}})
- Paused another ~3 minutes — deliberately NOT back-to-back with the attach
- Triggered a fresh deploy: d7e16afb-4d5d-4f5c-bd9b-9a4ec8ed3da2 — completed SUCCESS at 16:29 UTC
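In case it helps anyone repeat the steps above, they can be sketched in Python roughly like this. The detach, attach, and deploy arguments are hypothetical callables standing in for whatever wraps updateServiceTool and deployServiceTool; no real Railway API signature is assumed. The default pauses are the approximate ones used in this run.

```python
import time
from typing import Callable


def reattach_with_pauses(
    detach: Callable[[], None],
    attach: Callable[[], None],
    deploy: Callable[[], str],
    pause_after_detach: float = 75.0,   # ~75 seconds, as in the steps above
    pause_after_attach: float = 180.0,  # ~3 minutes, as in the steps above
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run detach, wait, attach, wait, deploy in order; return the deploy ID."""
    detach()                   # remove the domain from the service config
    sleep(pause_after_detach)  # keep the detach and attach from running back-to-back
    attach()                   # re-add the domain on the desired port
    sleep(pause_after_attach)  # keep the attach and deploy from running back-to-back
    return deploy()            # hypothetical: returns the new deployment ID
```

Passing sleep in as a parameter keeps the sequence testable without actually waiting several minutes.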
Result: cert is still valid (CN=app.my-cc.io, May 18 – Aug 16), direct URL dashboard-production-b9d8.up.railway.app returns HTTP 200 cleanly, but https://app.my-cc.io/ STILL returns HTTP 502 with x-railway-fallback: true and "Application failed to respond".
So the propagation gap isn't a simple back-to-back race — even with a deliberate ~4-minute total gap between attach and deploy, the edge routing rule never lands. This appears to be deeper than a timing issue.
I've opened a Railway support ticket asking specifically for "force edge routing table sync for custom domain app.my-cc.io on environment 10df2f22, stuck deploy hash 2084ce12" per your suggestion. Will report back when they reply.
If you have any other workarounds we haven't tried — e.g., creating an entirely new service and migrating the domain there, or rotating the project's edge proxy assignment via the dashboard — I'd love to hear them. Otherwise we're stuck waiting on internal Railway action.
Cheers, and thanks again for the thoughtful response.
a month ago
The cleanest path forward is to create a brand-new Railway service in the same project and environment, deploy your exact same container/config to it, and attach app.my-cc.io to that fresh service — because the corruption is almost certainly a stale routing record tied specifically to service 986d281c in Railway's edge proxy DB, and no amount of detach/re-attach cycles will fix a poisoned row on the same service ID. Before attaching the real domain, first attach a throwaway subdomain like test.my-cc.io, do a deploy, confirm it returns HTTP 200 through the edge (not just the direct .up.railway.app URL), then detach it and immediately attach app.my-cc.io on port 3000 and deploy once more — this two-step proves the proxy registration pipeline is clean on the new service before you commit the production domain to it. In parallel, update your Railway support ticket to explicitly say: "Please delete the edge routing table entry for app.my-cc.io mapped to service 986d281c in environment 10df2f22 — not just re-sync it, but hard-delete and let it recreate — because the record appears corrupt and survives detach/re-attach cycles on our end" — that framing forces them past the standard "redeploy and wait" response and gets an infra engineer to touch the routing DB directly, which is the only thing that will permanently fix the old service ID if you ever need to reuse it.
cheers..
a month ago
Your web app is a Next.js project. By default, Next.js listens on the port given by the PORT variable if it is present, and Railway injects that variable into your container automatically (usually set to 8080). Your issue is most likely related to a misconfigured port. You can verify the port your app is listening on by looking at the deploy logs, and then map that port to your custom domain URL.
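As a rough illustration of that check, here is a small Python sketch that compares the port each domain is mapped to against the port the app reports in its logs. The serviceDomains shape follows the {"app.my-cc.io":{"port":3000}} form quoted earlier in the thread; the function name is hypothetical.

```python
from typing import Dict, List


def mismatched_domains(
    service_domains: Dict[str, Dict[str, int]],
    listening_port: int,
) -> List[str]:
    """Return hostnames whose mapped port differs from the app's listening port."""
    return sorted(
        host
        for host, cfg in service_domains.items()
        if cfg.get("port") != listening_port
    )
```

An empty list means every attached domain points at the port the app is actually listening on.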
Additionally, if you want your app to listen on port 3000, and you have already mapped your custom domain URL to port 3000, just set PORT=3000 in your service variables, and Next.js will pick it up automatically.
The "Application failed to respond" error is almost always caused by mapping your URL to a port your app isn't listening on.
a month ago
Still have the same issue even after assigning app.my-cc.io. It works fine with test.my-cc.io, just not with app.my-cc.io.
a month ago
Quick update with confirming data on the hostname-scoped-corruption
theory:
Per the suggested workaround, I spun up a fresh Railway service in
the same project / environment, deployed identical config, and used
test.my-cc.io as a smoke check before binding the real hostname.
Result on that single new service:
- https://test.my-cc.io → HTTP 200 through the edge (no x-railway-fallback header)
- https://app.my-cc.io → still HTTP 502 with x-railway-fallback: true
Both hostnames attached on port 3000 (verified via
getServiceConfigTool), both TLS certs valid, container returns
HTTP 200 on its dashboard-production-*.up.railway.app URL. The only
variable that produces a 502 is the hostname string itself.
That rules out:
- Container / PORT misconfiguration (test.my-cc.io proves the listener is healthy on port 3000)
- Service-ID-scoped corruption on the original 986d281c-5fc7-45a7-9ee8-cbff20ab5d34 (the brand-new service repros app.my-cc.io 502 identically)
- Attach-deploy timing races (multiple cycles, including 3+ minute gaps and a full service migration, have not converged)
Remaining hypothesis: the edge proxy routing entry keyed on the
hostname app.my-cc.io is poisoned at the routing-table layer,
independent of service ID. Detach + re-attach binds the new service
into the user-facing service config, but the stuck routing rule for
that exact hostname survives the rebind.
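The elimination logic above can be written down as a short Python sketch. It is only a way of organising the observations, not a diagnosis of Railway internals; the tuple layout, service labels, and function name are all invented for illustration.

```python
from typing import Iterable, List, Tuple

# (service label, hostname, did the request reach the app?)
Observation = Tuple[str, str, bool]


def hostname_scoped_failures(observations: Iterable[Observation]) -> List[str]:
    """Return hostnames that never reached the app and failed on at least one
    service where some other hostname did reach it."""
    obs = list(observations)
    # Services proven healthy by at least one hostname that reached the app.
    healthy_services = {svc for svc, _, ok in obs if ok}
    # Hostnames that reached the app somewhere are not hostname-scoped failures.
    succeeded = {host for _, host, ok in obs if ok}
    failing = {
        host for svc, host, ok in obs
        if not ok and svc in healthy_services
    }
    return sorted(failing - succeeded)
```

With the results reported here (test.my-cc.io reaching the app on the new service, app.my-cc.io failing on both the old and new service), the only hostname returned is app.my-cc.io.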
What I need from Railway infra:
Hard-delete (NOT re-sync) the edge-proxy routing entry for
hostname app.my-cc.io in project
d248b5d7-865b-4cc8-be42-d7023292be53, environment
10df2f22-6bb9-4add-a9ae-4b2e8b8c7584. Let it recreate cleanly
from the current service binding on the next deploy.
Reference IDs:
- Old (broken) service: 986d281c-5fc7-45a7-9ee8-cbff20ab5d34
- New service (same symptom): [insert new service ID]
- Last stuck deploy (old svc): 2084ce12-5bcf-429a-912b-193f83de0418
- Latest test deploy (new svc): [insert latest deploy ID]
This is blocking the production customer signup flow —
app.my-cc.io is our customer dashboard hostname, and the direct
.up.railway.app URL is not the published address customers reach.
Happy to share any further repro data the infra team needs: curl
headers, deploy logs, container-side request logs.
a month ago
How do I get a message to the Railway tech team to fix this issue?