Custom domain attachment not reaching edge proxy layer
glasgowlad
PROOP

a month ago

Hi team —

We're attaching custom domains to a service via the API:

  1. updateServiceTool adds the custom domain (e.g. app.my-cc.io) to the service's serviceDomains with port 3000.
  2. getServiceConfigTool confirms the domain persists in the service config immediately after the update.
  3. deployServiceTool completes with status SUCCESS.

But HTTP requests to the custom domain still return x-railway-fallback: true with HTTP 502 ("Application failed to respond"), indicating the edge proxy has no routing rule for that hostname.
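
For anyone reproducing this, here is a minimal Python sketch of the check described above. It fetches a URL and labels the response using the x-railway-fallback header and the status code. The function names and the three labels are illustrative assumptions, not part of any Railway tooling.

```python
import urllib.error
import urllib.request


def classify_response(status: int, headers: dict) -> str:
    """Label a response using the two signals described in this post."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    if lowered.get("x-railway-fallback", "").strip().lower() == "true":
        return "edge-fallback"
    if 200 <= status < 400:
        return "ok"
    # Error status without the fallback header (label is an assumption).
    return "other-error"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch url and classify it; HTTP error statuses are classified, not raised."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_response(response.status, dict(response.headers.items()))
    except urllib.error.HTTPError as error:
        return classify_response(error.code, dict(error.headers.items()))
```

Running probe() once against the custom domain and once against the direct .up.railway.app URL gives a side-by-side comparison of the two paths.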

Meanwhile, the TLS certificate for the custom domain IS issued and valid (CN=app.my-cc.io, valid May 18 – Aug 16 2026), proving the domain registry recognizes the hostname. The bug is isolated to the inbound HTTP routing layer.
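
The certificate check can be repeated independently of any HTTP request. The sketch below uses Python's standard ssl module to read the subject CN and validity window that the server presents; the helper names are assumptions made for illustration.

```python
import socket
import ssl


def summarize_cert(cert: dict) -> dict:
    """Pull the subject CN and validity window out of a getpeercert() dict."""
    common_name = None
    # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                common_name = value
    return {
        "common_name": common_name,
        "not_before": cert.get("notBefore"),
        "not_after": cert.get("notAfter"),
    }


def fetch_cert(hostname: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Complete a verified TLS handshake and summarise the server certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return summarize_cert(tls.getpeercert())
```

Because create_default_context() verifies the chain and the hostname, fetch_cert("app.my-cc.io") raises an ssl error rather than returning a summary if the certificate does not validate.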

Reproduced consistently across 5+ deployment cycles and 3+ re-attachment attempts today on:

  • service 986d281c-5fc7-45a7-9ee8-cbff20ab5d34 (dashboard) → app.my-cc.io
  • service 2caa3252-4618-4f00-af4b-b74a7e9d9957 (gov-server) → api.my-cc.io (note: api.my-cc.io eventually self-healed, possibly due to different deploy timing; only app.my-cc.io is currently broken)

Direct service URL dashboard-production-b9d8.up.railway.app returns HTTP 200 cleanly, so the container is fine.

Please push the staged custom-domain attachment to your edge proxy, or tell me the correct API verb to do it myself.

Project ID: d248b5d7-865b-4cc8-be42-d7023292be53
Environment: production (10df2f22-6bb9-4add-a9ae-4b2e8b8c7584)
Recent stuck deploy: 2084ce12-5bcf-429a-912b-193f83de0418

Thanks.

$20 Bounty

7 Replies

Railway
BOT

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway 28 days ago


gyanavkhandelwal6396-cmyk
FREE

a month ago

The core issue here is that Railway's edge proxy routing table isn't being updated when you attach the custom domain via the API. The issued TLS cert proves the domain is registered at the certificate layer, but the HTTP routing rule (which maps the hostname to your upstream service) is a separate propagation step that's clearly not completing. This is a known gap in Railway's API flow: updateServiceTool + deployServiceTool updates the service config and triggers a container redeploy, but it doesn't always force the edge proxy to reload its routing rules for newly attached custom domains, especially when the domain attachment and deploy happen in close succession.

The fix Railway should apply on their end is flushing/reloading the edge routing table for your project (ID: d248b5d7-865b-4cc8-be42-d7023292be53). What you can try yourself right now: detach the custom domain app.my-cc.io from service 986d281c entirely via the API, wait ~60 seconds, re-attach it, and then trigger a fresh deploy. Don't attach and deploy in the same call, or back-to-back without that pause, since api.my-cc.io self-healing on different deploy timing strongly suggests a race condition between domain registration and proxy rule propagation.

If that doesn't resolve it, ask Railway support specifically to "force edge routing table sync for custom domain app.my-cc.io on environment 10df2f22". That's the exact internal operation needed, and citing the stuck deploy hash 2084ce12 gives them the anchor to find it in their infra logs.

cheers


glasgowlad
PROOP

a month ago

Thanks for the detailed diagnosis — that race-condition theory matched what we were seeing exactly.

We ran the exact sequence you described:

  1. Detached app.my-cc.io from service 986d281c entirely (confirmed via getServiceConfigTool — gone from serviceDomains)
  2. Paused ~75 seconds
  3. Re-attached fresh on port 3000 (confirmed back in serviceDomains with {"app.my-cc.io":{"port":3000}})
  4. Paused another ~3 minutes — deliberately NOT back-to-back with the attach
  5. Triggered a fresh deploy: d7e16afb-4d5d-4f5c-bd9b-9a4ec8ed3da2 — completed SUCCESS at 16:29 UTC
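
The five steps above can be sketched in Python as follows. The thread does not show the signatures of getServiceConfigTool, updateServiceTool, or deployServiceTool, so the three callables here (get_domains, set_domains, deploy) are hypothetical stand-ins that you would wire to whatever client you use. Only the {"port": ...} shape of a serviceDomains entry is taken from the post.

```python
import time
from typing import Callable, Dict

Domains = Dict[str, dict]  # e.g. {"app.my-cc.io": {"port": 3000}}


def reattach_then_deploy(
    hostname: str,
    port: int,
    get_domains: Callable[[], Domains],        # hypothetical wrapper: read serviceDomains
    set_domains: Callable[[Domains], None],    # hypothetical wrapper: write serviceDomains
    deploy: Callable[[], str],                 # hypothetical wrapper: trigger deploy, return its ID
    pause_after_detach: float = 75.0,
    pause_after_attach: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Detach, pause, re-attach, pause, deploy; return the deploy identifier."""
    # Step 1: detach and confirm the hostname is gone from serviceDomains.
    domains = dict(get_domains())
    domains.pop(hostname, None)
    set_domains(domains)
    if hostname in get_domains():
        raise RuntimeError(f"{hostname} still attached after detach")

    # Step 2: pause before re-attaching.
    sleep(pause_after_detach)

    # Step 3: re-attach on the given port and confirm it persisted.
    domains = dict(get_domains())
    domains[hostname] = {"port": port}
    set_domains(domains)
    if get_domains().get(hostname) != {"port": port}:
        raise RuntimeError(f"{hostname} not attached on port {port}")

    # Step 4: pause again so the deploy is not back-to-back with the attach.
    sleep(pause_after_attach)

    # Step 5: trigger the deploy.
    return deploy()
```

Injecting sleep as a parameter keeps the pauses explicit and lets the sequence be dry-run against an in-memory config before touching a real service.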

Result: cert is still valid (CN=app.my-cc.io, May 18 – Aug 16), direct URL dashboard-production-b9d8.up.railway.app returns HTTP 200 cleanly, but https://app.my-cc.io/ STILL returns HTTP 502 with x-railway-fallback: true and "Application failed to respond".

So the propagation gap isn't a simple back-to-back race — even with a deliberate ~4-minute total gap between attach and deploy, the edge routing rule never lands. This appears to be deeper than a timing issue.

I've opened a Railway support ticket asking specifically for "force edge routing table sync for custom domain app.my-cc.io on environment 10df2f22, stuck deploy hash 2084ce12" per your suggestion. Will report back when they reply.

If you have any other workarounds we haven't tried — e.g., creating an entirely new service and migrating the domain there, or rotating the project's edge proxy assignment via the dashboard — I'd love to hear them. Otherwise we're stuck waiting on internal Railway action.

Cheers, and thanks again for the thoughtful response.


gyanavkhandelwal6396-cmyk
FREE

a month ago

The cleanest path forward is to create a brand-new Railway service in the same project and environment, deploy your exact same container/config to it, and attach app.my-cc.io to that fresh service. The corruption is almost certainly a stale routing record tied specifically to service 986d281c in Railway's edge proxy DB, and no amount of detach/re-attach cycles will fix a poisoned row on the same service ID.

Before attaching the real domain, first attach a throwaway subdomain like test.my-cc.io, do a deploy, and confirm it returns HTTP 200 through the edge (not just the direct .up.railway.app URL). Then detach it, immediately attach app.my-cc.io on port 3000, and deploy once more. This two-step proves the proxy registration pipeline is clean on the new service before you commit the production domain to it.

In parallel, update your Railway support ticket to explicitly say: "Please delete the edge routing table entry for app.my-cc.io mapped to service 986d281c in environment 10df2f22. Don't just re-sync it; hard-delete it and let it recreate, because the record appears corrupt and survives detach/re-attach cycles on our end." That framing forces them past the standard "redeploy and wait" response and gets an infra engineer to touch the routing DB directly, which is the only thing that will permanently fix the old service ID if you ever need to reuse it.

cheers..


Your web app is a Next.js project. By default, Next.js listens on the port given by the PORT variable if it is present, and Railway injects that variable into your container automatically (usually set to 8080). Your issue is most likely a misconfigured port. You can verify which port your app is listening on by looking at the deploy logs, and then map your custom domain to that port.
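
A minimal Python sketch of that comparison, assuming the serviceDomains shape quoted earlier in the thread ({"app.my-cc.io": {"port": 3000}}). The helper names are made up for illustration, and the fallback of 3000 when PORT is unset is an assumption about the app's default.

```python
from typing import Dict, Mapping


def listen_port(env: Mapping[str, str], default: int = 3000) -> int:
    """Port the app would bind to: PORT if set, otherwise the assumed default."""
    raw = env.get("PORT", "").strip()
    return int(raw) if raw else default


def mismatched_domains(service_domains: Mapping[str, dict], port: int) -> Dict[str, int]:
    """Return hostnames whose mapped port differs from the listening port."""
    return {
        host: config.get("port")
        for host, config in service_domains.items()
        if config.get("port") != port
    }
```

An empty result from mismatched_domains() means every attached hostname points at the port the app listens on, which would rule out this explanation.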

Additionally, if you want your app to listen on port 3000, and you have already mapped your custom domain to port 3000, just set PORT=3000 in your service variables, and Next.js will pick it up automatically.

The "Application failed to respond" error is almost always caused by mapping your URL to a port your app isn't listening on.


glasgowlad
PROOP

a month ago

Still have the same issue even after assigning app.my-cc.io. It works fine with test.my-cc.io, just not app.my-cc.io.


glasgowlad
PROOP

a month ago

Quick update with confirming data on the hostname-scoped-corruption theory:

Per the suggested workaround, I spun up a fresh Railway service in the same project / environment, deployed identical config, and used test.my-cc.io as a smoke check before binding the real hostname.

Result on that single new service:

Both hostnames attached on port 3000 (verified via getServiceConfigTool), both TLS certs valid, container returns HTTP 200 on its dashboard-production-*.up.railway.app URL. The only variable that produces a 502 is the hostname string itself.

That rules out:

  • Container / PORT misconfiguration (test.my-cc.io proves the listener is healthy on port 3000)
  • Service-ID-scoped corruption on the original 986d281c-5fc7-45a7-9ee8-cbff20ab5d34 (the brand-new service repros the app.my-cc.io 502 identically)
  • Attach-deploy timing races (multiple cycles, including 3+ minute gaps and a full service migration, have not converged)
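
The elimination above amounts to asking which hostnames fail on every service they were tried on. A small Python sketch of that reasoning follows; the observation format and function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (service label, hostname, request succeeded through the edge)
Observation = Tuple[str, str, bool]


def hostnames_failing_everywhere(observations: Iterable[Observation]) -> List[str]:
    """Hostnames tried on two or more services that succeeded on none of them."""
    by_host: Dict[str, Dict[str, bool]] = defaultdict(dict)
    for service, hostname, ok in observations:
        by_host[hostname][service] = ok
    return sorted(
        host
        for host, results in by_host.items()
        if len(results) >= 2 and not any(results.values())
    )
```

A hostname that appears in the result failed regardless of which service it was bound to, which is the pattern that points away from a service-scoped cause.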

Remaining hypothesis: the edge proxy routing entry keyed on the hostname app.my-cc.io is poisoned at the routing-table layer, independent of service ID. Detach + re-attach binds the new service into the user-facing service config, but the stuck routing rule for that exact hostname survives the rebind.

What I need from Railway infra:

Hard-delete (NOT re-sync) the edge-proxy routing entry for hostname app.my-cc.io in project d248b5d7-865b-4cc8-be42-d7023292be53, environment 10df2f22-6bb9-4add-a9ae-4b2e8b8c7584. Let it recreate cleanly from the current service binding on the next deploy.

Reference IDs:

  • Old (broken) service: 986d281c-5fc7-45a7-9ee8-cbff20ab5d34
  • New service (same symptom): [insert new service ID]
  • Last stuck deploy (old svc): 2084ce12-5bcf-429a-912b-193f83de0418
  • Latest test deploy (new svc): [insert latest deploy ID]

This is blocking the production customer signup flow: app.my-cc.io is our customer dashboard hostname, and the direct .up.railway.app URL is not the published address customers reach.

Happy to share any further repro data the infra team needs: curl headers, deploy logs, container-side request logs.


glasgowlad
PROOP

a month ago

How do I get a message to the Railway tech team to fix this issue?

