2 hours ago
Both my services on the same project have been failing every healthcheck since the Railpack migration (~May 21, 2026). Live prod has stayed up via Railway's safe-rollback (last passing deploys keep serving), but no new code can ship and traffic now returns 502 from outside despite the dashboard showing services as Active.
Project: perfect-analysis | Environment: production | Region: US West
Services: nibbleWeb (trynibble.app) and api (api.trynibble.app)
Builder: Railpack | Node: 22.x
Symptoms
API service – Railway's own auto-diagnosis:
"Contact Railway support, as the healthcheck proxy cannot reach the containers despite the app starting correctly on both replicas and listening on port 8080. Zero HTTP requests reached the app during the full 5-minute healthcheck window across 14 attempts. This same failure has repeated across 7 or more consecutive deployments, pointing to a persistent container networking issue."
Both services – every external probe returns:
HTTP 502, X-Railway-Fallback: true, body {"status":"error","code":502,"message":"Application failed to respond"}
…even when the dashboard shows the deployment as Active / Online and the container has started successfully.
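For anyone wanting to reproduce the external probe, here is a minimal Node 22 sketch. The helper names isEdgeFallback and probe are hypothetical, and the assumption that X-Railway-Fallback: true marks a response generated by the edge rather than by the app comes only from the behaviour described above.

```javascript
// Sketch: probe a public URL and report whether a 502 carries the edge fallback header.
// isEdgeFallback and probe are hypothetical helper names, not part of any Railway tooling.

// True when the response is a 502 that carries X-Railway-Fallback: true.
// `headers` is a WHATWG Headers object, so the lookup is case-insensitive.
function isEdgeFallback(status, headers) {
  return status === 502 && headers.get('x-railway-fallback') === 'true';
}

// Fetch the URL (global fetch is available in Node 18+) and summarise the result.
async function probe(url) {
  const res = await fetch(url, { redirect: 'manual' });
  const body = await res.text();
  return {
    status: res.status,
    edgeFallback: isEdgeFallback(res.status, res.headers),
    body: body.slice(0, 200), // keep the log line short
  };
}
```

Usage would be something like probe('https://api.trynibble.app/health').then(console.log), run from a machine outside Railway.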
What I've already verified / ruled out
- Both apps listen on the correct PORT (process.env.PORT, with explicit Number() coercion, fallback 3000)
- Both apps now bind explicitly to '0.0.0.0' (IPv4) — ruling out the Node 18+ '::' dual-stack theory that several similar threads suggested
- Healthcheck path is /health on both, and /health is a simple sync 200 for web; api /health does a single Supabase ping that returns in ~150ms locally
- I've tried Path A (Networking Target Port set to dynamic / no explicit PORT env var) and Path B (Networking Target Port = 3000 with explicit PORT=3000 env var) — same failure mode either way
- Railpack startCommand pinned in railpack.json for both services (npm start)
- Build phase succeeds in ~30s; Deploy phase succeeds in ~5s; Healthcheck consistently fails with "service unavailable" on every single probe attempt
- App startup logs confirm successful bind: "Nibble API server running on 0.0.0.0:8080"
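For reference, the bind and healthcheck setup described in the list above can be sketched roughly like this. It is not the actual Nibble code: resolvePort, createApp and start are hypothetical names, and the real api /health also pings Supabase, which is left out here.

```javascript
const http = require('node:http');

// Coerce PORT to a number; fall back to 3000 when it is unset or not a valid port.
function resolvePort(env, fallback = 3000) {
  const port = Number(env.PORT);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

// Build a server whose /health route is a plain synchronous 200.
function createApp() {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/health') {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok');
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Bind explicitly to the IPv4 wildcard address rather than relying on the default.
function start(env = process.env) {
  const port = resolvePort(env);
  const server = createApp();
  server.listen(port, '0.0.0.0', () => {
    console.log(`listening on 0.0.0.0:${port}`);
  });
  return server;
}
```

Calling start() from the entry point is all that is needed. Nothing listens until it is called, which keeps the helpers easy to test.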
What this looks like from the outside
It looks like Railway's edge proxy and healthcheck proxy lost their container mapping after the Railpack migration and never recovered. The containers exist, start cleanly, and log requests internally during smoke testing, but no external traffic ever reaches them. The 502 with X-Railway-Fallback: true suggests the edge layer is falling back before even attempting to forward, which would be consistent with it having no route entry for the running container IPs.
Related public threads with the same symptom
- https://station.railway.com/questions/intermittent-deployment-health-check-fai-02457844
- https://station.railway.com/questions/persistent-health-check-routing-failures-23c41579
- https://station.railway.com/questions/app-deploys-successfully-but-returns-502-1cedac14
- https://station.railway.com/questions/healthcheck-fails-despite-the-server-bei-102b3a63
What I'd appreciate help with
- Can someone on the Railway team manually re-register the edge → container mapping for my services? Per the diagnosis on api, this has the signature of a persistent infrastructure-side issue.
- Is there any client-side diagnostic I can run inside the container to confirm the healthcheck probe is or isn't reaching me? E.g., can I tcpdump or watch /var/log for inbound connection attempts during the healthcheck window?
- If this is the same root cause as the four related threads, would the team consider pinning a postmortem note to the docs? The symptom (healthcheck never reaches a healthy container) is now common enough that the auto-diagnosis explicitly says "contact support" rather than offering a code fix.
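On the diagnostic question, one in-process option that needs no tcpdump is to log at the socket level from Node itself. This is only a sketch: attachProbeLogger is a hypothetical helper, and it relies on the standard 'connection' and 'request' events that http.Server emits.

```javascript
// Hypothetical helper: log every inbound TCP connection and HTTP request on a server.
// Works with any http.Server; returns counters that can be inspected later.
function attachProbeLogger(server, log = console.log) {
  const stats = { connections: 0, requests: 0 };

  // Fires once per accepted TCP connection, before any HTTP parsing happens.
  server.on('connection', (socket) => {
    stats.connections += 1;
    log(`[conn] ${new Date().toISOString()} from ${socket.remoteAddress}:${socket.remotePort}`);
  });

  // Fires once per parsed HTTP request, in addition to the app's own handler.
  server.on('request', (req) => {
    stats.requests += 1;
    log(`[req] ${req.method} ${req.url} ua=${req.headers['user-agent'] ?? '-'}`);
  });

  return stats;
}
```

If no [conn] lines appear during the healthcheck window, nothing reached the listening socket at the TCP level, which would point away from the app's routing or handler code.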
Thank you to anyone reading this.
1 Reply
Status changed to Open Railway • about 2 hours ago
an hour ago
the 502 means railway's edge is routing requests to a port your container isn't listening on. couple of usual causes:
- app bound to 127.0.0.1 or localhost instead of 0.0.0.0: change it to 0.0.0.0 so the container's external interface accepts connections
- not reading PORT from env: railway picks a port and injects it as $PORT; if your code uses a hardcoded value that doesn't match, edge can't route
- if you're already on 0.0.0.0:$PORT and still 502: check Service Settings → Networking → Target Port. it has to match the port your app actually opens.
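railway ssh + netstat -tlnp shows what's actually bound from inside the container (railway shell only opens a local subshell with the service's variables, so it won't show the container's sockets).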
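if netstat isn't in the image, a rough Node equivalent is to try connecting to the port on each local address from inside the container. this is a sketch only: candidateHosts, canConnect and checkBindings are made-up names, and it only looks at IPv4 addresses, so an IPv6-only interface would not be covered.

```javascript
const net = require('node:net');
const os = require('node:os');

// List loopback plus every non-internal IPv4 address found in an
// os.networkInterfaces()-shaped object.
function candidateHosts(interfaces) {
  const hosts = ['127.0.0.1'];
  for (const addrs of Object.values(interfaces)) {
    for (const a of addrs ?? []) {
      if (a.family === 'IPv4' && !a.internal) hosts.push(a.address);
    }
  }
  return hosts;
}

// Resolve true if a TCP connection to host:port succeeds within timeoutMs.
function canConnect(host, port, timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const done = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => done(false));
    socket.once('connect', () => done(true));
    socket.once('error', () => done(false));
  });
}

// Try the port on every candidate address and report which ones accept connections.
async function checkBindings(port) {
  const results = {};
  for (const host of candidateHosts(os.networkInterfaces())) {
    results[host] = await canConnect(host, port);
  }
  return results;
}
```

run checkBindings(Number(process.env.PORT)).then(console.log) inside the container. if only 127.0.0.1 comes back true, the app is bound to loopback; if the non-loopback address is true as well, the bind side is fine.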