Intermittent deployment health check failures and 502s despite healthy server
skorisepati1
PRO OP

16 days ago

We're experiencing intermittent deployment failures (~50% of deploys) on our Node.js Express service. The behavior is inconsistent across deploys of the same code:
Scenario 1 — Silent startup failure:

  • Build succeeds, container starts, node dist/index.js runs

  • Zero output — no logs, no errors, no crash trace

  • Health check gets "service unavailable" (connection refused) for the entire retry window

  • Deploy marked as failed

Scenario 2 — Successful start then immediate shutdown:

  • Build succeeds, container starts, server starts in ~2s

  • "Server started" log appears

  • Health check passes, deploy shows as succeeded

  • Shortly after, "Stopping Container" appears with no preceding error/SIGTERM/crash

  • Health endpoint starts returning 502 (Railway proxy has no backend)

  • Service is down until manual redeploy

Scenario 3 — Normal (works fine):

  • Same code, same config — deploys successfully, stays running indefinitely
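One way to narrow down Scenarios 1 and 2 is to make the process log as early and as loudly as possible. The following is only a sketch, not a confirmed fix: `installDiagnostics` and the `[boot]`/`[signal]`/`[fatal]`/`[exit]` log prefixes are hypothetical names. It assumes the function is called as the very first thing in `dist/index.js`, so that a missing boot line points to a failure before the application code runs, and a missing signal line before "Stopping Container" suggests the stop did not reach the process as a signal.

```javascript
// Hypothetical helper: call this before anything else in the entry file.
function installDiagnostics(proc = process, log = console.error) {
  // Emit one line immediately so "zero output" can be told apart from a late crash.
  log(`[boot] pid=${proc.pid} node=${proc.version} PORT=${proc.env.PORT ?? "(unset)"}`);

  // Record termination signals before exiting.
  for (const signal of ["SIGTERM", "SIGINT"]) {
    proc.on(signal, () => {
      log(`[signal] received ${signal}`);
      proc.exit(0);
    });
  }

  // Surface errors that would otherwise end the process with little context.
  proc.on("uncaughtException", (err) => {
    log("[fatal] uncaughtException", err);
    proc.exit(1);
  });
  proc.on("unhandledRejection", (reason) => {
    log("[fatal] unhandledRejection", reason);
    proc.exit(1);
  });

  // Always log the exit code as the last line.
  proc.on("exit", (code) => log(`[exit] code=${code}`));
}
```

Usage would be a single `installDiagnostics();` call at the top of the entry file, ahead of the other imports where possible.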

$10 Bounty

3 Replies

skorisepati1
PRO OP

16 days ago

In fact, in the most recent deploy the service started and was up and running in time, yet the health check still reported "service unavailable":

Attempt #7 failed with service unavailable. Continuing to retry for 25s

Sometimes even taking the entire service down and bringing it back up with the new deployment doesn't work either. We see the same issues.


darseen
HOBBY Top 5% Contributor

16 days ago

The Railway docs mention: "If your application does not permit requests from that hostname, you may encounter errors during the healthcheck process, such as 'failed with service unavailable' or 'failed with status 400'." In that case, you would need to add healthcheck.railway.app to your list of allowed hosts.
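If the app does its own Host header filtering, the allowlist check could look roughly like this. It is a sketch under assumptions: `ALLOWED_HOSTS` and `isAllowedHost` are hypothetical names, `my-service.example.com` is a placeholder for your own domain, and only healthcheck.railway.app comes from the docs quoted above.

```javascript
// Hypothetical allowlist; replace the placeholder domain with your own.
const ALLOWED_HOSTS = new Set([
  "healthcheck.railway.app", // hostname the healthcheck requests come from, per the docs
  "my-service.example.com",  // placeholder, not a real domain
]);

// Returns true when the Host header names an allowed hostname.
// The comparison is case-insensitive and ignores a trailing ":port".
function isAllowedHost(hostHeader, allowed = ALLOWED_HOSTS) {
  if (typeof hostHeader !== "string" || hostHeader.trim() === "") {
    return false;
  }
  const hostname = hostHeader.trim().toLowerCase().replace(/:\d+$/, "");
  return allowed.has(hostname);
}
```

In an Express app this check would typically sit in a middleware that reads `req.headers.host` and rejects the request when the function returns false.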

I'd also suggest checking whether you are using the PORT env variable provided by Railway in each service that is failing the healthcheck. The docs mention that if your application doesn't listen on the PORT variable, possibly due to using target ports, you can manually set a PORT variable to tell Railway which port to use for health checks.

You can read more about it here.
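For reference, a minimal sketch of a server that listens on the PORT variable is shown below. It uses Node's built-in `http` module instead of Express to stay dependency-free. The `/health` path, the fallback port 3000, the bind address `0.0.0.0`, and the helper names `resolvePort`, `createServer`, and `start` are all assumptions for illustration, not details taken from the original service.

```javascript
const http = require("node:http");

// Read PORT from the environment; fall back when it is unset or not a valid port.
function resolvePort(env, fallback = 3000) {
  const port = Number.parseInt(env.PORT ?? "", 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

// Build a server with a single health endpoint (the path is an assumption).
function createServer() {
  return http.createServer((req, res) => {
    if (req.url === "/health") {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ status: "ok" }));
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Bind to all interfaces (assumption) so requests from outside the container can connect.
function start(env = process.env) {
  const port = resolvePort(env);
  const server = createServer();
  server.listen(port, "0.0.0.0", () => {
    console.log(`Server started on 0.0.0.0:${port}`);
  });
  return server;
}
```

Calling `start()` at the end of the entry file starts the server. With Express, the equivalent would be passing the resolved port and host to `app.listen`.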


15 days ago

The PORT environment variable is not the issue here. We have already configured the port in the networking settings, yet the health check continues to fail.
