6 days ago
I've read quite a few threads now on the repeated healthcheck failures.
It's quite inconsistent between deploys of minor commits to my staging branch.
The deploy log reads INFO: 100.64.0.2:38111 - "GET /healthz HTTP/1.1" 200 OK
but the build log shows:
====================
Starting Healthcheck
====================
Path: /healthz
Retry window: 2m0s
Attempt #1 failed with service unavailable. Continuing to retry for 1m49s
Attempt #1 failed with service unavailable. Continuing to retry for 1m49s
Attempt #2 failed with service unavailable. Continuing to retry for 1m38s
Attempt #3 failed with service unavailable. Continuing to retry for 1m26s
Attempt #4 failed with service unavailable. Continuing to retry for 1m12s
Attempt #5 failed with service unavailable. Continuing to retry for 54s
1/2 replicas never became healthy!
Healthcheck failed!
I'd be grateful if someone could take a look.
6 days ago
Seeing GET /healthz 200 OK in logs while Railway still fails the deployment usually means one replica becomes healthy briefly, but another replica never stabilizes.
The important clue is:
1/2 replicas never became healthy
This is commonly caused by:
- slow startup timing
- startup race conditions
- one replica crashing/restarting
- /healthz depending on DB/Redis/external services
- app binding late or inconsistently
A few things worth checking:
- Ensure the app binds to 0.0.0.0:$PORT.
- Keep /healthz extremely lightweight and independent of DB/external services.
- If using FastAPI/Gunicorn/Uvicorn, avoid heavy startup hooks or blocking initialization.
- If migrations run during deploy/startup, replicas can interfere with each other intermittently.
- Check whether one replica is restarting silently after initial success.
The confusing part is that Railway healthchecks may hit the endpoint successfully once (showing 200 in logs), while orchestration still marks the deployment unhealthy if the replica exits, becomes unreachable, or fails readiness timing afterward.
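For illustration, here is a minimal sketch of a dependency-free /healthz bound to 0.0.0.0:$PORT. It uses only the Python standard library rather than FastAPI/Uvicorn so it runs as-is. The names HealthHandler and make_server, and the 8000 fallback port, are assumptions for this example and not anything Railway prescribes.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers /healthz without touching a DB, Redis, or any external service."""

    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def make_server(port=None):
    # Read the port from the PORT environment variable; 8000 is only an
    # assumed fallback for local runs.
    if port is None:
        port = int(os.environ.get("PORT", "8000"))
    # Bind to all interfaces (0.0.0.0), not just localhost.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

The point is that the handler does no I/O beyond writing the response, so a slow database or a running migration cannot make the check fail.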
5 days ago
If the app doesn't listen on the PORT variable, or the PORT variable is omitted when using target ports, the healthcheck can return a service unavailable error.
You can read more about it here: https://docs.railway.com/deployments/healthchecks#configure-the-healthcheck-port
5 days ago
The GET /healthz 200 OK log proves at least one replica became reachable, but 1/2 replicas never became healthy indicates the second replica is either crashing, hanging during startup, or failing readiness intermittently after initial success.
Most commonly this is caused by startup races, blocking initialization (DB/migrations/Redis), or the app not consistently binding to 0.0.0.0:$PORT. Keep /healthz dependency-free and inspect replica-specific logs for silent restarts.
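To see how long a fresh process takes to start answering, a rough local probe along these lines can help. This is only a sketch and not how Railway's healthcheck is implemented. The function name wait_until_healthy and the 5-second interval are assumptions; the 120-second default mirrors the "Retry window: 2m0s" in the build log.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_healthy(url, window_s=120.0, interval_s=5.0, timeout_s=5.0):
    """Poll url until it returns HTTP 200 or the window expires.

    Returns (healthy, attempts).
    """
    deadline = time.monotonic() + window_s
    attempts = 0
    while True:
        attempts += 1
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                if resp.status == 200:
                    return True, attempts
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            # Connection refused, timeout, or a non-2xx response:
            # treat all of them as "not healthy yet".
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False, attempts
        time.sleep(min(interval_s, remaining))
```

Starting the app locally and immediately calling wait_until_healthy("http://127.0.0.1:8000/healthz") (port assumed) gives a rough sense of whether startup fits inside the retry window.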