Healthcheck fails - Railway Central Station


physiotutors

PRO • OP

2 months ago

I've read quite a few threads now on the repeated healthcheck failures.

The failures are inconsistent: deploys of minor commits to my staging branch sometimes pass the healthcheck and sometimes fail it.

The deploy log reads:

INFO: 100.64.0.2:38111 - "GET /healthz HTTP/1.1" 200 OK

but the build log shows:

====================

Starting Healthcheck

====================

Path: /healthz

Retry window: 2m0s

Attempt #1 failed with service unavailable. Continuing to retry for 1m49s

Attempt #2 failed with service unavailable. Continuing to retry for 1m38s

Attempt #3 failed with service unavailable. Continuing to retry for 1m26s

Attempt #4 failed with service unavailable. Continuing to retry for 1m12s

Attempt #5 failed with service unavailable. Continuing to retry for 54s

1/2 replicas never became healthy!

Healthcheck failed!

I'd be grateful if someone could take a look.

$20 Bounty

3 Replies

Status changed to Open by Railway • about 2 months ago

balkar1998

FREE

2 months ago

Seeing GET /healthz 200 OK in logs while Railway still fails the deployment usually means one replica becomes healthy briefly, but another replica never stabilizes.

The important clue is:

1/2 replicas never became healthy

This is commonly caused by:

- slow startup timing
- startup race conditions
- one replica crashing/restarting
- /healthz depending on DB/Redis/external services
- the app binding to the port late or inconsistently

A few things worth checking:

- Ensure the app binds to 0.0.0.0:$PORT.
- Keep /healthz extremely lightweight and independent of DB/external services.
- If using FastAPI/Gunicorn/Uvicorn, avoid heavy startup hooks or blocking initialization.
- If migrations run during deploy/startup, both replicas may try to run them at the same time and interfere with each other intermittently.
- Check whether one replica is restarting silently after its initial success.
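A minimal sketch of the first two points, assuming a plain Python process. The standard library's http.server stands in for FastAPI/Uvicorn so the example has no dependencies, and the function and handler names are hypothetical. It binds to 0.0.0.0 on the port from the PORT variable and answers /healthz without touching a database or any external service. For a Uvicorn app, the equivalent is starting it with --host 0.0.0.0 --port $PORT.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def resolve_bind_address(default_port: int = 8000) -> tuple[str, int]:
    """Return (host, port): all interfaces, and the port from PORT.

    The fallback port is an assumption meant only for local runs; when
    deployed, PORT should be set so the healthcheck probes the right port.
    """
    return "0.0.0.0", int(os.environ.get("PORT", default_port))


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /healthz answers 200 as long as the process is up."""

    def do_GET(self) -> None:
        if self.path == "/healthz":
            # No DB/Redis/external calls here, so a slow dependency
            # cannot make the healthcheck fail or time out.
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def make_server(host: str, port: int) -> ThreadingHTTPServer:
    return ThreadingHTTPServer((host, port), HealthHandler)


def main() -> None:
    # Bind first and serve immediately; keep heavy setup out of this path.
    host, port = resolve_bind_address()
    make_server(host, port).serve_forever()
```

Calling main() from the entry point starts serving. The 8000 fallback only matters when PORT is unset, which should not be the case on a deployed service that relies on PORT for its healthcheck.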

The confusing part is that Railway healthchecks may hit the endpoint successfully once (showing 200 in logs), while orchestration still marks the deployment unhealthy if the replica exits, becomes unreachable, or fails readiness timing afterward.
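One way to check the "answers once, then dies" case is to poll /healthz repeatedly and only call an instance stable once it returns 200 several times in a row. This is a hypothetical helper meant for a locally started instance of the same build (a public domain with two replicas would not let you choose which replica answers). It is not a description of how Railway's healthcheck decides. The streak length, window and interval are assumptions, with the window set to resemble the 2-minute retry window in the build log.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if GET url answers 200; False on any error or other status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts are all OSError.
        return False


def stays_healthy(
    probe: Callable[[], bool],
    required: int = 5,
    window_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll probe until it succeeds `required` times in a row or the window ends.

    Any failure resets the streak, so an instance that answers once and then
    crashes or restarts is reported as unhealthy.
    """
    deadline = time.monotonic() + window_s
    streak = 0
    while time.monotonic() < deadline:
        if probe():
            streak += 1
            if streak >= required:
                return True
        else:
            streak = 0
        time.sleep(interval_s)
    return False
```

For example, stays_healthy(lambda: http_probe("http://127.0.0.1:8000/healthz")) returns False if failures keep resetting the streak for the whole window, even when some individual requests return 200.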

darseen

HOBBY • Top 1% Contributor

2 months ago

If the app doesn't listen on the port given by the PORT variable, or if you use a target port without setting a matching PORT variable, the healthcheck can fail with "service unavailable".

gyanavkhandelwal6396-cmyk

FREE

2 months ago

The GET /healthz 200 OK log line shows that at least one replica became reachable, but "1/2 replicas never became healthy" indicates that the second replica is crashing, hanging during startup, or failing readiness intermittently after its initial success.

The most common causes are startup races, blocking initialization (DB, migrations, Redis), or the app not consistently binding to 0.0.0.0:$PORT. Keep /healthz dependency-free and inspect each replica's logs for silent restarts.
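To make replica-specific logs easier to read, each process can print one identifying line when it starts. A sketch, assuming the platform exposes a RAILWAY_REPLICA_ID variable (the code falls back to the hostname if it doesn't); the function names are hypothetical. If a single deploy prints more of these lines than it has replicas, something probably restarted.

```python
import logging
import os
import socket
import time

log = logging.getLogger("startup")


def startup_banner(port: int) -> str:
    """Build a single log line that identifies this process.

    RAILWAY_REPLICA_ID is an assumption about what the platform injects;
    when it is missing or empty, the container hostname is used instead.
    """
    replica = os.environ.get("RAILWAY_REPLICA_ID") or socket.gethostname()
    started = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    return f"startup replica={replica} pid={os.getpid()} bind=0.0.0.0:{port} started_at={started}"


def log_startup(port: int) -> None:
    # Call once, just before the server starts listening. More of these lines
    # than replicas within one deploy suggests a process started more than once.
    logging.basicConfig(level=logging.INFO)
    log.info("%s", startup_banner(port))
```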
