Healthcheck fails
physiotutors
PRO · OP

6 days ago

I've now read quite a few threads about repeated healthcheck failures.

The failures are inconsistent across deploys, even for minor commits to my staging branch.

The deploy log shows a successful healthcheck request:

INFO: 100.64.0.2:38111 - "GET /healthz HTTP/1.1" 200 OK

but the build log shows:

====================
Starting Healthcheck
====================
Path: /healthz
Retry window: 2m0s

Attempt #1 failed with service unavailable. Continuing to retry for 1m49s
Attempt #1 failed with service unavailable. Continuing to retry for 1m49s
Attempt #2 failed with service unavailable. Continuing to retry for 1m38s
Attempt #3 failed with service unavailable. Continuing to retry for 1m26s
Attempt #4 failed with service unavailable. Continuing to retry for 1m12s
Attempt #5 failed with service unavailable. Continuing to retry for 54s

1/2 replicas never became healthy!
Healthcheck failed!

I'd be grateful if someone could take a look.

$20 Bounty

3 Replies

Status changed to Open by Railway 6 days ago


balkar1998
FREE

6 days ago

Seeing GET /healthz 200 OK in the logs while Railway still fails the deployment usually means one replica became healthy, but the other replica never stabilized.

The important clue is:

1/2 replicas never became healthy

This is commonly caused by:

  • slow startup timing
  • startup race conditions
  • one replica crashing/restarting
  • /healthz depending on DB/Redis/external services
  • app binding late or inconsistently

A few things worth checking:

  1. Ensure the app binds to 0.0.0.0:$PORT.

  2. Keep /healthz extremely lightweight and independent of DB/external services.

  3. If using FastAPI/Gunicorn/Uvicorn, avoid heavy startup hooks or blocking initialization.

  4. If migrations run during deploy/startup, replicas can interfere with each other intermittently.

  5. Check whether one replica is restarting silently after initial success.
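Here is a minimal sketch of the binding, lightweight-healthcheck and non-blocking-startup points above. It uses only Python's standard library so it runs without FastAPI installed. The server binds to 0.0.0.0 on $PORT, falling back to 8080 locally (that default is an assumption). /healthz answers without touching any database. Slow initialization runs in a background thread so it cannot delay binding. make_server, slow_init and run are hypothetical names, not part of any framework.

```python
import os
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # No DB/Redis/external calls: answer as soon as the process is listening.
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        # Silence per-request access logs in this sketch.
        pass


def make_server(port=None):
    # Bind to all interfaces on the port Railway injects via $PORT
    # (8080 is an assumed local fallback).
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


def slow_init():
    # Hypothetical placeholder for heavy startup work (DB connections, cache warm-up).
    time.sleep(5)


def run():
    # Bind first, then do slow work in the background so /healthz can answer early.
    server = make_server()
    threading.Thread(target=slow_init, daemon=True).start()
    server.serve_forever()
```

With FastAPI, the same idea is a /healthz route that just returns, started with uvicorn main:app --host 0.0.0.0 --port $PORT (main:app is a placeholder for your module and app object). One trade-off: because /healthz answers before slow_init finishes, early requests to other routes may reach an app that is not fully initialized.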

The confusing part is that a 200 can show up in your logs (from a replica that did respond) while the deployment is still marked unhealthy, because the other replica exited, became unreachable, or never answered within the retry window.
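To reproduce this locally, you can use a small probe that polls /healthz until it gets a 200 or a retry window runs out. This shows whether a fresh process answers in time. This is a sketch, not Railway's implementation. The 120 s default matches the "Retry window: 2m0s" in the posted log. The growing gaps between attempts in that log suggest Railway backs off, whereas this probe uses a fixed interval for simplicity. The interval and per-request timeout values are assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url, window_s=120.0, interval_s=5.0, timeout_s=5.0):
    """Poll url until it returns HTTP 200 or the retry window runs out.

    Returns the number of attempts it took; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + window_s
    attempt = 0
    while True:
        attempt += 1
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                if resp.status == 200:
                    return attempt
                reason = f"HTTP {resp.status}"
        except urllib.error.HTTPError as exc:
            # Non-2xx responses (e.g. 503) arrive as HTTPError.
            reason = f"HTTP {exc.code}"
        except (urllib.error.URLError, OSError) as exc:
            # Connection refused, timeouts, resets: nothing answered on that port.
            reason = str(exc)
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"never became healthy after {attempt} attempts ({reason})")
        print(f"Attempt #{attempt} failed ({reason}); retrying for {remaining:.0f}s")
        time.sleep(min(interval_s, remaining))
```

For example, start the app with PORT=8080 and call wait_until_healthy("http://localhost:8080/healthz"). If it only passes on some runs, that points to the startup race or slow initialization described above rather than a routing problem.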


If your app doesn't listen on the PORT variable, or you omit PORT when using target ports, the healthcheck can fail with a service unavailable error.

You can read more about it here: https://docs.railway.com/deployments/healthchecks#configure-the-healthcheck-port


gyanavkhandelwal6396-cmyk
FREE

5 days ago

The GET /healthz 200 OK log proves at least one replica became reachable. "1/2 replicas never became healthy" indicates the second replica is crashing, hanging during startup, or failing readiness intermittently after the first one succeeded.

The most common causes are startup races, blocking initialization (DB/migrations/Redis), or the app not consistently binding to 0.0.0.0:$PORT. Keep /healthz dependency-free, and check each replica's logs for silent restarts.
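Silent restarts are easier to spot if every process announces itself when it starts. Two banners from the same replica within one deploy mean it restarted. In this minimal sketch, RAILWAY_REPLICA_ID is assumed to be provided by the platform; it falls back to "unknown" if absent, so check your service's variables. The flush=True matters because Python buffers stdout when it isn't attached to a terminal, so a process that crashes early can lose its last log lines. startup_banner and log_startup are hypothetical helper names.

```python
import os
import socket


def startup_banner():
    # RAILWAY_REPLICA_ID is an assumption about the environment; default to "unknown".
    replica = os.environ.get("RAILWAY_REPLICA_ID", "unknown")
    port = os.environ.get("PORT", "unset")
    return (
        f"startup pid={os.getpid()} host={socket.gethostname()} "
        f"replica={replica} port={port}"
    )


def log_startup():
    # Flush immediately so the line reaches the log even if the process dies soon after.
    print(startup_banner(), flush=True)
```

Call log_startup() once at the very top of your entry point, before any DB or migration work, so a replica that dies during initialization still leaves a trace.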

