Container starts but produces no runtime output, /health unreachable
stan-fury
HOBBY · OP

7 days ago

Project: avengers initiative (09cd849a-c0b8-4fac-803e-49a42c06220f)

Environment: production

Service: avengers (95b8e389-ea41-4e52-9e67-90b0855fb681)

Latest deploy: 16a93ad2, status Completed

Public domain: avengers-production-b54b.up.railway.app

Symptom:

  • /health returns 502/000 to clients
  • HTTP Logs: all requests return 499 with ~11s duration (edge holds connection, client times out)
  • Deploy Logs: only "Starting Container", no further output
  • Network Flow Logs: empty (zero outbound connections from the container)
  • Deploy marked "Completed" but app does not serve traffic
  • Persists across multiple redeploys today

What I verified is fine:

  • Code: runs perfectly via "railway run" locally with production env (full boot, all services connect, ~112 MB RSS)
  • Env vars: all set correctly (verified by railway run)
  • Build: SUCCESS, image 246.5 MB
  • Target port: explicitly set to 8080 in Networking (was None before)
  • PORT env var: set to 8080
  • Builder: Dockerfile (/Dockerfile, ENTRYPOINT /usr/bin/tini --, CMD python main.py)
  • Start command from railway.toml: python -u scripts/apply_migrations.py && python -u main.py
  • Restart policy: ON_FAILURE, max retries 10
  • Same commit (22b77f89) ran successfully for 3 days (deploy bc56e94c, May 11 19:24 to May 14 14:55 EEST)
  • Locally reproducing the exact start command with production env + working DB: app boots fully, listens on 8080

Timeline of break:

  • 14:55 EEST today: /health returned 200 on deploy bc56e94c
  • 15:06 EEST: new deploy 5eed5da3 happened (same commit, trigger unknown to me)
  • 15:34 EEST onward: multiple redeploys, all show the same symptom

What is unusual:

  • "Completed" status with no app activity. With the ON_FAILURE restart policy, a crashing process would restart up to 10x. We see only one "Starting Container", so the process appears either to exit 0 cleanly with zero output or never to execute the start command.
  • Earlier deploys today (89910def, fc0e0ce7, 282f0240) showed the "[migrate] all up to date" line. The latest deploy 16a93ad2 does not even show that, just "Starting Container".
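
The exit status question above can also be approached from inside the container. A minimal sketch, assuming a small launcher script is used as the start command in place of the shell chain; the launcher and the name run_steps are hypothetical, while the two steps are the ones from the railway.toml start command quoted earlier:

```python
import subprocess
import sys
from typing import Sequence

# The two steps of the start command quoted in the post.
STEPS: list[list[str]] = [
    [sys.executable, "-u", "scripts/apply_migrations.py"],
    [sys.executable, "-u", "main.py"],
]


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each step in order, log its exit status, stop at the first failure."""
    for step in steps:
        print(f"[launcher] starting: {' '.join(step)}", flush=True)
        code = subprocess.run(list(step), check=False).returncode
        print(f"[launcher] exited with status {code}", flush=True)
        if code != 0:
            # Mirror the `&&` chain: later steps do not run after a failure.
            return code
    return 0


if __name__ == "__main__":
    sys.exit(run_steps(STEPS))
```

This is a diagnostic aid only. The launcher does not forward signals to the child process, so it is not a drop-in replacement for the normal start command.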

Please check internal logs for deploy 16a93ad2:

  1. Container process exit status
  2. Exact command tini executed
  3. Any cgroup OOM kills, resource throttling, or platform-side terminations not in user logs
  4. Why the migrate log line appears for some recent deploys but not the latest

App is functional (proven via railway run). The container deployment is failing silently and user-visible observability shows nothing actionable.

Thank you.

$10 Bounty

2 Replies


I'd try removing ENTRYPOINT in your Dockerfile. Also, I'd recommend picking one: either CMD in the Dockerfile or the start command in your CaC (config-as-code) file. Not 100% sure, but the two may be fighting each other.

Another note, if your service is truly online, it should say "Online" instead of "Completed."


dantor22
PRO · Top 10% Contributor

7 days ago

▎ The clue that jumps out most from your post: earlier deploys today showed migration logs and the latest one doesn't, with the same commit, same Dockerfile, and no code change between them. That points at Railway-side config drift (start command / env vars) more than the image itself.

▎ Exit 0 + "Completed" status + no log output + no outbound traffic is the signature of a start command that silently no-ops, which can happen a handful of ways. Worth checking in this order:

▎ 1. Custom Start Command in the dashboard. Settings → Deploy → Custom Start Command. Open an older successful deploy from the history and compare its config to the current one. Silent-fail patterns:

▎ - Field got cleared → tini -- runs with nothing → exits 0 immediately

▎ - Backgrounded process (python main.py &) → shell returns 0, container ends

▎ - Typo'd module / path → Python can exit before stdout is flushed

▎ 2. PYTHONUNBUFFERED=1 env var. If it's not set, Python buffers stdout, so a crash + exit before flush gives you exactly what you're seeing: "Starting Container" then nothing. Add it regardless: even if it's not the root cause, you stop flying blind on the next deploy.

▎ 3. Variables history. Same dashboard, Variables tab: diff env vars on the last good deploy vs current. Common pattern: a config validator does sys.exit(0) when an expected env var is missing and the app dies cleanly with no error.

▎ 4. Healthcheck path. 502/000 on /health is a consequence of nothing listening, not the cause. Worth a quick glance to confirm /health is what the app actually exposes, but it's not where the bug lives.
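
The buffering effect described in item 2 can be reproduced locally. A minimal sketch, assuming the process ends abruptly; that is simulated here with os._exit, which skips Python's normal flush at shutdown. The helper name run_child is hypothetical:

```python
import os
import subprocess
import sys

# Child program: prints one line, then exits abruptly without flushing.
# os._exit stands in for a hard termination; it is an assumption for the demo.
CHILD = "import os; print('booting'); os._exit(0)"


def run_child(unbuffered: bool) -> str:
    """Run CHILD with stdout attached to a pipe and return what arrived."""
    # Drop any inherited PYTHONUNBUFFERED so the buffered case is really buffered.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"  # same effect as `python -u`
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    print("buffered:  ", repr(run_child(unbuffered=False)))
    print("unbuffered:", repr(run_child(unbuffered=True)))
```

In the buffered case the line never leaves the child's buffer, so nothing reaches the pipe. Note that the railway.toml start command quoted in the original post already passes -u, which covers those two Python processes whenever that command is the one actually being run.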

▎ On the suggestion from @0x5b62656e5d about removing the tini ENTRYPOINT: fair instinct and worth keeping in the back pocket. Railway's Custom Start Command overrides Docker's CMD, but the tini ENTRYPOINT (/usr/bin/tini --) still wraps whatever runs, so tini should be exec-ing your start command transparently. I'd rule out the four items above first, since dropping tini costs you signal handling + zombie reaping. But if none of those land then yeah, trying without the ENTRYPOINT is a reasonable next step.

▎ A diagnostic that narrows it down in one deploy. Temporarily paste this as the Custom Start Command:

▎ sh -c "echo BOOT_OK && env | sort && exec python -u main.py"

▎ Three outcomes:

▎ - BOOT_OK + env dump + your normal logs → stdout buffering + a bad normal start command. Diff it against the working deploy.

▎ - BOOT_OK + env dump, then silent exit 0 → main.py is exiting early. Look at top-of-file config checks / sys.exit calls.

▎ - No BOOT_OK at all → cached / corrupt image layer, or the entrypoint angle the mod raised. Trigger a clean rebuild (small Dockerfile edit + redeploy without cache) and reassess.
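
For the second outcome, one pattern worth ruling out is a startup check that exits with status 0 when configuration is missing. A minimal sketch of a check that fails loudly instead; REQUIRED_VARS and its entries are hypothetical and not taken from the app in this thread:

```python
import os
import sys
from typing import Mapping

# Hypothetical variable names, for illustration only.
REQUIRED_VARS = ("DATABASE_URL", "PORT")


def missing_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def check_config(env: Mapping[str, str]) -> None:
    """Exit non-zero with a message on stderr if configuration is incomplete."""
    missing = missing_vars(env)
    if missing:
        # flush=True pushes the message out before the process exits, and the
        # non-zero status marks the run as a failure rather than a clean exit.
        print(f"[config] missing env vars: {', '.join(missing)}",
              file=sys.stderr, flush=True)
        sys.exit(1)


if __name__ == "__main__":
    check_config(os.environ)
    print("[config] ok", flush=True)
```

With a restart policy of ON_FAILURE, an exit status of 1 should show up as a failed, retried deploy instead of a silent "Completed".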

