2 months ago
Every git push deploy to my marketing-dashboard service results in SIGTERM ~35 seconds after container start. Manual restart from the Railway dashboard always works instantly and the service runs stable indefinitely until the next git push deploy.
Repo: (main, auto-deploy)
Region: US East | Plan: Pro | Replicas: 2 | Builder: Nixpacks (Metal enabled)
What happens on every git push deploy:
1. Container starts, binds to PORT 8080 in <1 second
2. All env vars present, health endpoint responds
3. ~35 seconds later, Railway sends SIGTERM
4. Both replicas killed simultaneously (not rolling)
5. Manual restart from dashboard → works perfectly every time
Critical finding: I completely removed healthcheckPath and healthcheckTimeout from railway.toml and deployed. Same behavior — SIGTERM at ~35s. This confirms the healthcheck is not the cause.
Start command: node server.js (no npm/yarn wrapper)
Health endpoint: Simple Express route, no auth middleware, no external calls — returns JSON immediately. Registered before all authenticated routes.
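For reference, a route of the kind described above might look like the following minimal sketch. It uses Node's built-in http module instead of Express so that it runs without dependencies. The /health path, the handle and createServer names, and the response bodies are assumptions for illustration; none of them are taken from the actual server.js.

```javascript
const http = require('http');

// Hypothetical request handler. The health check is matched first, so it
// never reaches the auth check that guards every other route.
function handle(req) {
  if (req.method === 'GET' && req.url === '/health') {
    // No auth, no external calls: respond immediately.
    return { status: 200, body: { status: 'ok' } };
  }
  if (!req.headers || !req.headers.authorization) {
    return { status: 401, body: { error: 'unauthorized' } };
  }
  return { status: 404, body: { error: 'not found' } };
}

// Wraps the handler in an HTTP server; the caller decides when to listen.
function createServer() {
  return http.createServer((req, res) => {
    const { status, body } = handle(req);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}
```

Starting it with createServer().listen(Number(process.env.PORT) || 8080, '0.0.0.0') and logging from the listen callback would produce a startup line comparable to the "Running on port" message mentioned later in the thread.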
What I've ruled out:
- Healthcheck failing → removed entirely, same behavior
- Healthcheck timeout → increased to 300s, same behavior
- Restart policy → set to ALWAYS with 10 retries, same behavior
- Missing overlap → added 30s overlap + 15s draining, same behavior
- Single replica → scaled to 2, both killed simultaneously
- Builder issue → tried Metal, same behavior
- PORT mismatch → confirmed 8080 matches domain target
- npm intercepting SIGTERM → using node server.js directly
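One way to pin down exactly when the signal arrives relative to process start is to log it from inside the app. The sketch below is a minimal, hypothetical addition to server.js; the describeSignal helper and the log format are assumptions. Note that registering a SIGTERM listener in Node replaces the default exit behavior, so the handler has to exit explicitly.

```javascript
// Hypothetical helper: builds one log line for a received signal,
// including how long the process had been up when it arrived.
function describeSignal(signal, startedAtMs, nowMs) {
  const uptimeSec = ((nowMs - startedAtMs) / 1000).toFixed(1);
  return `received ${signal} after ${uptimeSec}s uptime (pid ${process.pid})`;
}

const startedAtMs = Date.now();

for (const signal of ['SIGTERM', 'SIGINT']) {
  process.on(signal, () => {
    console.log(describeSignal(signal, startedAtMs, Date.now()));
    // A real server would close its listeners first; this sketch exits at once.
    process.exit(0);
  });
}
```

If the deploy logs show this line at ~35s on the new container, the signal is reaching the new process; if the line only ever appears on the previous deployment's logs, that would match the "old container" explanation.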
Key questions:
1. Why does Railway send SIGTERM ~35s after start with NO healthcheck configured?
2. What differs between git push deploy and manual restart?
3. Why are both replicas killed simultaneously?
See attached file for full deploy logs, railway.toml, code snippets, and failed deploy commit hashes.
Similar thread: https://station.railway.com/questions/deployment-shows-success-but-container-s-b011f12e (same symptom, but their root cause doesn't apply: our app binds in <1s)
3 Replies
Status changed to Awaiting User Response Railway • 2 months ago
Status changed to Open Railway • 2 months ago
brody
The old container is getting SIGKILLed, not the new container.
2 months ago
Here is the support evidence.
Attachments
2 months ago
I understand the old container is the one getting SIGKILLed — but I don't think that fully explains what I'm seeing.
Here's the current behavior after multiple deploys today:
- Container starts, and within 7 seconds it's stopped, with no application logs at all (no "Running on port" message, no startup output). The app normally binds to PORT in under 1 second.
- This happens on fresh deploymentRestart calls too (via the Railway API), not just git push deploys. So there's no "old container" in that scenario; it's a clean restart.
- The code passes node --check cleanly (no syntax errors), and the health endpoint is trivial (no auth, no dependencies).
- I've now tested with and without healthcheck, with 1 and 2 replicas, with overlap/draining configured, and with different restart policies. All produce the same ~7s start-stop cycle.
The only thing that breaks the cycle is a manual restart from the Railway dashboard. After that, the exact same code runs fine for hours.
If the old container SIGKILL is expected behavior, what would cause the new container to never get a chance to run? Is there something in the deploy orchestration that could get stuck in a loop where it keeps killing containers before they can serve traffic?
Status changed to Open brody • 2 months ago