5 months ago
Our Flask application container starts successfully, passes healthcheck, but terminates after 3-6 seconds without errors.
Timeline:
- Dec 21, 2025 (commit 7da1c39e): Deployment worked perfectly, received webhooks
- Dec 26+: Same code fails - container stops after healthcheck
Current behavior:
Starting Container
Flask app starts → Healthcheck passes (200 OK) → Container stops 3-6 sec later
Logs:
[2026-01-01 14:36:26] >>> HEALTHCHECK called
[2026-01-01 14:36:26] "GET /health HTTP/1.1" 200 -
[2026-01-01 14:36:26] Flask running on 0.0.0.0:5000
[2026-01-01 14:36:29] Stopping Container
Configuration:
- Runtime: Python 3.12
- start.py: Flask app on port 5000 (threading.Thread)
- Procfile: web: python start.py
- Healthcheck Path: (tried both /health and empty)
- Restart Policy: On Failure
- No errors in logs
What we tried:
- Reverted to working commit (7da1c39e) - still fails
- Changed ports (5000 → 8080 → 5000)
- Switched Gunicorn ↔ Flask dev server
- Disabled/enabled healthcheck
- Changed restart policies
Question: Did Railway change container lifecycle behavior between Dec 21-26? Exact same code/config that worked on Dec 21 now terminates.
11 Replies
5 months ago
## UPDATE: Problem is Railway-specific, works on Render
We've deployed the exact same code (same commit, same config) to Render.com and it works perfectly - container stays alive, webhooks respond.
Render logs:
```
==> Your service is live
Available at https://tourbot-94ko.onrender.com
Detected service running on port 10000
```
Railway logs (same code):
```
Starting Container
Flask starts → Healthcheck 200 OK → 3 sec later → Stopping Container
```
This confirms the issue is Railway platform-specific, not our code.
Question for Railway team:
- Did something change in container lifecycle between Dec 21-26?
- Why does Railway terminate healthy containers after successful healthcheck?
- Should we migrate to Render or is there a fix?
We prefer Railway, but need stable deployments.
5 months ago
the issue is likely that you're running flask in a `threading.Thread` but your main thread exits immediately after starting it. when the main thread finishes, the container has no foreground process and railway terminates it. this works on render because different platforms handle orphaned threads differently.
check your start.py: if you're doing something like `thread = threading.Thread(target=app.run); thread.start()` without blocking the main thread afterward, that's the problem. the fix is either to run flask directly in the main thread (`app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 5000)))`) or to add `thread.join()` after starting it to keep main alive. if you need background workers alongside flask, run flask in main and put your workers in daemon threads.
for production you should also switch from the flask dev server to gunicorn: `gunicorn --bind 0.0.0.0:$PORT app:app`. the dev server isn't meant for production and has quirks with threading. this would also explain why it randomly broke, since railway may have tightened their process detection recently.
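A minimal sketch of the "server in main, workers in daemon threads" layout described above. It uses the standard library's `http.server` as a stand-in for Flask so it runs without dependencies; in the real start.py, `app.run(...)` (or Gunicorn) would take the place of `serve_forever()`. The names `HealthHandler`, `background_worker`, `build_server` and `main` are hypothetical. One caveat: in CPython the process only exits once the main thread and all non-daemon threads have finished, so the "main thread exits" failure mode applies when the server thread is a daemon thread.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in for the Flask app: answers GET /health with 200."""

    def do_GET(self):
        ok = self.path == "/health"
        body = b"ok" if ok else b"not found"
        self.send_response(200 if ok else 404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def background_worker(stop_event):
    """Hypothetical background job (e.g. a bot loop); idles until told to stop."""
    while not stop_event.is_set():
        stop_event.wait(1.0)


def build_server(port=None):
    """Bind on all interfaces; PORT comes from the platform, 5000 is a local fallback."""
    if port is None:
        port = int(os.environ.get("PORT", 5000))
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


def main():
    stop_event = threading.Event()
    # The worker is a daemon thread: it never keeps the process alive on its own.
    threading.Thread(target=background_worker, args=(stop_event,), daemon=True).start()
    server = build_server()
    try:
        # Blocking call in the MAIN thread: the process stays in the foreground.
        server.serve_forever()
    finally:
        stop_event.set()
        server.server_close()
```

Calling `main()` at the bottom of start.py keeps the process alive for as long as the server runs; if the server were started in a daemon thread instead, the script would need an explicit `thread.join()` to avoid exiting right away.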
5 months ago
Thanks for the insight! We followed your advice:
Changes made:
1. Switched Flask from daemon thread to main thread in start.py
2. Replaced Flask dev server with Gunicorn: gunicorn --bind 0.0.0.0:$PORT --workers 2 --timeout 120 tribute.webhook:app
3. Verified main thread doesn't exit (using app.run() directly)
Result:
Container still terminates after successful healthcheck:
```
[2026-01-04 10:34:20] [INFO] Starting gunicorn 21.2.0
[2026-01-04 10:34:20] [INFO] Listening at: http://0.0.0.0:8080
[2026-01-04 10:34:20] [INFO] Booting worker with pid: 2
>>> HEALTHCHECK called (200 OK)
Stopping Container ← 3 seconds later
```
Comparison:
- Same code on Render.com = works perfectly, stays alive
- Same code on Railway = terminates after healthcheck
Timeline:
- Dec 21: This exact code worked on Railway
- Dec 26+: Same deployments now fail
No error logs, no exceptions - just clean termination after successful healthcheck.
Any other ideas what Railway might have changed in container lifecycle? Or is this a known platform issue?
5 months ago
few more things to check since the threading fix didn't work:
1. go to your service Settings and look for "Cron Schedule" field. if there's any value there, your service is being treated as a cron job (runs then terminates). clear it completely. also check restart policy is set to "On Failure" or "Always", not "Never"
2. check the metrics tab for memory usage right before termination. if it spikes to your plan limit, railway kills the container silently (no error logs). gunicorn with 2 workers can be memory heavy on startup
3. one weird thing: the "Stopping Container" message after healthcheck sometimes refers to railway's internal healthcheck probe container shutting down, not your actual app. check if your public url actually works for a few seconds before it dies, or if it never responds at all
4. try removing the healthcheck path entirely in settings (leave it blank) and redeploy. if the container stays alive without healthcheck configured, then railway's healthcheck probe is somehow killing your service
im pretty unsure though - can you share more logs?
5 months ago
There is no scenario where a health check would kill the container, nor does the Stopping Container message have anything to do with the health check.
We appreciate the help you are providing but please make sure to fact check your messages before sending.
5 months ago
Either way, OP is running a development server; they are not using Gunicorn, despite saying they switched to it.
5 months ago
@brody We DID use Gunicorn. Here are the logs from commit 4bcbfd08 (Jan 4, 10:34 UTC) with Gunicorn on Railway staging:
```
[2026-01-04 10:34:20] [INFO] Starting gunicorn 21.2.0
[2026-01-04 10:34:20] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2026-01-04 10:34:20] [INFO] Using worker: sync
[2026-01-04 10:34:20] [INFO] Booting worker with pid: 2
[2026-01-04 10:34:20] [INFO] Booting worker with pid: 3
[... Flask app initialization ...]
>>> HEALTHCHECK called at 2026-01-04 10:34:20.603609
Stopping Container ← 3 seconds after startup
```
Procfile at that commit:
```
web: gunicorn --bind 0.0.0.0:$PORT --workers 2 --timeout 120 tribute.webhook:app
```
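For reference, the same Procfile flags could live in a Gunicorn config file instead. This is only a sketch: the file name `gunicorn.conf.py` and the 8080 fallback port are assumptions, while `bind`, `workers`, `timeout`, `accesslog` and `errorlog` are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file mirroring the Procfile flags above)
import os

# The platform injects PORT at runtime; 8080 is an assumed fallback for local runs.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# Same values as --workers 2 --timeout 120 in the Procfile.
workers = 2
timeout = 120

# "-" sends access logs to stdout and error logs to stderr, so they show up in the deploy logs.
accesslog = "-"
errorlog = "-"
```

With that file in place the Procfile line would shrink to `web: gunicorn -c gunicorn.conf.py tribute.webhook:app`.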
Same exact code on Render.com:
- Gunicorn starts
- Healthcheck passes
- Container stays alive
- Service works perfectly
The python start.py logs you saw were from our LATEST attempt AFTER Gunicorn failed multiple times.
Is there a Railway platform change between Dec 21-26 that could cause this?
5 months ago
UPDATE 2: Isolated services, same issue
Following suggestions to separate concerns, I've split the application into two independent services in the staging project:
- tourbot - Pure Python bot process (`python manager.py`)
  - No Flask/HTTP server
  - Removed healthcheck (bots don't need it)
- tribute-webhook - Standalone Flask webhook (`gunicorn tribute.webhook:app`)
  - Healthcheck endpoint: `/health` - Returns 200 OK successfully
Result: Both services terminate 3-5 seconds after starting with "Stopping Container" in logs.
5 months ago
Dear Diary,
Update on our debugging saga. Turns out we've been chasing ghosts.
What happened:
We saw "Stopping Container" in logs and assumed our services were broken. Spent days trying different configs, isolating services, minimal setups - you name it.
The actual situation:
- "Stopping Container" shows up in logs
- Status page: Online
- Healthcheck: responds fine
- Services: processing requests normally
The message looks like something failed, but the service keeps running.
What really confused us:
Our staging project kept showing "Stopping Container" right after deploy, so we thought containers were dying. Tried everything to fix it. Meanwhile production project with identical code worked fine.
Turns out staging containers were probably working too - we just saw the restart message and panicked.
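One way to tell a scary log line apart from a real outage is to poll the health endpoint for a while after the deploy and look at what actually comes back. A rough sketch using only the standard library; `poll_health` and `stayed_up` are hypothetical helper names, and the URL in the usage note is a placeholder, not a real service address.

```python
import time
import urllib.error
import urllib.request


def poll_health(url, attempts=10, interval=2.0, timeout=5.0):
    """Request `url` repeatedly; return one entry per attempt:
    the HTTP status code, or None if the service was unreachable."""
    results = []
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results.append(resp.status)
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status (4xx/5xx).
            results.append(exc.code)
        except (urllib.error.URLError, OSError):
            # Connection refused, DNS failure, timeout, etc.
            results.append(None)
        if i < attempts - 1:
            time.sleep(interval)
    return results


def stayed_up(results):
    """True only if every attempt returned 200."""
    return bool(results) and all(code == 200 for code in results)
```

For example, `stayed_up(poll_health("https://<your-service-domain>/health", attempts=15))` returning `True` well past the 3-6 second mark would suggest the "Stopping Container" line belongs to something other than the live deployment.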
5 months ago
Were you looking at the logs of the old deployments? We stop old deployments when you create a new deployment.
5 months ago
I was looking at the general Logs tab (time-based view, not filtered by specific deployment). You can also filter by deployment there, including ones with "removed" status.