Container terminates after successful healthcheck (worked on Dec 21, failing since Dec 26)

natalykouz

HOBBY · OP

7 months ago

Our Flask application container starts successfully, passes healthcheck, but terminates after 3-6 seconds without errors.

Timeline:

- Dec 21, 2025 (commit 7da1c39e): Deployment worked perfectly, received webhooks
- Dec 26+: Same code fails - container stops after healthcheck

Current behavior:

Starting Container
Flask app starts → Healthcheck passes (200 OK) → Container stops 3-6 sec later

Logs:

[2026-01-01 14:36:26] >>> HEALTHCHECK called
[2026-01-01 14:36:26] "GET /health HTTP/1.1" 200 -
[2026-01-01 14:36:26] Flask running on 0.0.0.0:5000
[2026-01-01 14:36:29] Stopping Container

Configuration:

- Runtime: Python 3.12
- start.py: Flask app on port 5000 (threading.Thread)
- Procfile: web: python start.py
- Healthcheck Path: (tried both /health and empty)
- Restart Policy: On Failure
- No errors in logs

What we tried:

1. Reverted to working commit (7da1c39e) - still fails
2. Changed ports (5000 → 8080 → 5000)
3. Switched Gunicorn ↔ Flask dev server
4. Disabled/enabled healthcheck
5. Changed restart policies

Question: Did Railway change container lifecycle behavior between Dec 21-26? Exact same code/config that worked on Dec 21 now terminates.

$10 Bounty

11 Replies

natalykouz

HOBBY · OP

7 months ago

## UPDATE: Problem is Railway-specific, works on Render

We've deployed the exact same code (same commit, same config) to Render.com and it works perfectly - container stays alive, webhooks respond.

Render logs:

```
==> Your service is live
Available at https://tourbot-94ko.onrender.com
Detected service running on port 10000
```

Railway logs (same code):

```
Starting Container
Flask starts → Healthcheck 200 OK → 3 sec later → Stopping Container
```

This confirms the issue is Railway platform-specific, not our code.

Question for Railway team:

- Did something change in container lifecycle between Dec 21-26?

- Why does Railway terminate healthy containers after successful healthcheck?

- Should we migrate to Render or is there a fix?

We prefer Railway, but need stable deployments.

yusufmo1

PRO

7 months ago

the issue is likely that you're running flask in a `threading.Thread` but your main thread exits immediately after starting it. when the main thread finishes, the container has no foreground process and railway terminates it. this works on render because different platforms handle orphaned threads differently.

check your start.py: if you're doing something like `thread = threading.Thread(target=app.run); thread.start()` without blocking the main thread afterward, that's the problem. the fix is either to run flask directly in the main thread with `app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 5000)))` or to add `thread.join()` after starting it to keep main alive. if you need background workers alongside flask, run flask in main and put your workers in daemon threads.

for production you should also switch from the flask dev server to gunicorn: `gunicorn --bind 0.0.0.0:$PORT app:app`. the dev server isn't meant for production and has quirks with threading. this would also explain why it randomly broke, since railway may have tightened their process detection recently.
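A note on the threading point above: in CPython, a thread started with `daemon=True` does not keep the process alive once the main thread returns, whereas a non-daemon thread does. The following is a minimal standard-library sketch of that difference. `serve` is a hypothetical stand-in for a blocking call such as `app.run()`, and the child script is an illustration, not the OP's actual `start.py`.

```python
import subprocess
import sys
import textwrap

# Child script: starts a "server" thread, then lets the main thread end.
CHILD = textwrap.dedent("""
    import sys, threading, time

    def serve():
        # Stand-in for a blocking server call such as app.run().
        time.sleep(0.5)
        print("server finished", flush=True)

    daemon = sys.argv[1] == "daemon"
    threading.Thread(target=serve, daemon=daemon).start()
    # The main thread falls off the end of the script here.
""")


def run_child(mode: str) -> str:
    """Run the child script in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout.strip()
```

Calling `run_child("daemon")` should return an empty string, because the interpreter exits before the daemon thread finishes, while `run_child("normal")` should return `server finished`. If the server has to live in a thread, making it non-daemon or calling `thread.join()` keeps the process in the foreground.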

natalykouz

HOBBY · OP

7 months ago

Thanks for the insight! We followed your advice:

Changes made:

1. Switched Flask from daemon thread to main thread in start.py

2. Replaced Flask dev server with Gunicorn: gunicorn --bind 0.0.0.0:$PORT --workers 2 --timeout 120 tribute.webhook:app

3. Verified main thread doesn't exit (using app.run() directly)

Result:

Container still terminates after successful healthcheck:

```
[2026-01-04 10:34:20] [INFO] Starting gunicorn 21.2.0
[2026-01-04 10:34:20] [INFO] Listening at: http://0.0.0.0:8080
[2026-01-04 10:34:20] [INFO] Booting worker with pid: 2
>>> HEALTHCHECK called (200 OK)
Stopping Container ← 3 seconds later
```

Comparison:

- Same code on Render.com = works perfectly, stays alive

- Same code on Railway = terminates after healthcheck

Timeline:

- Dec 21: This exact code worked on Railway

- Dec 26+: Same deployments now fail

No error logs, no exceptions - just clean termination after successful healthcheck.

Any other ideas what Railway might have changed in container lifecycle? Or is this a known platform issue?

yusufmo1

PRO

7 months ago

few more things to check since the threading fix didn't work:

1. go to your service Settings and look for "Cron Schedule" field. if there's any value there, your service is being treated as a cron job (runs then terminates). clear it completely. also check restart policy is set to "On Failure" or "Always", not "Never"

2. check the metrics tab for memory usage right before termination. if it spikes to your plan limit, railway kills the container silently (no error logs). gunicorn with 2 workers can be memory heavy on startup

3. one weird thing: the "Stopping Container" message after healthcheck sometimes refers to railway's internal healthcheck probe container shutting down, not your actual app. check if your public url actually works for a few seconds before it dies, or if it never responds at all

4. try removing the healthcheck path entirely in settings (leave it blank) and redeploy. if the container stays alive without healthcheck configured, then railway's healthcheck probe is somehow killing your service

i'm pretty unsure though - can you share more logs?

brody

EMPLOYEE

7 months ago

There is no scenario where a health check would kill the container, nor does the Stopping Container message have anything to do with the health check.

We appreciate the help you are providing but please make sure to fact check your messages before sending.

brody

EMPLOYEE

7 months ago

Either way, OP is running a development server, they are not using gunicorn as they have mentioned they switched to.

natalykouz

HOBBY · OP

7 months ago

@brody We DID use Gunicorn. Here are the logs from commit 4bcbfd08 (Jan 4, 10:34 UTC) with Gunicorn on Railway staging:

```
[2026-01-04 10:34:20] [INFO] Starting gunicorn 21.2.0
[2026-01-04 10:34:20] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2026-01-04 10:34:20] [INFO] Using worker: sync
[2026-01-04 10:34:20] [INFO] Booting worker with pid: 2
[2026-01-04 10:34:20] [INFO] Booting worker with pid: 3
[... Flask app initialization ...]
>>> HEALTHCHECK called at 2026-01-04 10:34:20.603609
Stopping Container ← 3 seconds after startup
```

Procfile at that commit:

```
web: gunicorn --bind 0.0.0.0:$PORT --workers 2 --timeout 120 tribute.webhook:app
```

Same exact code on Render.com:

- Gunicorn starts

- Healthcheck passes

- Container stays alive

- Service works perfectly

The python start.py logs you saw were from our LATEST attempt AFTER Gunicorn failed multiple times.

Is there a Railway platform change between Dec 21-26 that could cause this?

natalykouz

HOBBY · OP

7 months ago

## UPDATE 2: Isolated services, same issue

Following suggestions to separate concerns, I've split the application into two independent services in the staging project:

- tourbot - Pure Python bot process (python manager.py)
  - No Flask/HTTP server
  - Removed healthcheck (bots don't need it)
- tribute-webhook - Standalone Flask webhook (gunicorn tribute.webhook:app)
  - Healthcheck endpoint: /health
  - Returns 200 OK successfully

Result: Both services terminate 3-5 seconds after starting with "Stopping Container" in logs.
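For readers who want to reproduce a standalone webhook service with a `/health` endpoint, here is a minimal sketch. It is a plain WSGI callable built on the standard library rather than the OP's Flask app, so the names `app`, `listen_port`, and `main` are assumptions for illustration. Gunicorn can serve any WSGI callable through the same `module:app` form used in the Procfile above.

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Plain WSGI callable: 200 on /health, 404 on everything else."""
    if environ.get("PATH_INFO") == "/health":
        status, body = "200 OK", b"ok"
    else:
        status, body = "404 Not Found", b"not found"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def listen_port(default: int = 5000) -> int:
    """Port from the PORT environment variable, else a local default."""
    return int(os.environ.get("PORT", default))


def main() -> None:
    """Serve in the main thread; serve_forever() blocks until interrupted."""
    with make_server("0.0.0.0", listen_port(), app) as server:
        server.serve_forever()
```

Running `main()` locally serves the app in the foreground with the standard library's reference server, which is only meant for testing. Reading the port from `PORT` instead of hard-coding it matches the `$PORT` binding in the Gunicorn command.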

natalykouz

HOBBY · OP

7 months ago

Dear Diary,

Update on our debugging saga. Turns out we've been chasing ghosts.

What happened:

We saw "Stopping Container" in logs and assumed our services were broken. Spent days trying different configs, isolating services, minimal setups - you name it.

The actual situation:

- "Stopping Container" shows up in logs

- Status page: Online

- Healthcheck: responds fine

- Services: processing requests normally

The message looks like something failed, but the service keeps running.

What really confused us:

Our staging project kept showing "Stopping Container" right after deploy, so we thought containers were dying. Tried everything to fix it. Meanwhile production project with identical code worked fine.

Turns out staging containers were probably working too - we just saw the restart message and panicked.
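Since the log message turned out to be misleading, one way to confirm that a service is actually up is to poll its health endpoint directly. The sketch below does that with the standard library. The function name `probe` and its parameters are assumptions for illustration, and the URL you pass in would be your own service's public address.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, attempts: int = 5, interval: float = 1.0,
          timeout: float = 5.0) -> list:
    """Request `url` repeatedly; return one entry per attempt.

    Each entry is the HTTP status code, or None if no response arrived.
    """
    results = []
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results.append(resp.status)
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status (4xx/5xx).
            results.append(exc.code)
        except (urllib.error.URLError, OSError):
            # Connection refused, DNS failure, timeout, and similar.
            results.append(None)
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

For example, `probe("https://<your-service-domain>/health", attempts=10)` run right after a deploy would show whether the endpoint keeps answering after "Stopping Container" appears in the logs. The domain here is a placeholder.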

brody

EMPLOYEE

7 months ago

Were you looking at the logs of the old deployments? We stop old deployments when you create a new deployment.

natalykouz

HOBBY · OP

7 months ago

I was looking at the general Logs tab (time-based view, not filtered by specific deployment). You can also filter by deployment there, including ones with "removed" status.
