Hey, my app keeps breaking and the workers return errors, or rather just stop
chatelo
HOBBY · OP

9 months ago

Here are the logs:

Starting Container
Starting application in production mode
2025-06-15 10:54:43,323 - mpesa_service - INFO - Environment variables: FLASK_ENV='production', APP_ENV='production', ENVIRONMENT='production'
2025-06-15 10:54:43,324 - mpesa_service - INFO - Using production callback URL: API_URL/api/payments/callback
2025-06-15 10:54:43,324 - mpesa_service - INFO - M-Pesa service initialized in production mode
[2025-06-15 10:54:43 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2025-06-15 10:54:43 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2025-06-15 10:54:43 +0000] [1] [INFO] Using worker: sync
[2025-06-15 10:54:43 +0000] [5] [INFO] Booting worker with pid: 5
[2025-06-15 10:54:43 +0000] [6] [INFO] Booting worker with pid: 6
[2025-06-15 10:54:43 +0000] [7] [INFO] Booting worker with pid: 7
100.64.0.2 - - [15/Jun/2025:10:57:32 +0000] "GET /api/health-check HTTP/1.1" 200 44 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
100.64.0.2 - - [15/Jun/2025:10:57:34 +0000] "GET /api/health-check HTTP/1.1" 200 44 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
100.64.0.2 - - [15/Jun/2025:10:57:37 +0000] "GET /api/health HTTP/1.1" 200 62 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
100.64.0.3 - - [15/Jun/2025:10:58:19 +0000] "POST /api/auth/reset HTTP/1.1" 400 135 "-" "okhttp/4.9.2"
100.64.0.3 - - [15/Jun/2025:10:58:43 +0000] "POST /api/auth/reset HTTP/1.1" 400 137 "-" "okhttp/4.9.2"
[2025-06-15 11:04:48 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:7)
[2025-06-15 11:04:48 +0000] [7] [INFO] Worker exiting (pid: 7)
[2025-06-15 11:04:48 +0000] [1] [ERROR] Worker (pid:7) exited with code 1
[2025-06-15 11:04:48 +0000] [1] [ERROR] Worker (pid:7) exited with code 1.
[2025-06-15 11:04:48 +0000] [15] [INFO] Booting worker with pid: 15
[2025-06-15 11:06:22 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:6)
[2025-06-15 11:06:22 +0000] [6] [INFO] Worker exiting (pid: 6)
[2025-06-15 11:06:22 +0000] [1] [ERROR] Worker (pid:6) exited with code 1
[2025-06-15 11:06:22 +0000] [1] [ERROR] Worker (pid:6) exited with code 1.
[2025-06-15 11:06:22 +0000] [17] [INFO] Booting worker with pid: 17
[2025-06-15 11:07:07 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:5)
[2025-06-15 11:07:07 +0000] [5] [INFO] Worker exiting (pid: 5)
[2025-06-15 11:07:08 +0000] [1] [ERROR] Worker (pid:5) exited with code 1
[2025-06-15 11:07:08 +0000] [1] [ERROR] Worker (pid:5) exited with code 1.
[2025-06-15 11:07:08 +0000] [19] [INFO] Booting worker with pid: 19

Procfile configurations, initial and current, both of which give this error:

initial:

  • web: gunicorn wsgi:app

current:

  • web: GUNICORN_CMD_ARGS="--timeout 300 --workers 3 --max-requests 1000 --max-requests-jitter 50 --preload --worker-tmp-dir /dev/shm --log-level info --access-logfile - --error-logfile - --graceful-timeout 120 --keep-alive 2" gunicorn wsgi:app
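
For reference, the same settings can also be passed as plain flags on the gunicorn command line, so the Procfile does not depend on how the platform's shell handles the inline GUNICORN_CMD_ARGS variable. This is just a sketch of the equivalent line, not a confirmed fix:

web: gunicorn wsgi:app --timeout 300 --workers 3 --max-requests 1000 --max-requests-jitter 50 --preload --worker-tmp-dir /dev/shm --log-level info --access-logfile - --error-logfile - --graceful-timeout 120 --keep-alive 2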

I have also shared a screenshot of the HTTP method error codes.

Attachments

Solved · $10 Bounty

6 Replies

chatelo
HOBBY · OP

9 months ago

Here is what I am currently seeing: /api/health => Application failed to respond

This error appears to be caused by the application.

If this is your project, check out your deploy logs to see what went wrong. Refer to our docs on Fixing Common Errors for help, or reach out over our Help Station.

If you are a visitor, please contact the application owner or try again later.

Request ID:
iOe2FTrNT9aZvNw3ss7a6g


chandrika
EMPLOYEE

9 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open chandrika 9 months ago


chatelo
HOBBY · OP

9 months ago

It has my API URL, which I treat as sensitive because it could be abused by bad actors and lead to my resources being misused. Making this public is wrong.

I have removed it, though.


sim
FREE

8 months ago

The logs show the issue: your gunicorn workers are timing out.


sim
FREE

8 months ago

What is your app doing? Is it very intensive or long-running? Can you check your graphs to see if there is heavy usage?


sim
FREE

8 months ago

Try using uvicorn for asynchronous tasks; they have an example with Mongo: https://docs.railway.com/tutorials/deploy-and-monitor-mongo#2-deploy-the-python-fastapi-app
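
For reference, the Procfile line for that kind of setup would look roughly like this, assuming a FastAPI app exposed as app in main.py (this is not taken from the linked tutorial, just a sketch):

web: uvicorn main:app --host 0.0.0.0 --port $PORT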


chatelo
HOBBY · OP

8 months ago

I identified and fixed several key issues that were causing the worker timeouts and application instability in my cloud-hosted web application:

1. Asynchronous Processing Improvements

  • Eliminated blocking operations from the request handling path

  • Implemented timeout-based locks with fallbacks (100ms maximum wait time)

  • Moved resource-intensive tasks to background threads

  • Added graceful degradation for non-critical components
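
Roughly, the timeout-based lock and background-thread pattern looks like this. It is a simplified sketch, not my exact code; the function names are placeholders:

import threading
import time

_lock = threading.Lock()

def expensive_task(payload):
    # Placeholder for the real resource-intensive work (e.g. an M-Pesa API call).
    time.sleep(5)

def with_timeout_lock(fn, timeout=0.1):
    # Try to take the lock, but give up after 100 ms instead of blocking the worker.
    if _lock.acquire(timeout=timeout):
        try:
            return fn()
        finally:
            _lock.release()
    return None  # graceful degradation: skip non-critical work rather than hang

def handle_request(payload):
    # Resource-intensive work goes to a background thread so the response returns quickly.
    threading.Thread(target=expensive_task, args=(payload,), daemon=True).start()
    return {"status": "accepted"}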

2. Database Connection Optimization

  • Adjusted connection pool parameters for optimal resource utilization

  • Implemented strategic connection timeout settings

  • Added connection health checks and pre-ping validation

  • Configured TCP keepalive parameters for better connection persistence
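
In case it helps, these are roughly the kinds of pool settings involved, shown here with SQLAlchemy and psycopg2/Postgres for illustration; adjust the values and driver options for your own stack:

import os
from sqlalchemy import create_engine

engine = create_engine(
    os.environ["DATABASE_URL"],
    pool_size=5,            # keep the pool sized to the instance
    max_overflow=5,
    pool_timeout=10,        # fail fast instead of having a worker wait forever for a connection
    pool_recycle=300,       # recycle connections before the platform drops idle ones
    pool_pre_ping=True,     # validate a connection before handing it out
    connect_args={          # TCP keepalives (psycopg2/Postgres-style options)
        "keepalives": 1,
        "keepalives_idle": 30,
        "keepalives_interval": 10,
        "keepalives_count": 3,
    },
)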

3. WSGI Server Configuration

  • Created a custom configuration with platform-specific optimizations (Railway)

  • Increased worker timeout threshold to 120 seconds (from default 30s)

  • Implemented resource-aware worker scaling

  • Added request-based worker recycling to prevent memory issues

  • Used shared memory for temporary files to improve performance
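
A trimmed sketch of what that gunicorn config looks like; the numbers are examples rather than a drop-in:

# gunicorn.conf.py -- gunicorn picks this file up automatically from the working directory,
# or pass it explicitly with: gunicorn -c gunicorn.conf.py wsgi:app
import multiprocessing
import os

workers = int(os.getenv("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))
timeout = 120               # up from the 30 s default
graceful_timeout = 120
keepalive = 2
max_requests = 1000         # recycle workers periodically to keep memory in check
max_requests_jitter = 50
preload_app = True
worker_tmp_dir = "/dev/shm" # worker heartbeat files in shared memory
loglevel = "info"
accesslog = "-"
errorlog = "-"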

4. Health Check Architecture

  • Implemented a lightweight health endpoint that bypasses middleware

  • Designed tiered health check endpoints for comprehensive monitoring

  • Excluded monitoring endpoints from intensive middleware processing
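
The health-check side, sketched with Flask (the paths match the ones in my logs; the middleware body is omitted):

from flask import Flask, jsonify, request

app = Flask(__name__)

MONITORING_PATHS = {"/api/health", "/api/health-check"}

@app.before_request
def skip_heavy_middleware():
    # Monitoring endpoints bypass the expensive per-request work entirely.
    if request.path in MONITORING_PATHS:
        return None
    # ... the usual auth/session/locking middleware would run here ...

@app.route("/api/health-check")
def health_check():
    # Lightweight probe: no DB, no locks, no external calls.
    return jsonify(status="ok"), 200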

5. Startup Validation

  • Added pre-startup validation for critical service dependencies

  • Implemented environment-aware startup procedures

  • Created configurable feature toggles via environment variables
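
And the startup validation, roughly; the variable names here are placeholders, not my real ones:

import os
import sys

# Placeholder names -- use whatever your app actually requires.
REQUIRED_VARS = ["DATABASE_URL", "MPESA_CONSUMER_KEY", "MPESA_CONSUMER_SECRET"]

def validate_environment():
    # Fail fast at boot instead of timing out later inside a request.
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")

# Feature toggles driven by environment variables.
ENABLE_BACKGROUND_JOBS = os.getenv("ENABLE_BACKGROUND_JOBS", "true").lower() == "true"

if os.getenv("APP_ENV") == "production":
    validate_environment()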

The core issue was thread locking in middleware causing worker processes to hang indefinitely. By implementing non-blocking alternatives and proper timeouts, the application now maintains responsiveness even under high load conditions.


Status changed to Open chandrika 8 months ago


Status changed to Solved chandrika 8 months ago

