Django with Gunicorn gives frequent [CRITICAL] WORKER TIMEOUTs on simple requests

anonymousPRO

a year ago

I have been trying to deploy an app (no real users on this deployment yet). The deployment works, but some requests that need almost no back-end processing (e.g. retrieving one model in the Django admin, with no calculated fields) lead to Gunicorn timeouts. After a few seconds it often works again, keeps working for a while, and eventually goes back to timing out.

Suggestions are very much appreciated. I have 2 Gunicorn workers and 2 replicas, and I am the only user.

[2024-02-05 12:21:37 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:27)
[2024-02-05 12:21:38 +0000] [16] [ERROR] Worker (pid:27) exited with code 1
[2024-02-05 12:21:38 +0000] [16] [ERROR] Worker (pid:27) exited with code 1.
[2024-02-05 12:21:39 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:37)
[2024-02-05 12:21:39 +0000] [16] [ERROR] Worker (pid:37) exited with code 1
[2024-02-05 12:21:39 +0000] [16] [ERROR] Worker (pid:37) exited with code 1.

I am running it with Docker, using the following Dockerfile:

FROM python:3.11-slim
RUN pip install --upgrade pip
COPY ./requirements.txt .
RUN pip install -r requirements.txt
COPY . /app
WORKDIR /app
COPY docker_entrypoint.sh .
ENTRYPOINT ["sh", "/app/docker_entrypoint.sh"]

The entrypoint runs the migrations and then starts Gunicorn with:

    PYTHONPATH=`pwd`/project gunicorn project.wsgi.wsgi_production:application --timeout 60 --workers 2  --access-logfile - --log-level WARNING

anonymousPRO

a year ago

45fb0a1b-efd2-4699-81c8-16a885d8c33c


anonymousPRO

a year ago

Update: I now also have requests hanging for 5 minutes before a definitive timeout.

[2024-02-05 14:56:50 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:33)
[2024-02-05 14:56:50 +0000] [16] [ERROR] Worker (pid:33) exited with code 1
[2024-02-05 14:56:50 +0000] [16] [ERROR] Worker (pid:33) exited with code 1.
[2024-02-05 14:57:52 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:35)
[2024-02-05 14:57:52 +0000] [16] [ERROR] Worker (pid:35) exited with code 1
[2024-02-05 14:57:52 +0000] [16] [ERROR] Worker (pid:35) exited with code 1.
[2024-02-05 14:58:53 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:37)
[2024-02-05 14:58:53 +0000] [16] [ERROR] Worker (pid:37) exited with code 1
[2024-02-05 14:58:53 +0000] [16] [ERROR] Worker (pid:37) exited with code 1.
[2024-02-05 14:59:54 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:39)
[2024-02-05 14:59:55 +0000] [16] [ERROR] Worker (pid:39) exited with code 1
[2024-02-05 14:59:55 +0000] [16] [ERROR] Worker (pid:39) exited with code 1.

a year ago

I had another user with the exact same problem on the same tech stack. Their issue turned out to be incorrect database credentials: the database connection would hang and then fail silently, blocking all requests in the meantime.


anonymousPRO

a year ago

I am using DEFAULT_DB_URL=${{Postgres.DATABASE_PRIVATE_URL}}. And that wouldn't explain why it does work sometimes, right?


a year ago

Is that the environment variable you are using in code? Unless you are using a database URL module, Django only accepts separate database credentials.


a year ago

Show me the database settings in your settings.py, please.


anonymousPRO

a year ago

I am using django-environ:

DATABASES = {
    "default": env.db("DEFAULT_DB_URL"),
}

Migrating the database, loading the fixtures, etc. all work (I have a sleep 2 on startup to ensure the database connection is ready, as recommended in another post).
I can also see the data in my admin panel whenever it does not time out.
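
One way to test the hanging-connection theory raised earlier in the thread is to make connection attempts fail fast instead of blocking a worker. A minimal sketch, assuming PostgreSQL through psycopg, where Django passes `OPTIONS` on to the driver and `connect_timeout` is a standard libpq connection parameter. The helper name `with_connect_timeout` and the 5-second default are illustrative assumptions, not something from the thread:

```python
import copy


def with_connect_timeout(db_config, seconds=5):
    """Return a copy of a Django database dict with a libpq connect timeout."""
    config = copy.deepcopy(db_config)
    # Merge into OPTIONS rather than overwrite, so existing options survive.
    options = config.setdefault("OPTIONS", {})
    # Keep an explicitly configured timeout if one is already present.
    options.setdefault("connect_timeout", seconds)
    return config


# In settings.py this would wrap the django-environ call shown above:
# DATABASES = {"default": with_connect_timeout(env.db("DEFAULT_DB_URL"))}
```

With this in place, an unreachable database should surface as a connection error after a few seconds rather than as a silent worker timeout.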


a year ago

sleep 3 is recommended; 2 seconds is pushing it, because the time to readiness does tend to exceed 2 seconds.


anonymousPRO

a year ago

I increased it to 5 to be on the safe side, but this won't resolve my timeouts 😅


a year ago

There's some other piece of code somewhere that's blocking. I'd recommend adding verbose debug logging to find out at what point your app is locking up.


anonymousPRO

a year ago

OK, will do that. Do you mean in Gunicorn, in Django, or in my Postgres service (or all of them)? 🤔


a year ago

in django
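
A minimal sketch of what request-level debug logging in Django could look like: a small middleware that logs when a request enters Django and how long it took. The class name `RequestTimingMiddleware` and the logger name `request_timing` are hypothetical, chosen for illustration only:

```python
import logging
import time

logger = logging.getLogger("request_timing")


class RequestTimingMiddleware:
    """Log when each request starts and how long it took to finish."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.monotonic()
        # Logged before the view runs, so a hang still leaves a trace.
        logger.debug("start %s %s", request.method, request.path)
        try:
            return self.get_response(request)
        finally:
            elapsed = time.monotonic() - start
            logger.debug(
                "end %s %s after %.3fs", request.method, request.path, elapsed
            )
```

Listed first in `MIDDLEWARE`, this separates two cases: a "start" line without an "end" line points at the view or later middleware, while no "start" line at all suggests the request stalled before Django ever saw it.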


anonymousPRO

a year ago

I added debug logging, but I don't see anything in my logs when this happens.

[2024-02-08 18:58:05 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:17)
[2024-02-08 19:58:05 +0100] [17] [INFO] Worker exiting (pid: 17)
[2024-02-08 18:58:06 +0000] [16] [ERROR] Worker (pid:17) exited with code 1
[2024-02-08 18:58:06 +0000] [16] [ERROR] Worker (pid:17) exited with code 1.
[2024-02-08 18:58:06 +0000] [19] [INFO] Booting worker with pid: 19
[2024-02-08 18:59:09 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:19)
[2024-02-08 19:59:09 +0100] [19] [INFO] Worker exiting (pid: 19)
[2024-02-08 18:59:09 +0000] [16] [ERROR] Worker (pid:19) exited with code 1
[2024-02-08 18:59:09 +0000] [16] [ERROR] Worker (pid:19) exited with code 1.
[2024-02-08 18:59:09 +0000] [21] [INFO] Booting worker with pid: 21
[2024-02-08 19:00:11 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:21)
[2024-02-08 20:00:11 +0100] [21] [INFO] Worker exiting (pid: 21)
[2024-02-08 19:00:11 +0000] [16] [ERROR] Worker (pid:21) exited with code 1
[2024-02-08 19:00:11 +0000] [16] [ERROR] Worker (pid:21) exited with code 1.
[2024-02-08 19:00:11 +0000] [23] [INFO] Booting worker with pid: 23
[2024-02-08 19:01:12 +0000] [16] [ERROR] Worker (pid:23) exited with code 1.
[2024-02-08 19:01:12 +0000] [25] [INFO] Booting worker with pid: 25
[2024-02-08 20:01:13 +0100] [25] [DEBUG] GET /super-admin/campaigns/campaign/

The GET request only shows up after it has been hanging for a while.


a year ago

Something in your code is freezing and causing the request to take longer than Gunicorn's timeout (30 seconds by default).


a year ago

unless you have something that should take longer than 30 seconds?


anonymousPRO

a year ago

It happens on arbitrary requests that should not take any significant amount of time (not even close to a second). It happens neither locally nor on my PythonAnywhere-hosted test server (which does not deploy with Docker) 🤔.


a year ago

Railway runs your code as is; PythonAnywhere is likely monkeypatching away some bugs in your code.


anonymousPRO

a year ago

I already set the Gunicorn timeout to 60 to test whether the request would eventually finish (it does not).


a year ago

Without an error or any logs to go off of, there's not much I can help you with here, unfortunately.
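
When a timeout leaves no traceback, one option is to have the worker itself report where it was stuck. Gunicorn calls the `worker_abort` server hook when a worker receives SIGABRT, which generally happens on timeout. A minimal sketch of a `gunicorn.conf.py` that logs every thread's stack from that hook; the helper `format_all_stacks` is a hypothetical name introduced here:

```python
# gunicorn.conf.py (sketch)
import sys
import threading
import traceback


def format_all_stacks():
    """Return the current stack of every thread in this process as text."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append(f"Thread {names.get(thread_id, '?')} ({thread_id}):")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def worker_abort(worker):
    """Gunicorn server hook: called when a worker receives SIGABRT."""
    worker.log.critical(
        "Worker %s aborted; stacks:\n%s", worker.pid, format_all_stacks()
    )
```

Loaded with `gunicorn -c gunicorn.conf.py ...`, the stack printed next to each WORKER TIMEOUT line should show which call the worker was blocked in, for example a socket read, a database connect, or a file read.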


midhun98HOBBY

a year ago

Did you find a fix for this issue? I'm facing a similar one.


anonymousPRO

a year ago

I did not. Based on my logging, it seems to hang on loading static files, which I am serving through WhiteNoise (6.6.0):

INSTALLED_APPS = [
    ...
    "whitenoise.runserver_nostatic",
    "django.contrib.staticfiles",
    ...
]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    ...
]

STORAGES = {
    "default": {
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
    },
    "staticfiles": {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}

a year ago

Please refer to the WhiteNoise docs on how to configure it properly.


anonymousPRO

a year ago

@MIGHTY_MIDHUN I solved it eventually by moving my static files to a Google Cloud Storage bucket.
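
For readers who want to keep serving static files through WhiteNoise, one hypothesis worth testing (not confirmed anywhere in this thread) is that the default sync workers are being held by slow or idle connections. A sync worker handles one connection at a time, so a browser opening several connections to fetch static assets could occupy both workers until the timeout fires. A minimal sketch of a `gunicorn.conf.py` that switches to threaded workers; the thread count is an illustrative assumption, and the other values mirror the command in the original post:

```python
# gunicorn.conf.py (sketch; the thread count is an illustrative assumption)
workers = 2               # same worker count as the original command
worker_class = "gthread"  # threaded workers instead of the default "sync"
threads = 4               # hypothetical value: connections served per worker
timeout = 60              # matches --timeout 60 from the original command
accesslog = "-"           # matches --access-logfile -
loglevel = "warning"      # matches --log-level WARNING
```

If the timeouts disappear with this configuration, that points at worker starvation rather than at WhiteNoise itself; if they persist, the cause lies elsewhere.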