Web service intermittently hangs on POST requests — GET fine, POST connection closed after 15s
unix0899
PRO · OP

24 days ago

Hi Railway team,

Our Django web service has been intermittently broken since ~07:00 UTC today (2026-05-20).

Symptoms

  • GET requests on public pages: usually fast (40–250 ms), occasionally time out at 15 s.
  • POST requests on the login endpoint (/auth/login/client/): consistently hang until the Railway edge closes the connection at exactly 15 s, then return "connection closed". Subsequent attempts time out at 30–60 s.
  • This started after a routine git push deploy, not after any infra change on our side. A force-redeploy (empty commit) did not resolve it.

The 15 s cutoff matches Railway's edge proxy timeout, which suggests our gunicorn workers are blocked on something (likely a backing service) rather than crashing with a stack trace.
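If the workers really are blocked, a gunicorn worker timeout shorter than the 15 s cutoff would make gunicorn kill and log the stuck worker instead of the request dying silently at the proxy, and gunicorn's `worker_abort` hook (called when a worker receives SIGABRT, which normally happens on a worker timeout) can dump where each thread was stuck. A minimal sketch of a `gunicorn.conf.py`, assuming sync workers; the 10 s value and the `format_thread_stacks` helper are illustrative choices, not part of the original setup.

```python
# gunicorn.conf.py: a sketch, assuming sync workers. The timeout value and
# the format_thread_stacks helper are illustrative, not from the post.
import sys
import threading
import traceback

# Abort a stuck worker before the 15 s edge cutoff described above.
timeout = 10


def format_thread_stacks():
    """Return the current stack of every thread in this process as text."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append(f"# Thread: {names.get(thread_id, '?')} ({thread_id})")
        lines.extend(traceback.format_stack(frame))
    return "\n".join(lines)


def worker_abort(worker):
    # gunicorn calls this hook when the worker receives SIGABRT,
    # which is what the arbiter sends on a worker timeout.
    worker.log.warning("worker aborted; thread stacks:\n%s", format_thread_stacks())
```

The dumped stack of the main thread should end in whichever call the request was blocked on (for example a socket read inside the Redis client), which is exactly the information a silent 15 s hang hides.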

What we already verified on our side

  • Django boots, serves the login page, and issues CSRF tokens normally.
  • DB queries on GET work (the page render includes DB-backed content).
  • We added aggressive Redis SOCKET_CONNECT_TIMEOUT/SOCKET_TIMEOUT values (2 s) with IGNORE_EXCEPTIONS: True so the cache fails fast on Redis trouble. Latest commit deployed; behavior is unchanged after the deploy.
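For reference, a cache configuration along the lines described above might look like the following. This is a sketch assuming the `django-redis` backend (whose `OPTIONS` accept `SOCKET_CONNECT_TIMEOUT`, `SOCKET_TIMEOUT` and `IGNORE_EXCEPTIONS`) and a `REDIS_URL` environment variable; the post does not show its actual settings.

```python
# settings.py excerpt: a sketch assuming django-redis and a REDIS_URL
# environment variable; the post's real configuration is not shown.
import os

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            # Give up on the TCP handshake after 2 s.
            "SOCKET_CONNECT_TIMEOUT": 2,
            # Give up waiting on a read/write over an open connection after 2 s.
            "SOCKET_TIMEOUT": 2,
            # Treat Redis errors as cache misses instead of raising.
            "IGNORE_EXCEPTIONS": True,
        },
    }
}

# Log the exceptions that IGNORE_EXCEPTIONS swallows, so outages stay visible.
DJANGO_REDIS_LOG_IGNORED_EXCEPTIONS = True
```

Each timeout bounds a single socket operation, so a request that makes several cache calls (or a client that retries on timeout) may still spend a multiple of 2 s waiting on an unresponsive Redis.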

Strong hypothesis

The POST path is the first place our code touches Redis (a rate-limit check via django.core.cache). That GET works while POST hangs for 15 s or more fits a Redis service that accepts TCP connections but never responds. Could you please check:

  1. Health of the Redis service attached to our project (memory, CPU, connection saturation, restarts).
  2. Whether there is any networking incident between our web service and the Redis service in our project's region.
  3. The current state of our latest deployment (commit 471b810).
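A rate-limit check on the POST path can be written to fail open, so a cache outage degrades to "no rate limiting" rather than a blocked login. A minimal sketch, assuming a fixed-window counter on any object with Django's cache `add`/`incr` methods; the function name, key scheme, limit and window are hypothetical, since the post does not show its rate-limit code.

```python
# A sketch of a fixed-window login rate limit that fails open.
# Assumptions: `cache` exposes Django's cache API (add/incr); the function
# name, key scheme, limit and window are hypothetical.
import time


def allow_login_attempt(cache, client_id, limit=5, window=60, now=None):
    """Return True if this attempt is within the limit for the current window."""
    now = time.time() if now is None else now
    bucket = int(now // window)
    key = f"login-rl:{client_id}:{bucket}"  # hypothetical key scheme
    try:
        # add() only stores the key if it is missing, so the first attempt
        # in a window creates the counter and later attempts increment it.
        if cache.add(key, 1, timeout=window):
            count = 1
        else:
            count = cache.incr(key)
    except Exception:
        # Fail open: a cache error (or the key expiring between add and incr)
        # lets the request through instead of blocking the login.
        return True
    if count is None:
        # Backends that swallow errors may return None instead of raising.
        return True
    return count <= limit
```

Failing open only helps once the cache call returns; it is the socket timeouts, not this function, that bound how long a hung Redis call can hold a worker.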


Railway
BOT

24 days ago

Your service's symptoms align with a major platform-wide service disruption that ran from 02:25 UTC to 07:57 UTC today, affecting traffic and networking across all regions. Your logs confirm that gunicorn workers are crashing because the Redis client cannot establish a TCP connection, which matches the internal networking disruption during that window. The incident is now resolved. If your service is still experiencing Redis connection failures, redeploying both the Redis service and the web service (in that order) should re-establish connectivity, as the Redis service hasn't been redeployed since March and may have stale network state from the disruption.
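The reply attributes the failures to TCP connects that never succeed, while the original post suspected a Redis that accepts connections but never answers. A minimal stdlib-only sketch can tell those cases apart when run from inside the web service's environment; `probe_redis` is a hypothetical helper, and the host and port would come from the Redis connection URL.

```python
import socket


def probe_redis(host, port, timeout=2.0):
    """Return 'ok', 'replied', 'closed', 'no_response' or 'connect_failed'."""
    try:
        # create_connection also applies `timeout` to later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, DNS failure, or the handshake timed out.
        return "connect_failed"
    with sock:
        try:
            # RESP-encoded PING command.
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = sock.recv(64)
        except socket.timeout:
            # The TCP handshake succeeded but no reply arrived in time.
            return "no_response"
        except OSError:
            # Connection reset while talking to the server.
            return "closed"
    if not reply:
        return "closed"  # server closed the connection without replying
    if reply.startswith(b"+PONG"):
        return "ok"
    # Any other reply (e.g. an authentication error) still proves Redis answers.
    return "replied"
```

A `connect_failed` result matches the explanation in the reply above, `no_response` matches the original "accepts TCP but never responds" hypothesis, and `replied` without credentials is expected on a password-protected Redis.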


Status changed to Awaiting User Response · Railway · 24 days ago


Status changed to Solved · unix0899 · 24 days ago

