Our API is going down with 499s without any other logs
pedropregueiro
PRO
OP

a month ago

Our FastAPI application becomes completely unresponsive multiple times per day. The process hangs silently: no error logs, no exceptions, no crash messages. The healthcheck endpoint stops responding, Railway returns 499 errors (client closed request, i.e. the proxy timed out waiting for a response), and a manual restart is required to recover.

This has happened at least 4 times today:

- ~01:00 UTC (30 min outage)

- ~19:00 UTC

- ~22:05 UTC

- ~22:33 UTC

From application logs: nothing, complete silence. The last log entry is a successful healthcheck, then there are no logs until the manual restart.

Logs - Example timeline (22:33 UTC incident):

2026-01-27T22:31:04.471Z [INFO] event="GET /healthcheck" latency_ms=1.53 status_code=200

2026-01-27T22:32:04.855Z [INFO] event="GET /healthcheck" latency_ms=1.02 status_code=200

2026-01-27T22:33:05.532Z [INFO] event="GET /healthcheck" latency_ms=1.36 status_code=200

<-- NO MORE LOGS - APP HUNG -->

No error messages, no exceptions - the process is alive but completely unresponsive.
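A hang like this leaves no trace because the only code that could log is itself stuck. One way to get evidence is a watchdog thread that pings the event loop and dumps every thread's stack when the loop stops answering. The following is a minimal sketch, not something from the original app: `start_loop_watchdog`, `format_all_stacks`, the logger name, and the thresholds are all hypothetical. It only helps when the blocking call releases the GIL (socket reads and `time.sleep` do), since otherwise the watchdog thread cannot run either.

```python
import asyncio
import logging
import sys
import threading
import time
import traceback

logger = logging.getLogger("loop_watchdog")  # hypothetical logger name


def format_all_stacks():
    """Return the current stack of every thread as one string."""
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append(f"Thread {thread_id}:\n")
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)


def start_loop_watchdog(loop, stall_after=5.0, interval=1.0, report=logger.error):
    """Start a daemon thread that reports when `loop` stops running callbacks."""

    def watch():
        while not loop.is_closed():
            responded = threading.Event()
            try:
                # Ask the loop to set the event; a healthy loop does so almost at once.
                loop.call_soon_threadsafe(responded.set)
            except RuntimeError:
                return  # the loop was closed in the meantime
            if not responded.wait(stall_after):
                if loop.is_closed():
                    return  # shutdown, not a stall
                # The callback never ran: the loop is stuck, so show where.
                report(
                    f"event loop stalled for over {stall_after}s\n"
                    f"{format_all_stacks()}"
                )
            time.sleep(interval)

    thread = threading.Thread(target=watch, name="loop-watchdog", daemon=True)
    thread.start()
    return thread
```

In a FastAPI app this could be started once from a startup or lifespan hook with `start_loop_watchdog(asyncio.get_running_loop())`. The stack dump in the report should then point at the call the loop thread is stuck in.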

Stack

- Framework: FastAPI + Uvicorn

- Database: Supabase PostgreSQL (direct connection port 5432)

- Background jobs: Celery + Redis

- Python: 3.13

We checked the DB and it doesn't seem to be the cause.

Any idea what this might be?

Solved
$30 Bounty

Pinned Solution

Seen this before. It's usually not Railway. FastAPI is likely getting stuck in a blocking call (DB, Redis, or Celery). When the event loop blocks, the process stays alive with no errors and no logs, the healthcheck just hangs, and Railway returns 499. Common fixes: add timeouts to the Postgres and Redis clients, make sure the DB driver is async, and don't run a single Uvicorn worker. A restart works because it clears the stuck connection. Python 3.13 can also make this worse.
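To make the failure mode concrete, here is a small sketch using only the standard library. It assumes a hypothetical synchronous call, `slow_query`, standing in for a DB or Redis client without a timeout. All names below are illustrative and not taken from the original app. A heartbeat task measures how long the event loop goes without running while each handler executes.

```python
import asyncio
import contextlib
import time


def slow_query(seconds):
    """Hypothetical stand-in for a synchronous DB, Redis, or HTTP call."""
    time.sleep(seconds)
    return "rows"


async def blocking_handler(seconds):
    # Runs the synchronous call on the event loop thread:
    # no other request, including the healthcheck, can run meanwhile.
    return slow_query(seconds)


async def offloaded_handler(seconds, timeout=5.0):
    # Runs the call in a worker thread and stops waiting after `timeout`.
    # On timeout the thread itself keeps running; only the await is abandoned.
    return await asyncio.wait_for(asyncio.to_thread(slow_query, seconds), timeout)


async def worst_loop_delay(handler, seconds):
    """Run `handler` and return the longest delay seen by a concurrent heartbeat."""
    worst = 0.0
    tick = 0.01

    async def heartbeat():
        nonlocal worst
        while True:
            started = time.perf_counter()
            await asyncio.sleep(tick)
            worst = max(worst, time.perf_counter() - started - tick)

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)  # let the heartbeat start ticking
    await handler(seconds)
    await asyncio.sleep(0.05)  # let the heartbeat record any stall
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return worst
```

Running `asyncio.run(worst_loop_delay(blocking_handler, 0.3))` should report a delay close to the full 0.3 seconds, while the offloaded variant should stay near zero. In FastAPI the same distinction applies: `async def` endpoints run on the event loop, so a synchronous call inside one stalls every other request, whereas plain `def` endpoints are run in a threadpool.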

2 Replies

Railway
BOT

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open by Railway about 1 month ago




pedropregueiro
PRO
OP

a month ago

thanks! it was indeed a blocking request sorta blocking the whole thing
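For anyone chasing a similar blocking request locally, asyncio's debug mode is one option: it logs a warning on the `asyncio` logger whenever a single step holds the event loop longer than `loop.slow_callback_duration`. This is a rough sketch, and `handler_with_blocking_call` and `run_in_debug_mode` are hypothetical names rather than code from the thread.

```python
import asyncio
import logging
import time


async def handler_with_blocking_call():
    time.sleep(0.3)  # hypothetical stand-in for the blocking request


def run_in_debug_mode(coro_fn, slow_after=0.1):
    """Run `coro_fn` with asyncio debug mode on, so slow loop steps get logged."""

    async def main():
        # Steps that hold the loop for longer than this are logged by asyncio.
        asyncio.get_running_loop().slow_callback_duration = slow_after
        await coro_fn()

    asyncio.run(main(), debug=True)
```

Calling `run_in_debug_mode(handler_with_blocking_call)` should produce a warning naming the task that held the loop. Debug mode can also be switched on without code changes by setting the `PYTHONASYNCIODEBUG=1` environment variable, though it adds overhead and is better suited to development than production.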


Status changed to Open by brody about 1 month ago


Status changed to Solved by brody about 1 month ago

