Our API is going down with 499s without any other logs

Question

Our FastAPI application becomes completely unresponsive multiple times per day. The process hangs silently - no error logs, no exceptions, no crash messages. The healthcheck endpoint stops responding, Railway returns 499 errors (client closed request - proxy timed out waiting for response), and manual restart is required to recover.

This has happened at least 4 times today:

- ~01:00 UTC (30 min outage)
- ~19:00 UTC
- ~22:05 UTC
- ~22:33 UTC

From application logs: Nothing - complete silence. Last log entry is a successful healthcheck, then no logs until manual restart.

Logs - Example timeline (22:33 UTC incident):

2026-01-27T22:31:04.471Z [INFO] event="GET /healthcheck" latency_ms=1.53 status_code=200

2026-01-27T22:32:04.855Z [INFO] event="GET /healthcheck" latency_ms=1.02 status_code=200

2026-01-27T22:33:05.532Z [INFO] event="GET /healthcheck" latency_ms=1.36 status_code=200

<-- NO MORE LOGS - APP HUNG -->

No error messages, no exceptions - the process is alive but completely unresponsive.

Stack

- Framework: FastAPI + Uvicorn
- Database: Supabase PostgreSQL (direct connection port 5432)
- Background jobs: Celery + Redis
- Python: 3.13

We checked the DB and it seems to be responsive.

Any idea what this might be?

bigdaddyaman · Accepted Answer

Seen this before. It's usually not Railway. FastAPI is likely getting stuck in a blocking call to the DB, Redis, or Celery. When the event loop blocks, the process stays alive but produces no errors and no logs; the healthcheck just hangs and Railway returns 499. Common fixes: add timeouts to the Postgres and Redis clients, make sure the DB driver is async, and don't run a single Uvicorn worker. A restart works because it clears the stuck connection. Python 3.13 can also make this worse.
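To illustrate the blocked event loop described above, here is a minimal stdlib-only sketch. Nothing in it comes from the original app: `heartbeat` stands in for the healthcheck, `slow_blocking_call` is a hypothetical synchronous client call (think of a sync DB or Redis query), and the two handlers are made-up examples. It measures how long the heartbeat goes silent when the call runs directly on the event loop versus when it is offloaded to a thread and wrapped in a timeout.

```python
import asyncio
import time


async def heartbeat(ticks: list[float], interval: float = 0.05) -> None:
    """Stand-in for a healthcheck: records a timestamp every `interval` seconds."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


def slow_blocking_call(seconds: float) -> str:
    """Hypothetical synchronous call, e.g. a sync DB or Redis query that stalls."""
    time.sleep(seconds)
    return "done"


async def blocking_handler() -> str:
    # Problem case: the sync call runs on the event loop thread,
    # so nothing else (including the heartbeat) can run until it returns.
    return slow_blocking_call(0.5)


async def offloaded_handler() -> str:
    # Mitigation: run the sync call in a worker thread and bound the wait.
    # Note: on timeout the thread itself keeps running; only the await is abandoned.
    return await asyncio.wait_for(
        asyncio.to_thread(slow_blocking_call, 0.5), timeout=2.0
    )


async def max_heartbeat_gap(handler) -> float:
    """Run `handler` next to the heartbeat and return the longest gap between ticks."""
    ticks: list[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.1)  # let the heartbeat tick a few times first
    await handler()
    await asyncio.sleep(0.1)  # and a few times afterwards
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("blocking gap :", asyncio.run(max_heartbeat_gap(blocking_handler)))
    print("offloaded gap:", asyncio.run(max_heartbeat_gap(offloaded_handler)))
```

In the blocking case the heartbeat gap is at least as long as the stalled call, which matches the "alive but silent" symptom; if the call never returns, neither does the healthcheck. An async driver with its own connect and command timeouts is usually the better fix than thread offloading, so check the docs for the specific clients you use.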