Python Backend hangs indefinitely (loading spinner) until manual redeploy
shahin-behzadrad
PRO OP

a month ago

Hi,

I am facing a critical issue with my production app where the backend becomes unresponsive after a period of time, requiring a manual redeploy to fix.

The Stack:

  • Frontend: Next.js

  • Backend: Python

  • Database: Postgres

The Symptoms:

  • The app (summachat.com) works fine after a fresh deploy.

  • After some time (hours/days), the frontend gets stuck on a loading state.

  • The backend service shows as "Online" in the Railway dashboard (no crash reported).

  • The Fix: If I manually redeploy the exact same backend commit, the app immediately starts working again.

Project ID: 5f090606-c2e6-456d-b71e-9fd698cf176b

Could someone please check the service metrics/health to see why it hangs without crashing?

Thanks!

$10 Bounty

6 Replies

Railway
BOT

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway 29 days ago


Status changed to Solved shahin-behzadrad 29 days ago



Status changed to Open ray-chen 29 days ago


0x5b62656e5d

Are you using serverless?

shahin-behzadrad
PRO OP

a month ago

No, the app runs as containerized services on Railway (Next.js frontend + FastAPI backend). They’re long-running processes, not serverless functions.


0x5b62656e5d

Not a serverless function. I meant this:

Attachments

shahin-behzadrad
PRO OP

a month ago

No, it's not serverless.

Attachments


shahin-behzadrad
PRO OP

a month ago

Thank you so much for the detailed breakdown, this was incredibly helpful!

I've implemented all of your suggestions:

  1. Health check: Updated my /health endpoint to actually test DB connectivity with SELECT 1 instead of just returning {"status": "ok"}, and configured it as the healthcheck path in Railway.

  2. Gunicorn with async workers: Switched from raw uvicorn to gunicorn -w 4 -k uvicorn.workers.UvicornWorker --timeout 90. The worker timeout alone should prevent the silent hang: if a worker gets stuck, gunicorn will kill and restart it automatically.

  3. Connection pool tuning: Reduced from pool_size=50 + max_overflow=50 (100 total) down to pool_size=5 + max_overflow=10 per worker (15 per worker, 60 across 4 workers). With 4 gunicorn workers, the old config could have opened up to 400 connections against Railway Postgres, which was almost certainly the root cause of the exhaustion.

  4. Pool recycle: Reduced from 30 minutes to 5 minutes, since Railway Postgres can drop idle connections sooner.
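For anyone checking the numbers in the pool tuning item, here is a rough sketch of the arithmetic. It assumes each gunicorn worker process creates its own pool (the pool_size / max_overflow naming suggests a SQLAlchemy-style pool, which is an assumption), and the helper name max_connections is made up for illustration.

```python
# Hypothetical helper, not taken from the app: worst-case number of
# connections the backend can open when every worker has its own pool.
def max_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Upper bound: each worker may open pool_size + max_overflow connections."""
    return workers * (pool_size + max_overflow)


WORKERS = 4  # matches gunicorn -w 4

old_budget = max_connections(WORKERS, pool_size=50, max_overflow=50)
new_budget = max_connections(WORKERS, pool_size=5, max_overflow=10)

print(old_budget)  # 400
print(new_budget)  # 60
```

The number to compare against is whatever `SHOW max_connections;` reports on the database (stock Postgres defaults to 100), leaving some headroom for migrations and admin sessions.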

Deploying to staging first to validate, then rolling out to production. Really appreciate the thorough response, saved me a lot of debugging time!
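For reference, the health check from the first item could look roughly like the sketch below. It is framework-agnostic and not the actual endpoint code: db_health is a hypothetical name, it takes any DB-API style connection factory, and sqlite3 stands in for Postgres only so the example runs on its own.

```python
import sqlite3
from typing import Any, Callable, Tuple


def db_health(connect: Callable[[], Any]) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) after round-tripping SELECT 1."""
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")
            ok = cur.fetchone()[0] == 1
        finally:
            conn.close()
    except Exception as exc:  # any driver error counts as unhealthy
        return 503, {"status": "error", "detail": type(exc).__name__}
    if not ok:
        return 503, {"status": "error", "detail": "unexpected result"}
    return 200, {"status": "ok"}


def broken_connect() -> Any:
    # Simulates an unreachable database.
    raise ConnectionError("db unreachable")


# sqlite3 is only a stand-in here; the real app would pass its Postgres connect function.
print(db_health(lambda: sqlite3.connect(":memory:")))  # (200, {'status': 'ok'})
print(db_health(broken_connect))  # (503, {'status': 'error', 'detail': 'ConnectionError'})
```

In a FastAPI route, the status code and body would be returned from the /health handler. One caveat: this only catches errors, so a connection attempt that hangs would also hang the check unless the driver is given a connect or statement timeout.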


Status changed to Open brody 27 days ago

