a year ago
My FastAPI application processes requests in 12-13 seconds (confirmed in logs), but clients experience 60-90 second response times when calling the API.
Issue details:
- App logs show inference completes in ~12s consistently
- Client requests.post() takes 60-90s total; sporadically it matches the app processing time (~12s)
- Only one request appears in logs (no evidence of retries)
- Issue occurs on $5/month plan (8 vCPU, 8GB RAM)
- Same code works instantly locally
Application: FastAPI with CLIP model inference
Endpoint: POST /get_embedding (returns JSON embedding)
Response size: ~2-8KB JSON
This appears to be a Railway infrastructure delay between my container responding and the client receiving the response, not an application performance issue.
Service URL: I can provide the service URL privately if needed for debugging.
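For reference, a minimal client-side timing sketch that separates time-to-first-byte from body transfer (the URL and image path are placeholders; with requests, stream=True makes post() return as soon as the response headers arrive, before the body is downloaded):

import time
import requests

URL = "https://your-service.up.railway.app/get_embedding"  # placeholder

start = time.monotonic()
with open("test.jpg", "rb") as f:  # placeholder image
    # stream=True: post() returns once the status line and headers
    # arrive, before the body has been downloaded.
    with requests.post(URL, files={"file": f}, stream=True, timeout=180) as r:
        ttfb = time.monotonic() - start   # server + proxy time to first byte
        body = r.content                  # forces the full body download
        total = time.monotonic() - start

print(f"status={r.status_code} ttfb={ttfb:.1f}s total={total:.1f}s body={len(body)} bytes")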
Pinned Solution
a year ago
Agreed, I've faced a similar problem before. If the response size is too large, it takes longer for the client's request to finish, and in some cases I've seen the server keep the request open even after it's done.
@virajshah, can you share more details about the route if possible?
4 Replies
a year ago
I think the platform is buffering the whole response or is slow at forwarding it.
You can try adding streaming or chunked responses in FastAPI so it sends data bit by bit instead of all at once, or add a simple quick test endpoint that returns the smallest possible JSON to see whether the delay still happens (or drops significantly). This is just to prove my theory; then we can try to find a solution.
In case my theory isn't right, or for more help, I would need you to share more information with me.
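A minimal sketch of such a test endpoint (the /ping route name is made up for illustration):

from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
def ping():
    # Smallest practical JSON body. If this also takes 60-90s from the
    # client, the delay is unrelated to response size; if it returns
    # quickly, buffering of larger bodies becomes the more likely culprit.
    return {"ok": True}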
a year ago
Agreed, I've faced a similar problem before. If the response size is too large, it takes longer for the client's request to finish, and in some cases I've seen the server keep the request open even after it's done.
@virajshah, can you share more details about the route if possible?
a year ago
All of this is beyond my level of knowledge, but I'm happy to provide whatever other info you need. I've pulled myself off the plan here; I don't seem to run into the issue on render.com (it runs in sub-1s there, as it does locally). It's a simple FastAPI route: POST an image file, and a Python function takes it and generates a CLIP embedding via the Hugging Face transformers library. The device used is CPU (no GPU). The model is already pulled and loaded when the FastAPI app starts in

@asynccontextmanager
async def lifespan(app: FastAPI):

The main route has the following code (I removed the logger statements I included to track the time for everything):
# Preprocess the image into model tensors and move them to the device
inputs = processor(images=[image], return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}
# Run the CLIP image encoder without tracking gradients
with torch.no_grad():
    outputs = model.get_image_features(**inputs)
# L2-normalize and convert to a plain Python list for the JSON response
image_features = outputs / outputs.norm(dim=-1, keepdim=True)
embedding = image_features.cpu().numpy().tolist()[0]
return embedding
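For context, the startup pattern mentioned above could look roughly like this; a sketch only, assuming the standard transformers CLIP classes and the common openai/clip-vit-base-patch32 checkpoint (the actual model name isn't given in the thread):

from contextlib import asynccontextmanager

import torch
from fastapi import FastAPI
from transformers import CLIPModel, CLIPProcessor

device = "cpu"  # CPU-only inference, per the post above
model = None
processor = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model once at startup so individual requests don't pay
    # the download/initialization cost.
    global model, processor
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    model.eval()
    yield

app = FastAPI(lifespan=lifespan)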
a year ago
Railway runs your app behind a proxy (like NGINX or an internal layer), which may:
- Buffer responses until complete before sending them to the client.
- Introduce latency due to SSL termination or network routing.
This is common on platforms like Vercel, Railway, Render, etc.
So I'm thinking that using a StreamingResponse might force the proxy to flush early.
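A minimal sketch of that idea; compute_embedding() below is a stub standing in for the CLIP inference code shown earlier, and the body is yielded in chunks so the server writes it incrementally instead of as one buffered blob:

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def compute_embedding():
    # Stub standing in for the CLIP inference shown above.
    return [0.0] * 512

@app.post("/get_embedding")
async def get_embedding():
    payload = json.dumps({"embedding": compute_embedding()}).encode("utf-8")

    def iter_chunks(data: bytes, size: int = 4096):
        # Yield the response body piece by piece; with chunked transfer
        # encoding the client can start receiving data before the whole
        # body has been written.
        for i in range(0, len(data), size):
            yield data[i : i + size]

    return StreamingResponse(iter_chunks(payload), media_type="application/json")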
Status changed to Solved
chandrika • 10 months ago
