6 months ago
Hi,
I’m deploying an application that integrates three LLM APIs. The smaller models (Mistral, Gemini-1.5-flash) run fine, but when I run all three together the deployment fails at runtime, especially with the larger paid model (Gemini-2.5-pro). It works locally, but on Railway the backend never returns a response to the frontend.
It looks like a request/idle timeout is being hit before the model finishes. Where can I configure longer request timeouts on Railway (ideally 2–3 minutes)? I adjusted the code to shorten the work per request, but that reduces output quality. Guidance on the correct way to extend execution time would be appreciated.
Thank you.
5 Replies
6 months ago
Our request timeout is 15 minutes.
Thanks for clarifying. Since the request window is 15 minutes, I suspect the issue may be in my implementation (possibly related to Flask’s request handling or how I manage background jobs). Could you confirm if there are any common pitfalls when running long-running Flask apps on Railway?
6 months ago
There are none on the platform side of things, given we are not serverless, so there's no max execution time or anything like that.
Unfortunately, I don't know what your issue is. I just wanted to clear up the misconceptions that were brought up, but since it is not a platform issue, I will bounty this thread in hopes that the community can help you debug.
6 months ago
I think there was an issue on Google's end today; API calls were mostly returning 503 errors (in my experience). Maybe that was the problem? Try testing it again now.
6 months ago
Did you consider changing the application architecture so that, instead of the front-end waiting on the backend for a long-running task, you schedule a job onto a queue? Since you mentioned Flask, you may find RQ (https://python-rq.org/) quite simple and easy to use for your use case. A worker then picks the job up from the queue, and the backend returns the job ID to the front-end. The front-end can poll the job's status periodically and fetch the result as soon as the job completes.