Flask/Gunicorn on Railway: JSON response truncation (Unterminated string errors in production)
ticiochatbot
FREEOP

4 months ago

Hi all,

I’ve been deploying a Flask + Gunicorn API on Railway, and I’m running into a frustrating problem where large JSON responses get truncated in production, but work perfectly fine in local dev.

Setup

  • Backend: Flask app, running under Gunicorn

  • Frontend: Gradio app making POST requests to /inference

  • Deployment: Railway free tier (no custom proxy config)

  • Procfile:

    web: gunicorn wsgi:app -b 0.0.0.0:$PORT --workers 2 --threads 4 --timeout 120
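
For context, the wsgi module the Procfile points at is essentially just this (simplified sketch; the real app also renders the Jinja templates, and the route body is a placeholder):

    # wsgi.py -- the module Gunicorn loads via "gunicorn wsgi:app"
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/inference", methods=["POST"])
    def inference():
        # ... call the model and build the (potentially multi-KB) reply ...
        return jsonify({"response": "..."})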
    

The Problem

When my model generates a long text response (multi-KB, Spanish UTF-8 output), the client receives only part of the JSON. This leads to errors like:

JSONDecodeError: Unterminated string starting at: line 1 column 16 (char 15)

If I log response.content, I see it ends mid-string — half a Base64 blob or mid-paragraph of text.

Locally, I can return responses >20k chars with no issue. In production on Railway, responses above some threshold (it seems to be around a few KB) get truncated.
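
For what it's worth, the way I'm checking on the client side is roughly this (sketch, assuming the requests library; the URL and payload are placeholders):

    import requests

    resp = requests.post("https://<my-app>.up.railway.app/inference", json={"prompt": "..."})
    print("Content-Length header:", resp.headers.get("Content-Length"))
    print("Bytes actually received:", len(resp.content))
    # If the received size is smaller than the declared Content-Length,
    # the body is being cut off somewhere between the server and the client.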

I also tested with a dummy /echo endpoint that returns "A" * 20000. Locally it works; in production the body gets chopped.
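
The dummy endpoint (added to the same Flask app as above) is as trivial as it sounds:

    @app.route("/echo")
    def echo():
        # 20,000 plain ASCII characters -- no UTF-8 or escaping involved,
        # yet in production the body still arrives cut off
        return jsonify({"data": "A" * 20000})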

What I’ve Tried

  • Switched from Flask’s dev server to Gunicorn (Procfile).

  • Increased Gunicorn limits: --timeout 120, --limit-request-line 0, --limit-request-field_size 0.

  • Forced UTF-8 JSON (app.config["JSON_AS_ASCII"] = False).

  • Even encoded messages in Base64 to avoid escaping issues.

Still, long responses are truncated.
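
For completeness, this is roughly what those last two attempts look like in the app (simplified; generate_long_response is just a placeholder for the actual model call):

    import base64
    from flask import Flask, jsonify

    app = Flask(__name__)
    app.config["JSON_AS_ASCII"] = False  # emit raw UTF-8 instead of \uXXXX escapes

    @app.route("/inference", methods=["POST"])
    def inference():
        text = generate_long_response()  # placeholder for the actual model call
        # Base64 attempt: sidestep any escaping/encoding issues in the JSON payload
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        return jsonify({"response_b64": encoded})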

Question

  • Is this truncation a Railway proxy/body size limit (especially on the free tier)?

  • If I upgrade my Railway plan, will that increase the allowed response size?

  • Is there a config/setting I’m missing to let Railway pass through larger responses?

I want to know whether this is expected platform behavior, or if I’m doing something wrong with Flask/Gunicorn.

Any insights or advice would be greatly appreciated.

Thanks!

$10 Bounty

3 Replies

Railway
BOT

4 months ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!


ticiochatbot
FREEOP

4 months ago

Some added context that I feel was not properly explained in the original post:

My app consists of a Flask API that also renders some Jinja templates. The problem arises in a template that embeds a Gradio Lite application (it runs a Gradio app in the browser via micropip). Inside this Gradio app I send a request to the inference endpoint of my Flask API, which then connects to a private instance of a Hugging Face Space and sends the response back to the Gradio app inside the template.

Everything described above works fine; the problem is exactly in sending or receiving the response, where it gets truncated. All of this also works 100% in the local environment.
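
To make that flow concrete, the /inference route is essentially a proxy, roughly like this (sketch only; the Space URL and payload shape are placeholders):

    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    HF_SPACE_URL = "https://<private-space>.hf.space/run/predict"  # placeholder

    @app.route("/inference", methods=["POST"])
    def inference():
        payload = request.get_json()
        # forward the prompt to the private Hugging Face Space
        upstream = requests.post(HF_SPACE_URL, json=payload, timeout=120)
        upstream.raise_for_status()
        # relay the (multi-KB) generated text back to the Gradio Lite app in the template
        return jsonify(upstream.json())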


brody
EMPLOYEE

4 months ago

Hello,

That is very odd, but I can assure you we don't have any body size limits on our side. For example, here is a request that returns 3000000 bytes successfully:

https://utilities-metal-us-east-zfs.up.railway.app/bytes?bytes=3000000
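
You can verify the full body arrives with something like:

    import requests

    r = requests.get("https://utilities-metal-us-east-zfs.up.railway.app/bytes?bytes=3000000")
    print(len(r.content))  # should print 3000000 if nothing is truncated in transit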

I'll go ahead and bounty this thread so that the community may come to help debug the issue with your application.

Please prepare a minimal reproducible example, otherwise people's ability to help you will be severely limited.

Best,

Brody


Status changed to Awaiting User Response Railway 4 months ago

