Are there limits on total transfer size over SSE?

craigspaeth (PRO)

6 months ago

I have an app that tries to keep a server-sent events connection open for a long time. I've read the networking docs, and I'm aware of the 5-minute max HTTP duration, so the client closes and re-opens an EventSource every minute.
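For reference, the rotation described above can be sketched roughly like this. Names here are illustrative, not the app's actual code; `makeEventSource` is a hypothetical factory standing in for `(url) => new EventSource(url)` so the pattern can be exercised outside a browser:

```javascript
// Keep an SSE stream alive past a proxy's max-connection time by
// closing and re-opening the EventSource on a fixed period.
// `makeEventSource` is a hypothetical factory; in the browser it
// would simply be (url) => new EventSource(url).
function startRotatingSSE(url, makeEventSource, onMessage, periodMs = 60_000) {
  let source = makeEventSource(url);
  source.onmessage = onMessage;

  const timer = setInterval(() => {
    source.close();                // drop the old connection before the proxy does
    source = makeEventSource(url); // open a fresh one
    source.onmessage = onMessage;
  }, periodMs);

  // Return a stop function so the caller can shut everything down.
  return () => {
    clearInterval(timer);
    source.close();
  };
}
```

Note that plain rotation like this can miss events sent during the gap; a production version would typically use SSE's `Last-Event-ID` mechanism to resume.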

This works fine when the total transfer size of an open SSE connection is low (i.e. number of events × event payload size, sent over the course of a minute). However, when the volume of events and/or the payload size gets large, the SSE connection keeps getting dropped. When I inspect the Chrome DevTools Network panel, I see that once an SSE connection reaches around 1 MB total transferred, it gets dropped from the server side.

This doesn't happen locally, so I suspect it's not between the browser and the application server but at Railway's load balancer/gateway layer (Nginx?). I also didn't have this issue when hosted on Heroku, but I haven't tested with other PaaS providers yet.

Is there any more documentation on Railway's transfer limits? Or do you have any other theories as to what might be happening? Does Railway not support SSE at this scale, preferring WebSockets for long-running real-time apps?

Solved

8 Replies

6 months ago

Hey @craigspaeth,

We're going to test this out today and will get back to you soon!


Status changed to Awaiting User Response railway[bot] 6 months ago


Status changed to In Progress melissa 6 months ago


6 months ago

Hey Craig!

We wrote a little tool to help us validate this - https://utilities.up.railway.app/?filter=/sse/data

It will send chunks of null data via server-sent events until the total size is reached.
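The core of such a tool can be sketched as a generator that frames null bytes as SSE `data:` events until a target payload size is reached. This is a sketch of the idea, not Railway's actual implementation, and the 1 KiB default chunk size is an assumption:

```javascript
// Yield SSE "data:" frames of NUL bytes until `totalBytes` of payload
// have been produced. Sketch only; the chunk size is an arbitrary assumption.
function* sseNullChunks(totalBytes, chunkBytes = 1024) {
  let sent = 0;
  while (sent < totalBytes) {
    const size = Math.min(chunkBytes, totalBytes - sent);
    // An SSE event is "data: <payload>" followed by a blank line.
    yield `data: ${"\0".repeat(size)}\n\n`;
    sent += size;
  }
}
```

A server would write each yielded frame to a `text/event-stream` response until the generator is exhausted.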

Using this, we did not find any issues with total sizes up to even 15 MB, meaning the entire SSE connection from start to finish transferred 15 MB of data.

tl;dr - We can't reproduce.

But it's clear you are having issues when deployed to Railway, and we want to do all we can to help, so we would greatly appreciate a minimal reproducible example. Unfortunately, at this time, I don't have any idea what could be happening.

And aside from the 5-minute max connection time, I'm not aware of any transfer limits that we impose.

P.S. We don't use NGINX; it would not fit our needs, let alone handle our scale. We wrote our own proxy and router in-house.


craigspaeth (PRO)

6 months ago

Ok, thank you so much for taking the time to try to reproduce this issue. I will keep debugging and report back if I can figure out a minimal reproducible example.


6 months ago

Sounds good. In the meantime, could you tell us more about the characteristics of your failing SSE connections?

For starters:
- How many messages.

- The length of the messages.

- The time between each message.

- Total time before the connection is closed.

And whatever else you think could help us!


craigspaeth (PRO)

6 months ago

For context, the app is a real-time video conferencing + whiteboard style app. I've attached a video showing the issue, and you can explore this room at: https://www.sendingstone.com/room/test.

How many messages
There are three simultaneous SSE connections open; one is quite chatty, sending cursor positions every tens to hundreds of milliseconds. Another is less frequent but sends large blobs of JSON.

The length of the messages
The large JSON blobs can get quite big, like >1M characters. Could this be the bottleneck? FWIW, it's also gzipping the contents.

The time between each message
It depends on the activity in the room but often between 100ms–1s.

Total time before the connection is closed
It depends on how big the room gets, but if it gets quite large from "whiteboard" activity, then it can close after several seconds of interaction.

Obviously I'm pushing some unusually large data, and I know I can optimize much further on my end. But as I said, I can't reproduce this locally, and I also just deployed to Render and can't reproduce it there either (you can test the same room deployed to Render here: https://astral-projection.onrender.com/room/test). So I'm trying to understand my limits with Railway before flying blind and making premature optimizations.

Thanks again so much for the hands-on help!

Attachments


6 months ago

Thank you for the additional info; we will see what we can accomplish. That said, we would still really appreciate a minimal reproducible example, since I'm not seeing any disconnects on that live site.


craigspaeth (PRO)

6 months ago

Thanks for the hands-on help, Brody! After struggling with the Dockerfile for a while, I finally got it working and the issue went away. So this does indeed seem to be something with Railway's default Bun version or setup vs. what's in my Dockerfile (attached).
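The attached Dockerfile isn't reproduced here, but for readers hitting the same issue, a minimal Bun Dockerfile has roughly this shape. The image tag, paths, and start command below are assumptions for illustration, not the actual attached file; the point is pinning a recent Bun release rather than relying on a platform's default:

```dockerfile
# Sketch of a minimal Bun Dockerfile; pin a recent release explicitly
# rather than relying on a platform's default Bun version.
FROM oven/bun:1.1

WORKDIR /app

# Install dependencies first so they cache independently of source changes.
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile

COPY . .

# Assumed entry point; replace with your app's actual start command.
CMD ["bun", "run", "src/index.ts"]
```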

I saw the docs say:

> We support Bun, but due to Bun being in alpha, it is unstable and very experimental.

Which is fair, and I chose to bleed on the edge here. But FWIW, Bun has been past 1.0 for over a year now, so it might be worth Railway investing some time in better default Bun support.

Thanks again for such quick and hands-on help! Really appreciate the customer service here.

Attachments


6 months ago

This is amazing news to hear, thank you so much! Another user is having random 502s; they are also on Bun with Nixpacks, and I asked them to switch to a Dockerfile as well (I haven't heard back from them yet), so it's very nice to hear it has worked for you!

> Which is fair and I chose to bleed on the edge here. But FWIW Bun has been past 1.0 for over a year now so it might be worth Railway investing some time in better default Bun support.

We are definitely installing a version >1.0, but still not the latest. You are absolutely right: the Bun provider needs to use the absolute latest.


Status changed to Solved brody 6 months ago