How to scale up a Node.js Socket.IO server without session affinity

timbo-tj
PRO

a year ago

Hey there

We have a Socket.IO server and we'd like the option to scale horizontally, but if I understand correctly this isn't possible without session affinity (the handshake process would fail to establish communication?).

are we 'locked into' a single service in this case?

Project ID N/A

0 Replies

timbo-tj
PRO

a year ago

n/a


a year ago

it's true Railway's proxy does not support session affinity, but there's nothing stopping you from deploying your own proxy that does. The proxy you deploy would need to support session affinity and dynamic upstreams, so that it can internally proxy to each of your replicas. aka Caddy
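For reference, a rough sketch of what that Caddy deployment could look like. This is not a drop-in config: the service name, app port, refresh interval, and cookie name are all placeholders. It leans on Caddy's `dynamic a` upstream source (which does A/AAAA lookups) and `lb_policy cookie` (which pins each client to one upstream):

```caddyfile
# Sketch only: "app.railway.internal", port 8080, and the cookie
# name are placeholders for your own service.
:{$PORT} {
	reverse_proxy {
		# Re-resolve the service's DNS records periodically so new
		# replica IPs are picked up after every deploy.
		dynamic a {
			name app.railway.internal
			port 8080
			refresh 10s
		}
		# Pin each client to one upstream via a cookie.
		lb_policy cookie sticky
	}
}
```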


timbo-tj
PRO

a year ago

How do I proxy requests to a specific replica?


timbo-tj
PRO

a year ago

i already have a gateway that sits in front of everything. I guess it would mean that all socketio traffic would have to be piped through the gateway too?


a year ago

it wouldn't be a specific replica, the proxy would need to support sticky sessions though


a year ago

as long as the gateway supports dynamic proxy upstreams that resolve from an AAAA lookup, plus sticky sessions, then you can do it.


a year ago

what is your current gateway? nginx?


timbo-tj
PRO

a year ago

homebrew gateway, nodejs server, i am running some custom routing logic 🙃


a year ago

oh I see, then you'd have one hell of a time writing code to proxy sticky sessions to the replicas


timbo-tj
PRO

a year ago

http-proxy-middleware to proxy the requests to the appropriate game server (dev/stg/prd)


timbo-tj
PRO

a year ago

mmm


timbo-tj
PRO

a year ago

yeah, shame session affinity is not supported out of the box like on Heroku, that would have made this (surely common?) use case work without extra effort!


a year ago

for context, this is what a DNS lookup on the internal domain of a service resolves to when that service has multiple replicas


a year ago

doing an AAAA lookup is where you would get the list of your upstreams from


timbo-tj
PRO

a year ago

right, and i know nothing about what i am about to say, but conceptually i would do my own load balancing at the api gateway level, picking one of those replicas and associating all incoming requests from that client with that specific replica


timbo-tj
PRO

a year ago

would that be as 'simple' as proxying requests to e.g. http://[fd12:74d7:7e85::33:1190:a62c]/ versus http://hello-world.railway.internal/ ? (maybe i should move this discussion elsewhere hah)


a year ago

exactly, but it all has to be dynamically done, since all those ips change on every deployment


a year ago

you are missing the port, but yes exactly. hosts aren't technically needed for routing on the private network like they are on the public network


timbo-tj
PRO

a year ago

yeah right right. does that mean a new DNS lookup for every request being proxied, to ensure the replica is still valid? I don't think i will chase this solution down, as it seems pretty involved, just curious. it is pretty scary that we might not be able to handle a large influx of users though


a year ago

you could cache for several seconds, with some extra retry logic, that would save a lot of lookups if you have a lot of traffic
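One way that caching could be sketched in Node (the TTL is an arbitrary choice, and the resolver/clock are injected so real code could pass `dns.promises.resolve6` and `Date.now`):

```javascript
// Sketch: cache AAAA results for a few seconds so the gateway doesn't
// hit DNS on every proxied request.
function makeCachedResolver(resolveFn, ttlMs = 5000, now = Date.now) {
  const cache = new Map(); // host -> { ips, expires }
  return async function cachedResolve(host) {
    const hit = cache.get(host);
    if (hit && hit.expires > now()) return hit.ips; // fresh enough
    const ips = await resolveFn(host);              // miss or expired
    cache.set(host, { ips, expires: now() + ttlMs });
    return ips;
  };
}
```

The retry logic mentioned above would sit on top of this: if a proxied request to a cached IP fails, drop the cache entry and re-resolve before retrying.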


a year ago

some context on Railway's current proxy: they use Envoy right now, and eventually it will be thrown out the window for a home-grown HTTP proxy. after that, I can't see adding support for sticky sessions being too challenging, relative to writing an HTTP proxy that can handle Railway's scale


a year ago

I assume your gateway already proxies websocket traffic? otherwise you could use a ready-made solution that supports everything I've talked about.. Caddy


timbo-tj
PRO

a year ago

no, on heroku the clients would be given the url of the appropriate server to connect to directly


timbo-tj
PRO

a year ago

well, not just on heroku - that's my current setup too


a year ago

keep in mind, railway is still growing and improving compared to an already well-established service like heroku; not everything can be 1:1 feature-wise, so sometimes manual workarounds are going to be needed


timbo-tj
PRO

a year ago

clients send all their REST API requests via the api gateway, and requests get routed to the correct place. there is an endpoint to query for details regarding 'realtime comms' (i.e. socketio) which returns the socketio server url. clients then bypass the api gateway and connect directly


timbo-tj
PRO

a year ago

yeah for sure


timbo-tj
PRO

a year ago

been mostly good so far


timbo-tj
PRO

a year ago

and we are on 1vcpu at peak time at the moment so i guess plenty of room to go


timbo-tj
PRO

a year ago

but i imagine a lot of concurrent connections may choke out a single machine


a year ago

as long as your code can scale vertically without issues, you have about 31 vCPUs of headroom haha


timbo-tj
PRO

a year ago

yeah, i had a bit of a scare last night though, during peak time it didn't scale past 1vcpu but our average response time was 1-2 seconds


timbo-tj
PRO

a year ago

even querying the health-check endpoint (all it does is return {status: "OK"} ) was taking 1-2 seconds


a year ago

maybe it can't scale vertically then?


timbo-tj
PRO

a year ago

right now it seems to have resolved


timbo-tj
PRO

a year ago

yeah maybe not, though I am not sure what the bottleneck is, as if it was processing power, I imagine railway would have scaled it up past 1.1 vcpu


timbo-tj
PRO

a year ago

i am going to keep an eye on it over night and see how it goes tonight


a year ago

your code would need to be able to scale past 1.1 vcpu, not railway


timbo-tj
PRO

a year ago

How so? I don't have any handling for that. I just assumed >1 vcpu = more horsepower


timbo-tj
PRO

a year ago

kind of like higher tier dynos on heroku


a year ago

nope, at any given point your app has access to the full 32 vcpu; it's up to your code to properly utilise that though


a year ago

so much different than dynos


timbo-tj
PRO

a year ago

right, i see


a year ago

I don't know your project architecture, but couldn't you run multiple separate services for the websockets? each service would have only one replica, so no sticky sessions needed. your gateway would just be responsible for keeping a list of the websocket services and their domains and handing them out when applicable, and unless I am not understanding what you've explained to me thus far, it kinda sounds like it can do this already?


timbo-tj
PRO

a year ago

yeah that would work, perhaps the best solution until sticky sessions are implemented


a year ago

I also did ask the person who is writing the new http proxy if sticky sessions were on their mind, I'll update you when I have news on that


timbo-tj
PRO

a year ago

and once sticky sessions are implemented i can collapse it all down to one service and scale up via replicas


timbo-tj
PRO

a year ago

nice, thank you


a year ago

maybe even configurable proxying algorithms, currently it's only round robin


timbo-tj
PRO

a year ago

🔥


timbo-tj
PRO

a year ago

do you know of any more resources i can look at for utilising the other cores? is this just nodejs clustering?


a year ago

I'm honestly not sure, I'm not even a node dev haha


timbo-tj
PRO

a year ago

thats ok!


timbo-tj
PRO

a year ago

thanks for all the help


a year ago

no problem!


timbo-tj
PRO

a year ago

i don't know how to mark this as closed lol


a year ago

only mods/admins can use that right now


nachocodoner
HOBBY

a year ago

what answer did you get here? is there a plan for sticky sessions to be implemented?


a year ago

with the new proxy it would not be hard to implement, but the need for it is low, so it would not make sense to sink time into it when there are alternatives to sticky sessions, like storing sessions in Redis
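To illustrate why a shared session store removes the need for stickiness: if session data lives outside the replica, any replica can serve any request. The sketch below uses a plain `Map` as a stand-in for Redis (the `SessionStore` class and `demo` function are hypothetical; real code would put a Redis client behind the same `get`/`set` interface):

```javascript
// Sketch: two "replicas" sharing one backend see the same session
// state, so it doesn't matter which one a request lands on.
class SessionStore {
  constructor(backend = new Map()) { this.backend = backend; }
  async get(sid) { return this.backend.get(sid) ?? null; }
  async set(sid, data) { this.backend.set(sid, data); }
}

async function demo() {
  const sharedBackend = new Map();            // the Redis stand-in
  const replicaA = new SessionStore(sharedBackend);
  const replicaB = new SessionStore(sharedBackend);
  await replicaA.set('sid-123', { user: 'timbo' });
  return replicaB.get('sid-123');             // written by A, read by B
}
```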


nachocodoner
HOBBY

a year ago

I see. Using session storage with Redis requires understanding how to implement it in your app and how your framework handles sessions, right?

I implement Meteor apps and understand that managing sessions with Redis can be challenging. Typically, Meteor apps rely heavily on horizontal scaling and sticky sessions when they scale. Since Meteor also uses Node.js and is single-core, replication may be necessary for scaling Node apps.

I won't focus on this right now since I don't need it. However, I'd like to request sticky sessions in Railway to simplify horizontal scaling for any app project. If it's easy to implement, it could save time and benefit projects that need it at scale.


a year ago

thank you for sharing your usecase!

I would like to ask you to open a feedback thread here for extra visibility -


nachocodoner
HOBBY

a year ago

I'm not sure why I can't add feedback. The submit button is disabled for me.


a year ago

screenshot?


nachocodoner
HOBBY

a year ago

(screenshot attached)


a year ago

uhhh, too many characters?


nachocodoner
HOBBY

a year ago

Ok, I don't get the feedback on that 😅 I will try to shorten


nachocodoner
HOBBY

a year ago

But even after reducing the text it's not fixed


a year ago

i can reproduce, cc @Ray - submit button disabled for feedback modal


fixed, thanks for the flag!


Nacho here's the content of your thread if you want to resubmit:

Sticky sessions ensure that a user's requests are consistently routed to the same server in horizontal scaling. This maintains stateful interactions, such as user authentication, across multiple requests. Without sticky sessions, users might be directed to different servers that lack their session data, causing inconsistencies and a poor user experience.

Railway does not support sticky sessions, as noted in their horizontal scaling documentation: https://docs.railway.app/guides/optimize-performance#load-balancing-between-replicas.

I suggest adding this feature. While session storage in Redis is an option, it requires understanding how to implement it in your app and how your framework handles sessions. For instance, in my experience developing Meteor apps, managing sessions with Redis within this framework can be complex. Given that Meteor operates on Node.js and is single-core, scaling applications requires replication. Consequently, sticky sessions could greatly enhance the scalability of any Node.js application at scale.

I don't need sticky sessions right now, but I'd like to request them in Railway to simplify horizontal scaling for any app. Other users might already be interested. If it's easy to implement with the new proxy, it could save time and benefit projects that need it at scale.

sorry bout this


a year ago

did you OCR Nacho's screenshot?


yes 😄


a year ago

haha nice


Credits to iOS. Nifty feature

(screenshot attached)


a year ago

oh yeah Google lens does that too


nachocodoner
HOBBY

a year ago

Thank you. I will publish it then


nachocodoner
HOBBY

a year ago

Here's the link to the request for "sticky sessions". If you're interested, please vote to raise its implementation priority.

https://help.railway.app/feedback/sticky-sessions-fa65efc4


raflymln
TRIAL

a year ago

use redis pub/sub


raflymln
TRIAL

a year ago

like soketi does


nachocodoner
HOBBY

a year ago

I have a question after reading this tutorial: https://docs.railway.app/tutorials/proximity-steering

Is it possible to use Cloudflare's load balancing referencing replicas on the same Railway service, or would we need to duplicate the services on Railway for Cloudflare's sticky sessions to work?

Could sticky sessions be a useful application of Cloudflare load balancing for Railway apps? They seem to support it, https://developers.cloudflare.com/load-balancing/understand-basics/session-affinity/#enabling-session-affinity-via-the-cloudflare-api


a year ago

you would need to have individual services with 1 replica each for cloudflare's sticky sessions to work


nachocodoner
HOBBY

a year ago

I see, this could be a valid approach for those who need it


nachocodoner
HOBBY

10 months ago

I have a question about Cloudflare load balancing in Railway. If I create multiple services in my app and add public domains to be accessed by the load balancer, they'll also be accessible to anyone on the internet.

Does this mean those endpoints could be indexed by search engines? Could it also expose them to attacks since they're not protected?

Am I correct, or is there a way to prevent this?


10 months ago

You're correct

I'm unfamiliar with this setup but in theory, you should be able to set up a Cloudflare private tunnel that exposes your services from Railway->Cloudflare, and set up CF load balancers to point to the private endpoint of the tunnel within CF

But if you're already load balancing multiple domains, I don't see the point of doing that because the entrypoint to your LB would be public anyway? Hitting your LB's endpoint would have a similar net effect to visiting the LB'd domains directly


nachocodoner
HOBBY

10 months ago

Thank you for the insights. It makes sense to use private tunnels, and I might try that at some point.

The goal is to create controlled replicas of my app while allowing the load balancer to handle traffic, primarily enabling sticky sessions but also any other LB configuration logic. These replicas should not be accessed directly by users or search engines; instead, they should be accessed through the load balancer. Does that make sense?


10 months ago

ray went on some much needed pto, so I can take over from here 🙂

I'm not sure if you want / need replicas in multiple regions; if you don't, you could always spin up your own proxy with its own load balancer that supports sticky sessions (caddy)

or of course what ray said, a CloudFlare tunnel so that you don't have to have any public domains on your individual services.


nachocodoner
HOBBY

9 months ago

I did it!

I installed Caddy as you suggested and set it up as a load balancer with sticky sessions. I pointed my main domain directly at the LB and used the Caddy config example you provided here, adjusting it only to point directly to one service that includes both frontend and backend. The setup is great, as it uses dynamic upstreams that retrieve A/AAAA DNS records, which means it gets the individual IP addresses of your service's replicas (all over private networking). This lets me reuse Railway's replicas, each with the proper cookie for sticky sessions, without having to manually manage new services.

I'm updating this thread to help anyone looking to enable sticky sessions in their app setups on Railway, it is possible!

Thanks to Brody and the rest of the team for sharing solutions that saved us a lot of time.

PS: For those who try this setup, pay special attention to the railwayapp-templates/caddy-reverse-proxy prerequisite:

Since Railway's internal network is IPv6-only, the frontend and backend apps will need to listen on :: (all interfaces, both IPv4 and IPv6)
In my case (a Node app), I had to add the BIND_IP="::" env var to my app service to make it work.


9 months ago

awesome, I'm glad you were able to find a setup that worked for you!


9 months ago

make it into a template and there's some credits in it for you