jobs sent to load balancer seem to 'stick' to one replica
sdavid14
PRO OP

7 days ago

I have a single backend Railway container that sends long-running (10 seconds to 10 minutes) jobs to a pool of 8 instances (replicas) to do the work. Status is backed by a database table and updated in the frontend via polling.

I'm noticing that the jobs appear to be getting sent to the same replica (or maybe two replicas), even if it's busy and other replicas are doing nothing, i.e. as if there's some affinity or stickiness happening, by IP or otherwise.

How do I ensure work is "non-stick" 🙂, i.e. gets distributed across multiple replicas even if initiated by the same host?

Steve

Solved

5 Replies

ray-chen

6 days ago

Our load balancer randomly distributes requests across replicas and does not support sticky sessions, so there is no affinity built in on our side. The most common cause of this behavior is HTTP connection reuse (keep-alive) in the calling client - if your backend opens a persistent connection to the worker pool, subsequent requests will reuse that same connection and always land on the same replica. You can find more details on how our replica load balancing works in our [scaling docs](https://docs.railway.com/deployments/scaling#load-balancing-between-replicas).


Status changed to Awaiting User Response Railway 6 days ago

sdavid14
PRO OP

6 days ago

I turned off keep-alive and all requests are still being routed to the same replica. Does it matter that I'm using the internal network interface, i.e. app.railway.internal:8080, or must I use a public interface to get the load balancer engaged?


Status changed to Awaiting Railway Response Railway 6 days ago


6 days ago

On the internal network, each replica is exposed as a DNS AAAA record, meaning if you have 10 replicas, 10 IPv6 addresses will be returned in a random order.

From there, it's possible your application is still preferring a single IP, but that application-level behavior is unfortunately out of our control.
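A common culprit is a client resolving the hostname once and caching the first address. One workaround is to resolve all records yourself and pick one at random per job; a minimal sketch using the standard library (the hostname and port here are placeholders):

```python
import random
import socket

def pick_replica_addr(hostname, port):
    # getaddrinfo returns every A/AAAA record the resolver hands back.
    # Many HTTP clients simply use the first entry, which re-creates
    # stickiness even with keep-alive disabled; choosing randomly
    # spreads new connections across replicas.
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    family, _type, _proto, _canon, sockaddr = random.choice(infos)
    return family, sockaddr
```

You would then open the connection to the chosen `sockaddr` directly, rather than passing the hostname to the HTTP library and letting it pick.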


Status changed to Awaiting User Response Railway 6 days ago


sdavid14
PRO OP

6 days ago

Ok, I'm going to use a Redis queue instead. Thank you!
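A queue sidesteps the load balancer entirely: each replica pulls work when it is free, so distribution is automatic. A minimal sketch of that producer/consumer pattern, written against any Redis-style client (the `JobQueue` class and the `"jobs"` key are illustrative, not redis-py API):

```python
import json

class JobQueue:
    """Minimal work-queue sketch. `client` is any object exposing
    Redis-style lpush/brpop methods (e.g. redis.Redis from redis-py)."""

    def __init__(self, client, key="jobs"):
        self.client = client
        self.key = key

    def submit(self, payload):
        # Producer: push the job onto a shared list. All replicas'
        # consumers compete for the same list, so work naturally goes
        # to whichever worker is idle -- no stickiness possible.
        self.client.lpush(self.key, json.dumps(payload))

    def take(self, timeout=0):
        # Consumer: BRPOP blocks until a job is available (or the
        # timeout expires, returning None).
        item = self.client.brpop(self.key, timeout)
        if item is None:
            return None
        _key, raw = item  # redis-py returns a (key, value) pair
        return json.loads(raw)
```

Each replica runs a loop around `take()`; the backend only calls `submit()`. Job status can still live in the database table and be polled by the frontend as before.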


Status changed to Awaiting Railway Response Railway 6 days ago


6 days ago

No problem!


Status changed to Awaiting User Response Railway 6 days ago


Status changed to Solved brody 6 days ago

