jobs sent to load balancer seem to 'stick' to one replica
sdavid14
PRO OP

7 days ago

I have a single backend Railway container that sends long-running (10 seconds to 10 minutes) jobs to a pool of 8 instances (replicas) to do the work. Status is backed by a database table and updated in the frontend via polling.

I'm noticing that the jobs appear to be getting sent to the same replica (or maybe two replicas), even if it's busy and other replicas are doing nothing, i.e. as if there's some affinity or stickiness happening, by IP or otherwise.

How do I ensure work is "non-stick" 🙂, i.e. gets distributed across multiple replicas even if initiated by the same host?

Steve

Solved

5 Replies

ray-chen

6 days ago

Our load balancer randomly distributes requests across replicas and does not support sticky sessions, so there is no affinity built in on our side. The most common cause of this behavior is HTTP connection reuse (keep-alive) in the calling client - if your backend opens a persistent connection to the worker pool, subsequent requests will reuse that same connection and always land on the same replica. You can find more details on how our replica load balancing works in our [scaling docs](https://docs.railway.com/deployments/scaling#load-balancing-between-replicas).


Status changed to Awaiting User Response Railway 6 days ago

sdavid14
PRO OP

6 days ago

I turned off keep-alive and all requests are still being routed to the same replica. Does it matter that I'm using the internal network interface, i.e. app.railway.internal:8080, or must I use a public interface to get the load balancer engaged?


Status changed to Awaiting Railway Response Railway 6 days ago


6 days ago

On the internal network, each replica is exposed as a DNS AAAA record, meaning if you have 10 replicas, 10 IPv6 addresses will be returned in a random order.

From there, it's possible your application is still preferring a single IP, but that application-level behavior is unfortunately out of our control.
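A common culprit is a client resolving the hostname once and caching the first address. One workaround is to resolve all records yourself and pick one at random per job; a minimal sketch using the standard library (the hostname and port here are placeholders):

```python
import random
import socket

def pick_replica_addr(hostname, port):
    # getaddrinfo returns every A/AAAA record the resolver hands back.
    # Many HTTP clients simply use the first entry, which re-creates
    # stickiness even with keep-alive disabled; choosing randomly
    # spreads new connections across replicas.
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    family, _type, _proto, _canon, sockaddr = random.choice(infos)
    return family, sockaddr
```

You would then open the connection to the chosen `sockaddr` directly, rather than passing the hostname to the HTTP library and letting it pick.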


Status changed to Awaiting User Response Railway 6 days ago


sdavid14
PRO OP

6 days ago

Ok, I'm going to use a Redis queue instead. Thank you!
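A queue sidesteps the load balancer entirely: each replica pulls work when it is free, so distribution is automatic. A minimal sketch of that producer/consumer pattern, written against any Redis-style client (the `JobQueue` class and the `"jobs"` key are illustrative, not redis-py API):

```python
import json

class JobQueue:
    """Minimal work-queue sketch. `client` is any object exposing
    Redis-style lpush/brpop methods (e.g. redis.Redis from redis-py)."""

    def __init__(self, client, key="jobs"):
        self.client = client
        self.key = key

    def submit(self, payload):
        # Producer: push the job onto a shared list. All replicas'
        # consumers compete for the same list, so work naturally goes
        # to whichever worker is idle -- no stickiness possible.
        self.client.lpush(self.key, json.dumps(payload))

    def take(self, timeout=0):
        # Consumer: BRPOP blocks until a job is available (or the
        # timeout expires, returning None).
        item = self.client.brpop(self.key, timeout)
        if item is None:
            return None
        _key, raw = item  # redis-py returns a (key, value) pair
        return json.loads(raw)
```

Each replica runs a loop around `take()`; the backend only calls `submit()`. Job status can still live in the database table and be polled by the frontend as before.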


Status changed to Awaiting Railway Response Railway 6 days ago


6 days ago

No problem!


Status changed to Awaiting User Response Railway 6 days ago


Status changed to Solved brody 6 days ago

