Load balancer not routing requests based on geography.
shehbajdhillon
PRO OP

2 months ago

I am running a backend service for my mobile app that has two replicas: one in US West and one in Asia Southeast.

My service handles WebSocket connections and GraphQL requests. Recently I noticed that a growing share of my requests were being routed to the Asia Southeast replica, even though I live in US West.

I was not experiencing this issue before, and it has only become prevalent in the last couple of days.

I have updated my service to handle any degradation/regression caused by this behavior.

According to the docs (https://docs.railway.com/deployments/optimize-performance), Railway automatically routes traffic based on geography:

If you are using multi-region replicas, Railway will automatically route public traffic to the nearest region and then randomly distribute requests to the replicas within that region.
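In other words, the documented rule is a two-step choice: first the nearest region, then a random replica inside it. The sketch below only illustrates that rule; it is not Railway's proxy code. It assumes "nearest" means great-circle distance between approximate coordinates, and the coordinates, region names, and replica names are all hypothetical:

```python
import math
import random

# Hypothetical, approximate (lat, lon) for each region; not Railway's real data.
REGION_COORDS = {
    "us-west": (37.77, -122.42),
    "us-east": (39.04, -77.49),
    "europe-west": (52.37, 4.90),
    "asia-southeast": (1.35, 103.82),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_replica(client_coords, replicas_by_region, rng=random):
    """Step 1: nearest region that has replicas. Step 2: random replica within it."""
    regions = [r for r, replicas in replicas_by_region.items() if replicas]
    nearest = min(regions, key=lambda r: haversine_km(client_coords, REGION_COORDS[r]))
    return rng.choice(replicas_by_region[nearest])


san_francisco = (37.77, -122.42)
print(pick_replica(san_francisco, {"us-west": ["usw-1"], "asia-southeast": ["ase-1"]}))  # usw-1
print(pick_replica(san_francisco, {"us-east": ["use-1"], "europe-west": ["euw-1"]}))  # use-1
```

Under this rule a client in San Francisco lands on US West in the first layout and US East in the second. A production proxy may measure proximity differently, for example by network latency rather than coordinates.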

Solved

14 Replies

shehbajdhillon
PRO OP

2 months ago

As a workaround, I have moved my replicas to US East and Europe West for now, with a Redis database in US East that syncs any stateful data between the two replicas. But I am still confused about why this happened.
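The workaround holds up because per-user state no longer lives in a replica's memory: every replica reads and writes the same store, so it stops mattering which replica a socket lands on. A minimal sketch of that pattern, assuming a simple key/value interface; `InMemoryStore` is a hypothetical stand-in for the Redis database, and the `session:<user_id>` key format is an assumption:

```python
import json
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The two operations the replicas need from the shared store."""
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Hypothetical stand-in for the shared store.

    In production this role would be played by a Redis client pointed at the
    US East database.
    """
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class Replica:
    """A replica that keeps no per-user state in its own process memory."""
    def __init__(self, region: str, store: KeyValueStore) -> None:
        self.region = region
        self.store = store

    def save_state(self, user_id: str, state: dict) -> None:
        self.store.set(f"session:{user_id}", json.dumps(state))

    def load_state(self, user_id: str) -> Optional[dict]:
        raw = self.store.get(f"session:{user_id}")
        return json.loads(raw) if raw is not None else None


shared = InMemoryStore()
us_east = Replica("us-east", shared)
europe_west = Replica("europe-west", shared)
us_east.save_state("user-42", {"room": "lobby"})
print(europe_west.load_state("user-42"))  # {'room': 'lobby'}
```

The trade-off is a network round trip to the store on every state access, which grows with the distance between a replica and US East.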

Here is data from my telemetry showing a user whose stateful data lived on the US West replica, but whose socket connection to that same data landed on Asia Southeast, or vice versa.

This did help me find gaps in my service's architecture, but it is still quite bizarre.

[Attachment: image.png]


shehbajdhillon
PRO OP

2 months ago

[Attachment: image.png]


shehbajdhillon
PRO OP

2 months ago

I live in San Francisco and am using my home IP, not a VPN. I would expect my requests to go to the US East replica, but they are hitting Europe West.


shehbajdhillon
PRO OP

2 months ago

For now, I have moved all my replicas to US East until I understand why this is happening.


simerca
PRO

2 months ago

Hey, same issue here since March 6!

I have restarted everything on my side, but the problem persists.


Status changed to Awaiting Railway Response dev 2 months ago


sam-a
EMPLOYEE

2 months ago

Hey, sorry about this. It's a known issue that we're actively tracking. A change to our proxy routing infrastructure on March 6 caused multi-region deployments to route requests across all regions instead of to the nearest one. We have a fix in progress.

We'll update this thread when it's resolved. Glad to hear you've already adapted your service to handle it in the meantime.
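A rough simulation shows why this regression produces such a large share of cross-region traffic. It contrasts the documented rule with an assumed version of the bug in which the proxy picks uniformly among all replicas regardless of region; the thread does not say exactly how requests were spread, and the replica names are hypothetical:

```python
import random


def route_nearest_region(replicas, nearest_region, rng):
    """Documented behavior: only replicas in the client's nearest region are eligible."""
    local = [name for name, region in replicas if region == nearest_region]
    return rng.choice(local)


def route_all_regions(replicas, nearest_region, rng):
    """Assumed regression: every replica in every region is eligible."""
    return rng.choice([name for name, _ in replicas])


def share_misrouted(route, replicas, nearest_region, n=10_000, seed=0):
    """Fraction of n simulated requests that leave the client's nearest region."""
    rng = random.Random(seed)
    region_of = dict(replicas)
    far = sum(
        region_of[route(replicas, nearest_region, rng)] != nearest_region
        for _ in range(n)
    )
    return far / n


replicas = [("us-1", "us"), ("eu-1", "eu")]
print(share_misrouted(route_nearest_region, replicas, "us"))  # 0.0
print(share_misrouted(route_all_regions, replicas, "us"))     # roughly 0.5
```

With one replica in each of two regions, about half of all requests cross regions under the assumed bug, consistent with later reports in this thread of being sent to the wrong region one time in two or three.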


Status changed to Awaiting User Response Railway 2 months ago



simerca
PRO

2 months ago

Thanks for the update, but this is still causing a production outage for us.

We have been impacted since March 6, and the issue remains unresolved.


Status changed to Awaiting Railway Response Railway 2 months ago


nico

2 months ago

Ack, we're tracking this. However, could you expand on how this is causing a production outage?


Status changed to Awaiting User Response Railway 2 months ago

simerca
PRO

2 months ago

We have one instance in the US and one in Europe. Both serve the same API, which uses the service's region environment variable to decide which database to read from.

However, the load balancer sends me to the US instance roughly one out of every two or three requests, even though I am located in Europe. American users see the same thing in reverse: they are being routed to the EU instance even though they are in the US. (We have two separate MongoDB databases: one for the US and one for the EU.)


Status changed to Awaiting Railway Response Railway 2 months ago


sam-a
EMPLOYEE

2 months ago

So you are saying you hit the wrong Mongo instance because of the bad routing?


Status changed to Awaiting User Response Railway 2 months ago


simerca
PRO

2 months ago

Yes, exactly. Since we use the RAILWAY_REGION environment variable to determine which MongoDB instance to connect to, each replica only connects to its regional database (US replica → US Mongo, EU replica → EU Mongo).
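The per-region database selection described here can be sketched as follows. The variable name `RAILWAY_REGION` is taken from the post; `MONGO_URI_US` and `MONGO_URI_EU` are hypothetical names, and the prefix matching assumes region values begin with `us-` or `europe-`, so check the values your deployment actually reports:

```python
import os
from typing import Mapping

# Hypothetical: map a region prefix to the env var holding that region's Mongo URI.
URI_VAR_BY_PREFIX = {"us-": "MONGO_URI_US", "europe-": "MONGO_URI_EU"}


def mongo_uri_for_region(environ: Mapping[str, str] = os.environ) -> str:
    """Return the MongoDB URI for the region this replica runs in."""
    region = environ.get("RAILWAY_REGION", "")
    for prefix, var in URI_VAR_BY_PREFIX.items():
        if region.startswith(prefix):
            return environ[var]
    # Fail loudly at startup instead of silently defaulting to one database.
    raise RuntimeError(f"No MongoDB configured for region {region!r}")


example_env = {
    "RAILWAY_REGION": "europe-west4",
    "MONGO_URI_US": "mongodb://us-host",
    "MONGO_URI_EU": "mongodb://eu-host",
}
print(mongo_uri_for_region(example_env))  # mongodb://eu-host
```

Note that this choice is made per replica, not per user, which is exactly why misrouting a user to the other replica sends their queries to the wrong database.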

With the current routing issue, a European user can be routed to the US replica, which then queries the US MongoDB — where their data doesn't exist. The same happens in reverse for American users hitting the EU replica.

This means users intermittently get empty or incorrect responses depending on which replica the load balancer sends them to, which is what's causing the production outage on our end.
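Because each replica can only read its own region's database, a request for a user homed elsewhere is better redirected than answered with empty data. A minimal sketch, assuming each request carries the user's home region (for example as a claim in an auth token) and using hypothetical per-region hostnames:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    serve_locally: bool
    redirect_to: Optional[str] = None


# Hypothetical region-specific hostnames; not part of the original setup.
REGION_ENDPOINTS = {"us": "https://us.api.example.com", "eu": "https://eu.api.example.com"}


def decide(user_home_region: str, replica_region: str) -> Decision:
    """Serve only users whose data lives in this replica's database; otherwise redirect."""
    if user_home_region == replica_region:
        return Decision(serve_locally=True)
    endpoint = REGION_ENDPOINTS.get(user_home_region)
    if endpoint is None:
        raise ValueError(f"unknown home region {user_home_region!r}")
    return Decision(serve_locally=False, redirect_to=endpoint)


print(decide("eu", "us"))
```

The client can then retry against `redirect_to`, which turns a silent wrong-database read into an explicit, recoverable response even while the load balancer misroutes.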


Status changed to Awaiting Railway Response Railway 2 months ago


Status changed to Awaiting User Response nico 2 months ago


simerca
PRO

2 months ago

I killed it and relaunched it, and everything looks good now. But sorry to say this, guys: your support service has been quite poor and not really up to standard.


Status changed to Awaiting Railway Response Railway 2 months ago


sam-a
EMPLOYEE

2 months ago

Sincere apologies, and we do appreciate the feedback. Given that the problem happened in the first place (which we obviously wish it had not), what would you have liked us to do better?


Status changed to Awaiting User Response Railway 2 months ago


sam-a
EMPLOYEE

2 months ago

The issue should be resolved. Redeploying the service will make it take effect. Let us know if you see any further issues.


Status changed to Solved sam-a 2 months ago

