2 months ago
I am running a backend service for my mobile app that has two replicas, one in US West, and one in Asia Southeast.
I run websocket connections/GraphQL requests in my service, and recently noticed that a larger proportion of my requests are being routed to the Asia Southeast replica, even though I live in US West.
I was not experiencing this issue before, and it only became much more prevalent in the last couple of days.
I have updated my service to handle any degradation/regression caused by this behavior.
The docs say that Railway automatically routes based on geography here: https://docs.railway.com/deployments/optimize-performance
If you are using multi-region replicas, Railway will automatically route public traffic to the nearest region and then randomly distribute requests to the replicas within that region.
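To make the expected behavior concrete, here is a rough sketch of what "nearest region, then random replica within that region" means. This only illustrates the docs quote above and is not Railway's actual proxy code; the latency numbers, region names, replica ids, and the `pick_replica` helper are all made up for the example.

```python
import random


def pick_replica(latency_ms_by_region, replicas_by_region, rng=random):
    """Pick the lowest-latency region that has replicas, then a random replica in it.

    Both arguments are hypothetical inputs for illustration:
      latency_ms_by_region: {"us-west": 12, ...} as measured from the client
      replicas_by_region:   {"us-west": ["replica-a", "replica-b"], ...}
    """
    # Only consider regions that actually have at least one replica.
    candidates = {
        region: ms
        for region, ms in latency_ms_by_region.items()
        if replicas_by_region.get(region)
    }
    if not candidates:
        raise LookupError("no region with a running replica")

    # Step 1: nearest region (lowest latency stands in for "nearest" here).
    nearest = min(candidates, key=candidates.get)
    # Step 2: random distribution among the replicas in that region.
    return nearest, rng.choice(replicas_by_region[nearest])


if __name__ == "__main__":
    latency = {"us-west": 12, "asia-southeast": 180}
    replicas = {"us-west": ["replica-a"], "asia-southeast": ["replica-b"]}
    print(pick_replica(latency, replicas))
```

With routing like this, a client in US West should never land on the Asia Southeast replica while the US West replica is available, which is why the behavior I am seeing looks wrong.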
14 Replies
I have updated my service to run replicas in US East and Europe West for now, with a Redis DB running in US East to sync any stateful data between the two replicas as a workaround. But I am still confused about why this happened.
Here's data from my telemetry where a user had stateful data in a replica in US West, but then had a socket connection to access the same stateful data in Asia Southeast, or vice versa.
This obviously helped me find gaps in my service's architecture, but it's still quite bizarre.
Attachments
I live in San Francisco and am using my home IP, not a VPN. I would expect my requests to go to the US East replica, but they are hitting Europe West.
I have now updated all my replicas to only be in US East for now, until I understand why this could be happening.
2 months ago
Hey, same issue here since March 6!
I have restarted everything on my side, but the problem persists.
Status changed to Awaiting Railway Response dev • 2 months ago
2 months ago
Hey, sorry about this. It's a known issue that we're actively tracking. A change to our proxy routing infrastructure on March 6 caused multi-region deployments to route requests across all regions instead of to the nearest one. We have a fix in progress.
We'll update this thread when it's resolved. Glad to hear you've already adapted your service to handle it in the meantime.
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
Thanks for the update, but this is still causing a production outage for us.
We have been impacted since March 6, and the issue remains unresolved.
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
ack, we're tracking this. However, could you expand on this causing a production outage?
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
We have one instance in the US and one in Europe. Both run the same API, which uses the service region environment variable to decide where to retrieve data from.
However, the load balancing sends me to the US about one out of every two or three times, even though I am located in Europe. We are seeing a similar situation for American users, who are being load-balanced to the EU even though they are located in the US. (We have two separate MongoDB databases: one for the US and one for the EU.)
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
So you are saying you hit the wrong Mongo instance because of the bad routing?
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
Yes, exactly. Since we use the RAILWAY_REGION environment variable to determine which MongoDB instance to connect to, each replica only connects to its regional database (US replica → US Mongo, EU replica → EU Mongo).
With the current routing issue, a European user can be routed to the US replica, which then queries the US MongoDB — where their data doesn't exist. The same happens in reverse for American users hitting the EU replica.
This means users intermittently get empty or incorrect responses depending on which replica the load balancer sends them to, which is what's causing the production outage on our end.
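Roughly, the setup looks like the sketch below. It is simplified and not our real code: the region values (`"us"`, `"eu"`), the MongoDB URIs, and the helper names are placeholders, and the actual values Railway puts in `RAILWAY_REGION` may differ.

```python
import os

# Placeholder mapping: region value -> regional MongoDB URI (both hypothetical).
MONGO_URI_BY_REGION = {
    "us": "mongodb://mongo-us.internal:27017/app",
    "eu": "mongodb://mongo-eu.internal:27017/app",
}


def mongo_uri_for_replica(env=os.environ):
    """Return the MongoDB URI for the region this replica is running in."""
    region = env.get("RAILWAY_REGION")
    if region is None:
        raise RuntimeError("RAILWAY_REGION is not set")
    if region not in MONGO_URI_BY_REGION:
        raise RuntimeError(f"no MongoDB configured for region {region!r}")
    return MONGO_URI_BY_REGION[region]


def is_misrouted(user_home_region, env=os.environ):
    """True when a request lands on a replica outside the user's home region.

    In that case the replica queries a regional database that does not hold
    the user's data, which is the failure described above.
    """
    return env.get("RAILWAY_REGION") != user_home_region
```

So a European user served by the US replica gets the US URI from `mongo_uri_for_replica`, and `is_misrouted("eu")` would be true on that replica. Nothing in the replica itself fails; it simply looks in the wrong database.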
Status changed to Awaiting Railway Response Railway • 2 months ago
Status changed to Awaiting User Response nico • 2 months ago
2 months ago
I killed it and relaunched it, and everything looks good now. But sorry to say this, guys: your support service has been quite poor and not really up to standard.
Status changed to Awaiting Railway Response Railway • 2 months ago
2 months ago
Sincere apologies, and we do appreciate the feedback. Given that the problem happened in the first place (which we obviously wish it had not), what would you have liked us to do better?
Status changed to Awaiting User Response Railway • 2 months ago
2 months ago
The issue should be resolved. Redeploying the service will make it take effect. Let us know if you see any further issues.
Status changed to Solved sam-a • 2 months ago