8 months ago
Hey team,
I noticed a significant drop in performance after deploying a new service on Railway. Previously, I had a similar service running with networking set up on what I believe was GCP. The new one, using the same (or even simpler) code, is much slower.
The old deployment’s networking showed "Metal Edge" as an upgrade option, while the new one defaults to Metal with no way to switch or downgrade. At first, I thought GCP was causing the difference in speed, but I later learned both the deployments (old and new) are on Metal—just the proxy was still GCP for the old one.
Every millisecond matters for our customer experience, and the side-by-side latency comparison was huge. I’m trying to get to the root cause—whether it’s related to TCP proxy latency, database region, or something else.
24 Replies
8 months ago
I understand that the app feels slower, but concrete numbers would be helpful for actually diagnosing the issue. As this is a new app, you don't necessarily have a direct comparison to your old app.
Please send logs with RTT from a user, as well as internal response times in the app, i.e. how long the request takes to reach the user vs how long the service takes to actually process the request
8 months ago
Breaking it down even further would be even more helpful. RTT for communication between services and databases would be ideal
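The breakdown suggested above can be sketched as a small timing wrapper. This is a minimal illustration, assuming a Python service; `fake_handler` and the printed label are stand-ins, not anything from the actual project. The idea: the server logs its own processing time, the client logs wall-clock RTT, and the difference is network/proxy latency.

```python
import time


def timed_handler(handler):
    """Wrap a request handler and record server-side processing time.

    The client separately records its own round-trip time; the gap
    between the two numbers is network + proxy latency.
    """
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = handler(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # In a real service this would go to structured logs, not stdout.
        print(f"internal processing: {elapsed_ms:.1f} ms")
        return result
    return wrapper


@timed_handler
def fake_handler(payload):
    # Stand-in for real work (DB query, business logic, ...).
    time.sleep(0.01)
    return {"ok": True, "echo": payload}


if __name__ == "__main__":
    fake_handler("ping")
```

The same wrapper can be applied per dependency (one around the Mongo call, one around the Redis call) to get the per-hop RTTs mentioned above.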
Actually, I can compare both apps directly, since the new one is a subset of the old one. If anything, it should perform better than the old one
8 months ago
Is your external Mongo database located in Singapore too?
If every millisecond mattered, you would host Mongo on Railway and connect to it via the private network.
Your Redis database is in a separate project; it needs to be in the same project, and you need to connect to it via the private network.
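One way to quantify what the private network buys you is to time a bare TCP handshake against the public proxy endpoint vs. the internal hostname, from inside the deployed service. A rough sketch (the hostnames you'd pass in are your own; nothing here is specific to this project):

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time a single TCP handshake to (host, port), in milliseconds.

    A handshake is roughly one network round trip, which makes this a
    quick way to compare a public proxy endpoint against a
    private-network host from the same environment.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000
```

Run it twice from the same deployed container, once against the public Redis endpoint and once against the private one, and the difference is the proxy overhead per round trip.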
8 months ago
^ Brody is able to see your project as he is a member of the team, so take all this advice!
I'm using Mongo for another service too, so I'm just using a cluster on Atlas. And yes, Redis is a good catch. I'll try that out and let you know
8 months ago
I think you should take point #2 into consideration, having Mongo in the same data center on Railway would significantly reduce latency, and you did say every millisecond matters.
No, actually I just checked: this service only authenticates once (which requires Mongo). Everything else runs over WebSockets and doesn't involve Redis at all. The Redis task runs synchronously on the side and doesn't interfere with the WebSockets
Wait, I'll actually provide you with the other service's project ID: b7229d4a-9cb5-4e53-a3cf-be4923d369f5
You can have a look at it. The only difference you'll find between the two services is the Metal Edge tag under the networking section


Can you manually migrate this from Metal Edge back to GCP? Maybe I can test it out and see if it makes a difference
So basically, what happened now is that I generated a new domain on my old service, which also migrated it to Metal Edge automatically. And guess what: I'm now getting the same lag as on my new service
8 months ago
How are you measuring latency? Could you also provide both of the measurements you're talking about?
8 months ago
what happened to their roles 🤔
8 months ago
can they still chat here without the roles?
8 months ago
they can't because to talk here you need the support access role 😔
8 months ago
and they need the community access role to talk in #🎤|chit-chat
8 months ago
my assumption is that the user left and rejoined and never went through sign-up again