8 months ago
Hey team,
I noticed a significant drop in performance after deploying a new service on Railway. Previously, I had a similar service running with networking set up on what I believe was GCP. The new one, using the same (or even simpler) code, is much slower.
The old deployment’s networking showed "Metal Edge" as an upgrade option, while the new one defaults to Metal with no way to switch or downgrade. At first, I thought GCP was causing the difference in speed, but I later learned both the deployments (old and new) are on Metal—just the proxy was still GCP for the old one.
Every millisecond matters for our customer experience, and the side-by-side latency comparison was huge. I’m trying to get to the root cause—whether it’s related to TCP proxy latency, database region, or something else.
24 Replies
8 months ago
I understand that the app feels slower, but concrete numbers would be helpful for actually diagnosing the issue. As this is a new app, you don't necessarily have a direct comparison to your old app.
Please send logs with RTT from a user, as well as internal response times in the app, i.e. how long the request takes to reach the user vs how long the service takes to actually process the request
8 months ago
Breaking it down even further would be even more helpful. RTT for communication between services and databases would be ideal
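The breakdown suggested above can be sketched as a small timing wrapper. This is a minimal illustration, assuming a Python service; `fake_handler` and the printed label are stand-ins, not anything from the actual project. The idea: the server logs its own processing time, the client logs wall-clock RTT, and the difference is network/proxy latency.

```python
import time


def timed_handler(handler):
    """Wrap a request handler and record server-side processing time.

    The client separately records its own round-trip time; the gap
    between the two numbers is network + proxy latency.
    """
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = handler(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # In a real service this would go to structured logs, not stdout.
        print(f"internal processing: {elapsed_ms:.1f} ms")
        return result
    return wrapper


@timed_handler
def fake_handler(payload):
    # Stand-in for real work (DB query, business logic, ...).
    time.sleep(0.01)
    return {"ok": True, "echo": payload}


if __name__ == "__main__":
    fake_handler("ping")
```

The same wrapper can be applied per dependency (one around the Mongo call, one around the Redis call) to get the per-hop RTTs mentioned above.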
Actually, I can compare both apps directly, since the new one is a subset of the old one. If anything, it should perform better than the old one
8 months ago
Is your external Mongo database located in Singapore too?
If every millisecond mattered, you would host Mongo on Railway and connect to it via the private network.
Your Redis database is in a separate project; it needs to be in the same project, and you need to connect to it via the private network.
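One way to quantify what the private network buys you is to time a bare TCP handshake against the public proxy endpoint vs. the internal hostname, from inside the deployed service. A rough sketch (the hostnames you'd pass in are your own; nothing here is specific to this project):

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time a single TCP handshake to (host, port), in milliseconds.

    A handshake is roughly one network round trip, which makes this a
    quick way to compare a public proxy endpoint against a
    private-network host from the same environment.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000
```

Run it twice from the same deployed container, once against the public Redis endpoint and once against the private one, and the difference is the proxy overhead per round trip.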
8 months ago
^ Brody is able to see your project as he is a member of the team, so take all this advice!
I'm using Mongo for another service too, so I'm just using a cluster on Atlas. And yes, Redis is a good catch. I'll try that out and let you know
8 months ago
I think you should take point #2 into consideration, having Mongo in the same data center on Railway would significantly reduce latency, and you did say every millisecond matters.
No, actually I just checked: this service only authenticates once (which requires Mongo). Everything else runs over WebSockets and doesn't involve Redis at all. The Redis task runs synchronously on the side and doesn't interfere with the WebSockets
Wait, I'll actually provide you with the other service's project ID: b7229d4a-9cb5-4e53-a3cf-be4923d369f5
You can have a look at it. The only difference you'll find between the two services is the Metal Edge tag under the networking section


Can you manually migrate this from Metal Edge back to GCP? Maybe I can test it out and see if it makes a difference
So basically, what happened now is that I generated a new domain on my old service, which also migrated it to Metal Edge automatically. And guess what: I'm now getting the same lag as on my new service
8 months ago
How are you measuring latency? Could you also provide both of the measurements you're talking about?
8 months ago
what happened to their roles 🤔
8 months ago
can they still chat here without the roles?
8 months ago
they can't because to talk here you need the support access role 😔
8 months ago
and they need the community access role to talk in #🎤|chit-chat
8 months ago
my assumption is that the user left and rejoined and never went through sign-up again