Inconsistent latency on Metal (service-to-service communication over the private network)
githyperplexed
PROOP

10 months ago

Project ID:

d9cc6c5d-3e5a-46d6-9670-114272d6386e

Repo:

https://github.com/githyperplexed/latency-comparsion

Regions:

US East (GCP)

US East (Metal)

Problem:

Inconsistent latency between services on Metal.

When I say latency, all I'm referring to is the amount of time it takes to run a simple Redis operation.

Example:

const start = performance.now();

await redis.hGetAll("summary");

const latency = performance.now() - start;

I set up a simple test to compare GCP and Metal. All communication is done over the private network, i.e. "redis.railway.internal".

I have deployed two identical pairs of services - a Bun server and an instance of Redis - one fully on GCP and one fully on Metal.

The Bun server does one read and one write to Redis every 3 seconds and stores the latency and number of operations so an average can be determined.
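For anyone who wants to reproduce this, the bookkeeping described above could be sketched roughly as follows. This is a minimal sketch, not the code from the linked repo: `LatencyTracker` and `timed` are hypothetical names, and the only assumption is that each read or write is wrapped in an async function.

```typescript
// Hypothetical helper: keeps the op count, the last latency, and a running average.
class LatencyTracker {
  private count = 0;
  private totalMs = 0;
  private lastMs = 0;

  // Record one measured latency in milliseconds.
  record(ms: number): void {
    this.count += 1;
    this.totalMs += ms;
    this.lastMs = ms;
  }

  get ops(): number {
    return this.count;
  }

  get last(): number {
    return this.lastMs;
  }

  // Average over all recorded operations; 0 when nothing has been recorded yet.
  get average(): number {
    return this.count === 0 ? 0 : this.totalMs / this.count;
  }
}

// Hypothetical helper: times one async operation and records its duration,
// even if the operation throws.
async function timed<T>(tracker: LatencyTracker, op: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await op();
  } finally {
    tracker.record(performance.now() - start);
  }
}
```

With a connected client, a read could then be measured on a 3 second interval with something like `setInterval(() => timed(reads, () => redis.hGetAll("summary")), 3000)`, where `reads` is a `LatencyTracker` instance.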

The results over the last week or so are shown below. Visiting either endpoint will trigger the read / write code and display the current results as well.

While the average for Metal is relatively low, the latencies for reads / writes tend to fluctuate significantly as can be seen in the example below.

Compare this to GCP, where most of the time I am seeing sub-ms times.

I guess I am just wondering if this is expected behavior or if there's something I'm missing?

Thanks in advance!

GCP: https://function-gcp.up.railway.app/

Three back-to-back page refreshes at the above URL:

ops: 134922
read: 0.66 ms
write: 0.47 ms
avg read: 1.40 ms
avg write: 0.56 ms

ops: 134925
read: 0.66 ms
write: 0.48 ms
avg read: 1.40 ms
avg write: 0.56 ms

ops: 134928
read: 0.63 ms
write: 0.57 ms
avg read: 1.40 ms
avg write: 0.56 ms

Metal: https://function-metal.up.railway.app/

Three back-to-back page refreshes at the above URL:

ops: 135441
read: 21.24 ms
write: 10.47 ms
avg read: 7.05 ms
avg write: 5.04 ms

ops: 135445
read: 4.13 ms
write: 1.43 ms
avg read: 7.05 ms
avg write: 5.04 ms

ops: 135448
read: 8.36 ms
write: 4.13 ms
avg read: 7.05 ms
avg write: 5.04 ms
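
Since the concern here is fluctuation rather than the average alone, it may help to look at the spread of individual samples. The sketch below is only an illustration: `summarize` is a hypothetical helper, and the sample arrays are just the three read latencies from each set of refreshes posted above, which is far too few for a real conclusion.

```typescript
// Hypothetical helper: summarizes the spread of latency samples (in ms).
function summarize(samples: number[]): { min: number; max: number; mean: number; stdDev: number } {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const min = Math.min(...samples);
  const max = Math.max(...samples);
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  // Population variance: mean of squared deviations from the mean.
  const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / samples.length;
  return { min, max, mean, stdDev: Math.sqrt(variance) };
}

// Read latencies from the three refreshes posted in this thread.
const gcpReads = [0.66, 0.66, 0.63];
const metalReads = [21.24, 4.13, 8.36];

console.log("GCP reads:", summarize(gcpReads));
console.log("Metal reads:", summarize(metalReads));
```

Tracking min, max, and standard deviation (or percentiles, with more samples) alongside the running average would make this kind of jitter visible directly on the endpoint.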
Solved

8 Replies

10 months ago

Hello,

Thank you for the report. We will be looking into this closely in the coming days.

Best,
Brody


Status changed to Awaiting User Response Railway 10 months ago


githyperplexed
PROOP

10 months ago

Sounds good thanks!


Status changed to Awaiting Railway Response Railway 10 months ago


Railway
BOT

10 months ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but our engineering team will take a look, and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!


Railway
BOT

10 months ago

✅ The internal ticket "Increased latency between metal<->metal private networking" has been marked as completed.


githyperplexed
PROOP

10 months ago

Amazing! Thank you guys so much! Just out of curiosity, can you share any info on what was causing the issue?


10 months ago

CPU contention was the main cause. We made changes to how and where we schedule workloads so that no single host runs CPU hot anymore.

We are now seeing better intra-region latency than what we see for GCP, but please let us know what you see too.


githyperplexed
PROOP

10 months ago

Awesome, it looks great on my end as well. Thanks!


Status changed to Solved githyperplexed 10 months ago


Railway
BOT

9 months ago

❌ The internal ticket "Increased latency between metal<->metal private networking" has been marked as canceled.

