10 months ago
hello, I have a simple service that responds very quickly (~1ms, definitely less than 10ms; most of my traces in Sentry appear as 0.00ms). It is a simple Rust axum API. It can run a test with 2,000 concurrent requests in about 200ms, and that is with small time.sleep calls in the test case. However, when I deploy, my response times have been anywhere between 300-700ms. Is there something I can do to speed this up?
some info:
it runs in a docker container
it is hosted in US East on railway metal
my ping to US East is about 60ms
i am using the default railway provided domain
the average server load is extremely low (0.0vCPU and 5-8 MB ram)
If there is any more information needed please let me know. Any help would be greatly appreciated.
Thanks!
48 Replies
10 months ago
When you're testing the app and getting <1ms response times, are you running it locally? If so, those response times should not be expected when running on a cloud provider such as Railway.
I'll need some more info to help you out here. First, please send your project ID so a member of the team can view your project if needed.
10 months ago
Where are you located and what region is your service deployed in? You can view this info from the service settings on Railway
10 months ago
Where are you located?
10 months ago
Does your service communicate with any other services/databases?
the 1ms response times come from Sentry, which is tracing my deployed service. Not sure if response time is the right terminology; more like processing time? For example, a response from a Python service I have that hits the database has a duration (what Sentry calls it) of about 20ms.
looking more in Sentry, they are taking about 0.33ms; the table is rounding down
ok, the project ID is not associated with my Discord account, is that alright? I have been on Railway for a long time on a personal account, and recently we have started using it at work, but my Discord is already linked to my personal Railway account
here it is though: 11ee2d7d-c3b6-477f-81a1-77cd0c8f449e
10 months ago
That's not an issue
10 months ago
Could you please try swapping your service off of metal?
10 months ago
Any region is fine.
sure thing, is it alright if I do it with a duplicate service? I don't want to muddy the waters, but I also want to keep this service running. If I have to, I will
10 months ago
Do what you gotta do! That's not a problem
ok, moved it to US West, non-metal. Just running with a quick script:
```python
import time

import requests

base_url = "https://my-service.up.railway.app"  # placeholder for the real domain
headers = {}

num_requests = 10

# Measure request times and calculate the average
total_time = 0
for i in range(num_requests):
    start_time = time.time()
    try:
        response = requests.get(base_url, headers=headers, timeout=70)
        response_time = time.time() - start_time
        print(f"Request {i + 1}: Took {response_time:.2f} seconds")
        total_time += response_time
    except requests.exceptions.RequestException as e:
        print(f"Request {i + 1}: Failed with exception: {e}")

if num_requests > 0:
    average_time = total_time / num_requests
    print(f"\nAverage time for {num_requests} requests: {average_time:.2f} seconds")
```
gets me:
```Request 1: Took 0.19 seconds
Request 2: Took 0.17 seconds
Request 3: Took 0.17 seconds
Request 4: Took 0.17 seconds
Request 5: Took 0.17 seconds
Request 6: Took 0.18 seconds
Request 7: Took 0.17 seconds
Request 8: Took 0.17 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.17 seconds
Average time for 10 requests: 0.17 seconds```
so it is looking better for sure. Maybe I should have done US East non-metal, since that is where the other deployments are
before (on the East metal deployment), a script like this was yielding between 300ms and 700ms
on US East non-metal:
```Request 1: Took 0.29 seconds
Request 2: Took 0.29 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.29 seconds
Request 5: Took 0.23 seconds
Request 6: Took 0.22 seconds
Request 7: Took 0.29 seconds
Request 8: Took 0.22 seconds
Request 9: Took 0.29 seconds
Request 10: Took 0.22 seconds
Average time for 10 requests: 0.26 seconds```
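One thing worth checking with numbers like these: each bare `requests.get()` call in the script above opens a fresh TCP+TLS connection, so every sample pays a full handshake on top of the ~60ms ping. A minimal sketch of the same benchmark over a single persistent connection, using only the standard library (the domain in the usage comment is a placeholder, not the real service):

```python
import time
from http.client import HTTPSConnection

def time_requests(fetch, num_requests=10):
    """Call fetch() num_requests times; return per-call wall-clock seconds."""
    samples = []
    for _ in range(num_requests):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return samples

def benchmark(host, path="/", num_requests=10):
    """Time GETs over one persistent HTTPS connection (keep-alive)."""
    conn = HTTPSConnection(host, timeout=70)
    def fetch():
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body so the connection can be reused
    try:
        return time_requests(fetch, num_requests)
    finally:
        conn.close()

# Usage (placeholder domain):
#   samples = benchmark("my-service.up.railway.app")
# samples[0] pays the TCP+TLS handshake; later samples reuse the connection.
```

If the first sample is much slower than the rest, a good chunk of the measured latency is handshake cost rather than anything on the server side.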
10 months ago
Hmm, not sure. I'm going to have to tag in the team here, as there may be bugs I'm not aware of with metal
10 months ago
!t
10 months ago
This thread has been escalated to the Railway team.
Status changed to Awaiting Railway Response adam • 10 months ago
10 months ago
it can run a test with 2,000 concurrent requests in about 200ms
where is this number from? your local computer?
do you have the metal edge enabled on the service you have deployed to metal?
Ya, local computer. Just trying to make the point that the endpoint itself is very fast and does very little work
10 months ago
what computer do you have? It is very possible it has higher per-core performance than the CPUs we use on Railway, as we went for higher price efficiency for our customers
M4 Pro, 48GB RAM. The endpoint just does one to a few hashmap lookups and returns. According to Sentry it is still very fast on Railway too (0.33ms average duration)
10 months ago
Yep, our CPUs on Metal do not have the same single-core performance as the M4; it's just not a fair comparison. We chose 5th Gen Xeons for the cost efficiency that we pass on to our customers
That makes sense! I don't expect the same performance; again, the 2,000 burst requests are just an example to illustrate that the endpoint doesn't take a lot of time. The hosted service doesn't even get near that amount of traffic, hence the 0.0 vCPU average usage on Railway with 32 allocated cores. The main thing I want to get to the bottom of is the expected network latency. Sentry is reporting that the request duration is indeed very low, but like I mentioned, the East metal deployment was yielding between 300ms and 700ms response times, which I doubt would be caused by a 5th Gen Xeon doing a hashmap lookup.
10 months ago
haha our routing had to be far more complex than a hashmap at our scale, but yes it's not the cause here.
Do you also have the Metal Edge enabled?
sorry, I was referring to the work my endpoint does haha (it just does a lookup in a hashmap stored in memory; my attempt at saying I don't think the CPU is what is causing the latency here)
I do not believe I do. Currently still on US East non-metal for the lower latency; would I have to switch back to metal for this? Thanks!
also, could I get some clarification on this? Thanks! I am assuming this includes network latency, since Sentry keeps reporting 0-1ms, unless Sentry is wrong of course, but the stack/breadcrumb traces seem correct.
10 months ago
Sentry is reporting function execution time; the times in the HTTP logs record round-trip time from when your request hits our edge network to when your application finishes the request.
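To make that distinction concrete: Sentry's ~0.33ms is handler time only, while the client-side script measures handler time plus every network round trip. A back-of-the-envelope sketch (a simplified model, not Railway's actual routing; TLS 1.3 needs fewer round trips than the default assumed here):

```python
# Rough latency model: a request over a fresh HTTPS connection costs roughly
# one RTT for the TCP handshake, one to two RTTs for TLS, and one RTT for the
# HTTP exchange itself, plus the server-side handler time.
def expected_round_trip_ms(rtt_ms, handler_ms, tls_rtts=2, reuse_connection=False):
    handshake = 0.0 if reuse_connection else (1 + tls_rtts) * rtt_ms
    return handshake + rtt_ms + handler_ms

# Plugging in the ~60ms ping and ~0.33ms handler time from this thread:
cold = expected_round_trip_ms(60, 0.33)                         # fresh connection
warm = expected_round_trip_ms(60, 0.33, reuse_connection=True)  # reused connection
print(f"cold: ~{cold:.0f} ms, warm: ~{warm:.0f} ms")  # cold: ~240 ms, warm: ~60 ms
```

Under this model, a sub-millisecond handler still shows 200ms+ client-side times on a fresh connection, which is in the ballpark of the numbers reported above.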
10 months ago
So may I ask where you are located? Because if you aren't East Coast, you will of course see increased latency due to distance.
Utah; however, our production environments are on the East Coast, which is why this is deployed there (this is used by our dev environments too). I just thought the latency was unusually high, so I wanted to check in to see if this is expected for now or if there is anything I should do on my end
10 months ago
please put the service on metal and switch on the edge network and test again
Just to check, the edge doesn't replicate your service, does it? This service is meant to be stateful (in-memory stateful) as a single global source of truth
aka it is serving as a glorified Redis DB haha; in other words, there can only be a single instance of it for it to work/be useful
10 months ago
The edge network does not replicate anything
10 months ago
hover your mouse over the domain
```Request 1: Took 0.39 seconds
Request 2: Took 0.20 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.26 seconds
Request 5: Took 0.20 seconds
Request 6: Took 0.20 seconds
Request 7: Took 0.20 seconds
Request 8: Took 0.20 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.20 seconds
Average time for 10 requests: 0.23 seconds```
10 months ago
no problem!
any idea why this one was doing so well? I know I'm closer, but not THAT much closer. Could it have been a demand thing/low usage at that time?
10 months ago
not sure tbh
Status changed to Closed brody • 10 months ago


