Expected latency on railway network (both on metal and off)?
colemandunn
PROOP

10 months ago

hello, I have a simple service that responds very quickly (~1ms, definitely less than 10ms; almost all of my traces on Sentry appear as 0.00ms. It is a simple Rust axum API). It can run a test with 2,000 concurrent requests in about 200ms, and that is with very small time.sleeps in the test case mentioned. However, when I deploy, my response times have been anywhere between 300-700ms. Is there something I can do to speed this up?

some info:

  • it runs in a docker container

  • it is hosted in US East on railway metal

  • my ping to US East is about 60ms

  • i am using the default railway provided domain

  • the average server load is extremely low (0.0 vCPU and 5-8 MB RAM)

If there is any more information needed please let me know. Any help would be greatly appreciated.

Thanks!
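
The local burst test mentioned above (2,000 concurrent requests in ~200ms) could be reproduced with a sketch roughly like this; the URL, request count, and worker count are illustrative assumptions, not the poster's actual test harness:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url: str) -> float:
    """Return wall-clock seconds for one GET request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=70) as resp:
        resp.read()
    return time.perf_counter() - start


def burst(url: str, n: int = 2000, workers: int = 100) -> list[float]:
    """Fire n GETs across a thread pool and return per-request timings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_get, [url] * n))
```

Against a localhost server this mostly measures handler speed; against a deployed URL it also measures every network hop in between, which is the gap this thread is about.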

Closed

48 Replies

adam
MODERATOR

10 months ago

When you're testing the app and getting <1ms response times, are you running it locally? If so, those response times should not be expected when running on a cloud provider such as Railway.

I'll need some more info to help you out here. First, please send your project ID so a member of the team can view your project if needed.


adam
MODERATOR

10 months ago

Where are you located and what region is your service deployed in? You can view this info from the service settings on Railway


adam
MODERATOR

10 months ago

Where are you located?


adam
MODERATOR

10 months ago

Does your service communicate with any other services/databases?


colemandunn
PROOP

10 months ago

the 1ms response times come from sentry, which is tracing my deployed service. Not sure if response time is the right terminology; more like processing time? For example, a response from a Python service I have that hits the database has a duration (what it is called in sentry) of about 20ms.


colemandunn
PROOP

10 months ago

Utah


colemandunn
PROOP

10 months ago

no it does not, all in-memory


colemandunn
PROOP

10 months ago

looking more in sentry, they are taking about 0.33ms; the table is rounding down


colemandunn
PROOP

10 months ago

ok, the project id is not associated with my discord account, is that alright? I have been on railway for a long time on a personal account, and recently we have started using it at work, but my discord is already linked to my personal railway

here it is though: 11ee2d7d-c3b6-477f-81a1-77cd0c8f449e


colemandunn
PROOP

10 months ago

thank you for the timely response!


adam
MODERATOR

10 months ago

That's not an issue


adam
MODERATOR

10 months ago

Could you please try swapping your service off of metal?


adam
MODERATOR

10 months ago

Any region is fine.


colemandunn
PROOP

10 months ago

sure thing, is it alright if I do it with a duplicate service? I don't want to muddy the waters, but I also want to keep this service running; if I have to, though, I will


adam
MODERATOR

10 months ago

Do what you gotta do! That's not a problem


colemandunn
PROOP

10 months ago

ok moved it to US west no metal. just running with a quick script

```python
import time

import requests

# base_url and headers are defined earlier in the script (not shown)
num_requests = 10

# Measure request times and calculate the average
total_time = 0
for i in range(num_requests):
    start_time = time.time()
    try:
        response = requests.get(base_url, headers=headers, timeout=70)
        response_time = time.time() - start_time
        print(f"Request {i + 1}: Took {response_time:.2f} seconds")
        total_time += response_time
    except requests.exceptions.RequestException as e:
        print(f"Request {i + 1}: Failed with exception: {e}")

if num_requests > 0:
    average_time = total_time / num_requests
    print(f"\nAverage time for {num_requests} requests: {average_time:.2f} seconds")
```

gets me

```
Request 1: Took 0.19 seconds
Request 2: Took 0.17 seconds
Request 3: Took 0.17 seconds
Request 4: Took 0.17 seconds
Request 5: Took 0.17 seconds
Request 6: Took 0.18 seconds
Request 7: Took 0.17 seconds
Request 8: Took 0.17 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.17 seconds

Average time for 10 requests: 0.17 seconds
```

so it is looking better for sure. maybe i should have done US East non metal since that is where the other deployments are


colemandunn
PROOP

10 months ago

also here are some railway http logs for the old deployment

1358924718622769400


colemandunn
PROOP

10 months ago

and the new one (wow)

1358924807168594000


colemandunn
PROOP

10 months ago

before (on the east metal deployment) a script like this was yielding between 300ms to 700ms


colemandunn
PROOP

10 months ago

on US East non metal:

```
Request 1: Took 0.29 seconds
Request 2: Took 0.29 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.29 seconds
Request 5: Took 0.23 seconds
Request 6: Took 0.22 seconds
Request 7: Took 0.29 seconds
Request 8: Took 0.22 seconds
Request 9: Took 0.29 seconds
Request 10: Took 0.22 seconds

Average time for 10 requests: 0.26 seconds
```


colemandunn
PROOP

10 months ago

US East non metal (much larger than us west hmmm)

1358925778695356400


colemandunn
PROOP

10 months ago

what do the response times in the http logs measure exactly?


adam
MODERATOR

10 months ago

Hmm not sure. I'm going to have to tag in the team here as there may be bugs I'm not aware of with metal


adam
MODERATOR

10 months ago

!t


adam
MODERATOR

10 months ago

This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response adam 10 months ago


brody
EMPLOYEE

10 months ago

"it can run a test with 2,000 concurrent requests in about 200ms"

where is this number from? your local computer?

do you have the metal edge enabled on the service you have deployed to metal?


colemandunn
PROOP

10 months ago

Ya local computer, just trying to make the point the endpoint itself is very fast and does very little work


brody
EMPLOYEE

10 months ago

what computer do you have? It is very possible it has higher per-core performance than the CPUs we use on Railway, as we went for higher price efficiency for our customers


colemandunn
PROOP

10 months ago

M4 pro 48GB ram. The endpoint just does one to a few hashmap lookups and returns. According to sentry it is still very fast on railway too (0.33ms duration avg)


brody
EMPLOYEE

10 months ago

Yep, our CPUs on Metal do not have the same single-core performance as the M4; it's just not a fair comparison. We chose 5th Gen Xeons for the cost efficiency that we pass on to customers


colemandunn
PROOP

10 months ago

That makes sense! I don't expect the same performance; again, the 2,000 burst requests are just an example to illustrate that the endpoint doesn't take a lot of time. The hosted service doesn't even get near that amount of traffic, hence the 0.0 vCPU average usage on Railway with 32 allocated cores. The main thing I want to get to the bottom of is the expected network latency. Sentry is reporting that the request duration is indeed very low, but like I mentioned, the east metal deployment was yielding between 300ms and 700ms response times, which I doubt would be caused by a 5th Gen Xeon doing a hashmap lookup.


brody
EMPLOYEE

10 months ago

haha our routing had to be far more complex than a hashmap at our scale, but yes it's not the cause here.

Do you also have the Metal Edge enabled?


colemandunn
PROOP

10 months ago

sorry, i was referring to the work my endpoint does haha (it just does a lookup in a hashmap stored in memory; my attempt at saying I don't think the CPU is what is causing the latency here)

I do not believe I do. Currently still on US East non metal for the lower latency; would I have to switch back to metal for this? Thanks!


colemandunn
PROOP

10 months ago

also, could i get some clarification on this? thanks! I am assuming this includes network latency, since sentry keeps reporting 0-1ms, unless sentry is wrong of course, but the stack/breadcrumb traces seem correct.


brody
EMPLOYEE

10 months ago

Sentry is reporting function execution time; the times in the HTTP logs record the round-trip time from when your request hits our edge network to when your application finishes the request.
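
That distinction can be illustrated with a toy simulation; the delay values are made up, and this is not how Sentry or Railway actually instrument requests:

```python
import time


def handler() -> str:
    """Toy request handler: an in-memory dict lookup, like the service in this thread."""
    store = {"key": "value"}
    return store["key"]


def simulated_request(network_delay_s: float) -> tuple[float, float]:
    """Return (handler_seconds, round_trip_seconds) for one simulated request.

    handler_seconds is roughly what a tracer reports (function execution
    time); round_trip_seconds also includes the simulated network hops,
    like the edge-to-response times in the HTTP logs.
    """
    trip_start = time.perf_counter()
    time.sleep(network_delay_s / 2)  # request travels to the server
    handler_start = time.perf_counter()
    handler()
    handler_seconds = time.perf_counter() - handler_start
    time.sleep(network_delay_s / 2)  # response travels back
    return handler_seconds, time.perf_counter() - trip_start
```

With a few hundred milliseconds of simulated delay, handler time stays sub-millisecond while the round trip is dominated by the network, which matches the 0.33ms-vs-300ms gap discussed in this thread.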


brody
EMPLOYEE

10 months ago

So may I ask where you are located? Because if you aren't east coast, you will of course see increased latency due to distance


colemandunn
PROOP

10 months ago

Utah, however our production environments are on the east coast which is why this is deployed there (however this is used by our dev environments too). I just thought the latency was unusually high so wanted to check in to see if this is expected for now or if there is anything I should do on my end


brody
EMPLOYEE

10 months ago

please put the service on metal and switch on the edge network and test again


colemandunn
PROOP

10 months ago

sounds good! I'll check back in a bit thanks!


colemandunn
PROOP

10 months ago

Just to check, edge doesn't replicate your service, right? This service is meant to be stateful (in-memory stateful) as a single global source of truth


colemandunn
PROOP

10 months ago

aka it is serving as a glorified redis db haha, in other words there can only be a single instance of it for it to work/be useful


brody
EMPLOYEE

10 months ago

The edge network does not replicate anything


colemandunn
PROOP

10 months ago

ok thanks, it's back on east metal but I don't see where to enable edge


brody
EMPLOYEE

10 months ago

hover your mouse over the domain


colemandunn
PROOP

10 months ago

```
Request 1: Took 0.39 seconds
Request 2: Took 0.20 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.26 seconds
Request 5: Took 0.20 seconds
Request 6: Took 0.20 seconds
Request 7: Took 0.20 seconds
Request 8: Took 0.20 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.20 seconds

Average time for 10 requests: 0.23 seconds
```


colemandunn
PROOP

10 months ago

wow that is much better, thank you!


brody
EMPLOYEE

10 months ago

no problem!


colemandunn
PROOP

10 months ago

any idea how this one was doing so well? i know I'm closer but not THAT much closer. Could it have been a demand thing/low usage at that time?


brody
EMPLOYEE

10 months ago

not sure tbh


Status changed to Closed brody 10 months ago

