10 months ago
hello, I have a simple service that responds very quickly (~1ms, definitely less than 10ms; most of my traces in Sentry appear as 0.00ms). It is a simple Rust axum API. It can run a test with 2,000 concurrent requests in about 200ms, and that is with small time.sleep calls in the test case. However, when I deploy, my response times have been anywhere between 300-700ms. Is there something I can do to speed this up?
some info:
it runs in a docker container
it is hosted in US East on railway metal
my ping to US East is about 60ms
i am using the default railway provided domain
the average server load is extremely low (0.0vCPU and 5-8 MB ram)
If there is any more information needed please let me know. Any help would be greatly appreciated.
Thanks!
48 Replies
10 months ago
When you're testing the app and getting <1ms response times, are you running it locally? If so, those response times should not be expected when running on a cloud provider such as Railway.
I'll need some more info to help you out here. First, please send your project ID so a member of the team can view your project if needed.
10 months ago
Where are you located and what region is your service deployed in? You can view this info from the service settings on Railway
10 months ago
Where are you located?
10 months ago
Does your service communicate with any other services/databases?
the 1ms response times come from Sentry, which is tracing my deployed service. Not sure if response time is the right terminology; more like processing time? For example, a response from a Python service I have that hits the database has a duration (what Sentry calls it) of about 20ms.
looking more in Sentry, they are taking about 0.33ms; the table is rounding down
ok, the project ID is not associated with my Discord account, is that alright? I have been on Railway for a long time on a personal account, and recently we have started using it at work, but my Discord is already linked to my personal Railway account
here it is though: 11ee2d7d-c3b6-477f-81a1-77cd0c8f449e
10 months ago
That's not an issue
10 months ago
Could you please try swapping your service off of metal?
10 months ago
Any region is fine.
sure thing, is it alright if I do it with a duplicate service? I don't want to muddy the waters, but I also want to keep this service running. If I have to, I will
10 months ago
Do what you gotta do! That's not a problem
ok, moved it to US West, non-metal. Just running with a quick script:
```python
import time

import requests

base_url = "https://my-service.up.railway.app"  # placeholder for the real domain
headers = {}

num_requests = 10

# Measure request times and calculate the average
total_time = 0
for i in range(num_requests):
    start_time = time.time()
    try:
        response = requests.get(base_url, headers=headers, timeout=70)
        response_time = time.time() - start_time
        print(f"Request {i + 1}: Took {response_time:.2f} seconds")
        total_time += response_time
    except requests.exceptions.RequestException as e:
        print(f"Request {i + 1}: Failed with exception: {e}")

if num_requests > 0:
    average_time = total_time / num_requests
    print(f"\nAverage time for {num_requests} requests: {average_time:.2f} seconds")
```
gets me:
```Request 1: Took 0.19 seconds
Request 2: Took 0.17 seconds
Request 3: Took 0.17 seconds
Request 4: Took 0.17 seconds
Request 5: Took 0.17 seconds
Request 6: Took 0.18 seconds
Request 7: Took 0.17 seconds
Request 8: Took 0.17 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.17 seconds
Average time for 10 requests: 0.17 seconds```
so it is looking better for sure. Maybe I should have done US East non-metal, since that is where the other deployments are
before (on the East metal deployment), a script like this was yielding between 300ms and 700ms
on US East non-metal:
```Request 1: Took 0.29 seconds
Request 2: Took 0.29 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.29 seconds
Request 5: Took 0.23 seconds
Request 6: Took 0.22 seconds
Request 7: Took 0.29 seconds
Request 8: Took 0.22 seconds
Request 9: Took 0.29 seconds
Request 10: Took 0.22 seconds
Average time for 10 requests: 0.26 seconds```
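One thing worth checking with numbers like these: each bare `requests.get()` call in the script above opens a fresh TCP+TLS connection, so every sample pays a full handshake on top of the ~60ms ping. A minimal sketch of the same benchmark over a single persistent connection, using only the standard library (the domain in the usage comment is a placeholder, not the real service):

```python
import time
from http.client import HTTPSConnection

def time_requests(fetch, num_requests=10):
    """Call fetch() num_requests times; return per-call wall-clock seconds."""
    samples = []
    for _ in range(num_requests):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return samples

def benchmark(host, path="/", num_requests=10):
    """Time GETs over one persistent HTTPS connection (keep-alive)."""
    conn = HTTPSConnection(host, timeout=70)
    def fetch():
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body so the connection can be reused
    try:
        return time_requests(fetch, num_requests)
    finally:
        conn.close()

# Usage (placeholder domain):
#   samples = benchmark("my-service.up.railway.app")
# samples[0] pays the TCP+TLS handshake; later samples reuse the connection.
```

If the first sample is much slower than the rest, a good chunk of the measured latency is handshake cost rather than anything on the server side.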
10 months ago
Hmm, not sure. I'm going to have to tag in the team here, as there may be bugs I'm not aware of with metal
10 months ago
!t
10 months ago
This thread has been escalated to the Railway team.
Status changed to Awaiting Railway Response adam • 10 months ago
10 months ago
it can run a test with 2,000 concurrent requests in about 200ms
where is this number from? your local computer?
do you have the metal edge enabled on the service you have deployed to metal?
Ya, local computer. Just trying to make the point that the endpoint itself is very fast and does very little work
10 months ago
what computer do you have? It is very possible it has higher per-core performance than the CPUs we use on Railway, as we went for higher price efficiency for our customers
M4 Pro, 48GB RAM. The endpoint just does one to a few hashmap lookups and returns. According to Sentry it is still very fast on Railway too (0.33ms average duration)
10 months ago
Yep, our CPUs on Metal do not have the same single-core performance as the M4; it's just not a fair comparison. We chose 5th Gen Xeons for the cost efficiency that we pass on to our customers
That makes sense! I don't expect the same performance; again, the 2,000 burst requests are just an example to illustrate that the endpoint doesn't take a lot of time. The hosted service doesn't even get near that amount of traffic, hence the 0.0 vCPU average usage on Railway with 32 allocated cores. The main thing I want to get to the bottom of is the expected network latency. Sentry is reporting that the request duration is indeed very low, but like I mentioned, the East metal deployment was yielding between 300ms and 700ms response times, which I doubt would be caused by a 5th Gen Xeon doing a hashmap lookup.
10 months ago
haha our routing had to be far more complex than a hashmap at our scale, but yes it's not the cause here.
Do you also have the Metal Edge enabled?
sorry, I was referring to the work my endpoint does haha (it just does a lookup in a hashmap stored in memory; my attempt at saying I don't think the CPU is what is causing the latency here)
I do not believe I do. Currently still on US East non-metal for the lower latency; would I have to switch back to metal for this? Thanks!
also, could I get some clarification on this? Thanks! I am assuming this includes network latency, since Sentry keeps reporting 0-1ms, unless Sentry is wrong of course, but the stack/breadcrumb traces seem correct.
10 months ago
Sentry is reporting function execution time; the times in the HTTP logs record round-trip time from when your request hits our edge network to when your application finishes the request.
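To make that distinction concrete: Sentry's ~0.33ms is handler time only, while the client-side script measures handler time plus every network round trip. A back-of-the-envelope sketch (a simplified model, not Railway's actual routing; TLS 1.3 needs fewer round trips than the default assumed here):

```python
# Rough latency model: a request over a fresh HTTPS connection costs roughly
# one RTT for the TCP handshake, one to two RTTs for TLS, and one RTT for the
# HTTP exchange itself, plus the server-side handler time.
def expected_round_trip_ms(rtt_ms, handler_ms, tls_rtts=2, reuse_connection=False):
    handshake = 0.0 if reuse_connection else (1 + tls_rtts) * rtt_ms
    return handshake + rtt_ms + handler_ms

# Plugging in the ~60ms ping and ~0.33ms handler time from this thread:
cold = expected_round_trip_ms(60, 0.33)                         # fresh connection
warm = expected_round_trip_ms(60, 0.33, reuse_connection=True)  # reused connection
print(f"cold: ~{cold:.0f} ms, warm: ~{warm:.0f} ms")  # cold: ~240 ms, warm: ~60 ms
```

Under this model, a sub-millisecond handler still shows 200ms+ client-side times on a fresh connection, which is in the ballpark of the numbers reported above.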
10 months ago
So may I ask where you are located? Because if you aren't East Coast, you will of course see increased latency due to distance.
Utah; however, our production environments are on the East Coast, which is why this is deployed there (this is used by our dev environments too). I just thought the latency was unusually high, so I wanted to check in to see if this is expected for now or if there is anything I should do on my end
10 months ago
please put the service on metal and switch on the edge network and test again
Just to check, the edge doesn't replicate your service, does it? This service is meant to be stateful (in-memory stateful) as a single global source of truth
aka it is serving as a glorified Redis DB haha; in other words, there can only be a single instance of it for it to work/be useful
10 months ago
The edge network does not replicate anything
10 months ago
hover your mouse over the domain
```Request 1: Took 0.39 seconds
Request 2: Took 0.20 seconds
Request 3: Took 0.28 seconds
Request 4: Took 0.26 seconds
Request 5: Took 0.20 seconds
Request 6: Took 0.20 seconds
Request 7: Took 0.20 seconds
Request 8: Took 0.20 seconds
Request 9: Took 0.18 seconds
Request 10: Took 0.20 seconds
Average time for 10 requests: 0.23 seconds```
10 months ago
no problem!
any idea why this one was doing so well? I know I'm closer, but not THAT much closer. Could it have been a demand thing/low usage at that time?
10 months ago
not sure tbh
Status changed to Closed brody • 10 months ago


