a year ago
I've been running an app on Railway for over a year, and suddenly we started experiencing random latency spikes in the app's responses. I've tried improving and testing the whole codebase in case we introduced a bug, but at random moments in the day response time goes from 0.5-1 second to 16-19 seconds. It's even weirder because we thought it would be a resource usage issue, but the server still has plenty of resources left, and whenever there is a user spike the response time doesn't get worse, so it isn't a concurrent-users issue either.
Is there anything going on with the service?
50 Replies
a year ago
Are you using a metal region?
a year ago
I would tell you to analyze this problem further; the team will probably ask for more data, because if it were a general problem there would be more cases here
a year ago
Are you using something like uptime kuma to monitor this latency?
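For context, an external monitor like Uptime Kuma essentially just measures wall-clock time per request against your endpoint. A minimal sketch of the same idea in Python (the URL would be your app's endpoint; nothing here is from the thread itself):

```python
import time
import urllib.request

def probe(url: str, timeout: float = 30.0) -> float:
    """Return the wall-clock latency of a single GET request, in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include body transfer in the measurement
    return time.perf_counter() - start
```

Running something like this on a cron from outside Railway gives you an independent latency series to compare against the in-app measurements.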
a year ago
also, there's a metal us-west and a gcp one
I'm measuring the backend response time as stated earlier, that's why it caught my attention
and after checking the measurements, response time went from less than 1 second to almost 19 seconds per request
Some hours ago, having 500 users at the same time still kept the 1-second response time
No logs, no errors, nothing. I even tried the good ol' restart just in case; even with a fresh start there is heavy lag now
a year ago
gotcha, let's wait for a team/conductor to answer this thread
Okay, now all of a sudden it went from 22-25 seconds per response down to 3-7 seconds. Still laggy, but a drastic improvement
Again, no change at all in the code itself; in fact, logged-in users increased, so it isn't user load either
a year ago
This sounds to me like your database and backend service are in different regions. Can you please send screenshots of both? @Kuha
a year ago
Please send screenshots
a year ago
Great, that rules out metal/nonmetal
a year ago
If you have any quantifiable data, such as a grafana dashboard, please share that
a year ago
can you please try to switch both services to the v2 runtime
a year ago
within the service settings
a year ago
if this doesn't change anything, we would have to recommend you set up tracing so that you can pinpoint where the "slow" is coming from
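Tracing here means timing each stage of a request inside the app, so that a slow database call can be told apart from slow rendering or network. In production you would typically use something like OpenTelemetry; as an illustration only, a minimal stdlib sketch (the span names "db_query" and "render" are invented placeholders, not from this thread):

```python
import time
from contextlib import contextmanager

@contextmanager
def span(name: str, sink: list):
    """Record the duration of a code block into `sink` as (name, seconds)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.append((name, time.perf_counter() - start))

# Example: time two stages of one request handler
timings = []
with span("db_query", timings):
    time.sleep(0.01)   # stand-in for a database call
with span("render", timings):
    time.sleep(0.005)  # stand-in for response rendering

slowest = max(timings, key=lambda t: t[1])
```

With per-stage timings like these in the logs, a 19-second response immediately shows which span absorbed the time.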
a year ago
it moves the workload from Docker to Podman
a year ago
sounds good, I'll leave this thread open, but just know that if you see an increase in latency you will need to add tracing to your app
a year ago
Railway has no observability into what your code is or is not doing.
a year ago
@silence @Brody how did u know they weren't running in runtime v2? your admin superpowers?
a year ago
silence fail
a year ago
yes, though I didn't look at anything the user couldn't see themselves. If you see a runtime selector set to legacy, you are on legacy; if you do not see a runtime selector at all, you are on v2, and the selector isn't there because you cannot go back to legacy
a year ago
A long time ago, lol
a year ago
As of 2024/06/04 (YYYY/MM/DD)
a year ago
It's @silent lol
a year ago
@Medim
a year ago
oh yeah
a year ago
mb
a year ago
s'all good
a year ago
!s
Status changed to Solved brody • about 1 year ago