7 months ago
Hi Railway team,
I've been investigating our API server performance (Railway Metal hosted) and wanted some clarification on the graphed metrics.
We run a high number of replicas (instances), each configured with the maximum available vCPU and memory.
Your documentation says:
"For services with multiple replicas, the metrics from all replicas are summed up and displayed in the metrics tab, for example, if you have 2 replicas, each using 100 MB of memory, the memory usage displayed in the metrics tab will be 200 MB" (https://docs.railway.com/guides/metrics)
However, when I view the memory usage graphs in 'Observability', I see only 11 GB in use (less than half of what a single one of our instances should have available to it).
When I view the same metric in the specific service 'Metrics' graph, it also shows 11 GB in use and 'Max 32 GB' in the top right-hand corner.
Looking at our server logs, I can in fact see that we have multiple replicas operating, as there are concurrent logs with differing replica IDs.
Can you please clarify what we are seeing in the usage metric graphs, whether this number is an average across all replicas or just one instance's state, and whether we can get more information?
I'm really looking to investigate server usage spikes.
Thanks,
Harry
6 Replies
7 months ago
Hi Harry,
Each replica is allowed up to 32GB at the max, but the actual usage might not be hitting that limit. The observed 11GB reflects actual current usage across all replicas, rather than the maximum potential usage. Could you share a bit more if you think this should be indicating a higher number?
Best,
Chandrika
Status changed to Awaiting User Response Railway • 7 months ago
7 months ago
For investigating the server usage spikes, you may have already considered or tried this, but mentioning it in case it's helpful: you could check the logs from those times to see if anything stands out.
6 months ago
Hi Chandrika, at this point I would just really like clarification on what the graphs should be showing.
We currently have 24 instances, each with 32 GB and 32 vCPU allocated, so I want to be able to see what usage we've got across all instances. But the Metrics graph appears to show only 1 instance: it shows 'Max 32 GB', but shouldn't this be 32 x 24 if it were across all instances?
Status changed to Awaiting Railway Response Railway • 6 months ago
6 months ago
Hello,
As Chandrika noted, the metrics shown are not for a single instance; the data in the graphs is the sum across all 24 replicas.
Assuming your replicas are each using roughly the same amount of memory, 11 GB in total divided by 24 is about 0.46 GB, or roughly 460 MB per replica.
Yes, the graph does say 32 vCPU, but that value is the maximum for a single replica. Since you have 24 replicas, you could theoretically use up to 768 vCPU (24 x 32).
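To make the arithmetic concrete, here is a rough sketch in Python. The function names are made up for illustration and are not part of any Railway API, and the per-replica figure assumes usage is spread evenly across replicas, which may not hold during a spike:

```python
def per_replica_average(total: float, replicas: int) -> float:
    """Average usage per replica, ASSUMING usage is spread evenly."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    return total / replicas


def aggregate_limit(per_replica_limit: float, replicas: int) -> float:
    """Theoretical ceiling across all replicas (per-replica max x count)."""
    return per_replica_limit * replicas


# Figures taken from this thread.
total_memory_gb = 11   # summed usage shown in the graph
replica_count = 24
per_replica_max = 32   # the 'Max' shown in the graph is per replica

avg_gb = per_replica_average(total_memory_gb, replica_count)
print(f"{avg_gb:.3f} GB per replica (~{avg_gb * 1000:.0f} MB)")
# prints: 0.458 GB per replica (~458 MB)

print(aggregate_limit(per_replica_max, replica_count))
# prints: 768 (GB of memory, or vCPU, across all 24 replicas)
```

Because the graph only shows the sum, a single replica spiking can hide inside an unremarkable total, so the average here is only a starting point.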
I hope this clarifies things!
Best,
Brody
Status changed to Awaiting User Response Railway • 6 months ago
6 months ago
Thanks for the clarification, it makes sense that it would show the combined usage. It doesn't really make sense, though, for it to show the max of a single instance on a graph of the combined usage across all instances. A tooltip at the least would be helpful!
Thanks Brody
Status changed to Awaiting Railway Response Railway • 6 months ago
6 months ago
Noted. Would you mind giving this feedback post an upvote? That way we can track requests for this enhancement: https://station.railway.com/feedback/metrics-per-replica-06fdbc1f
Status changed to Awaiting User Response Railway • 6 months ago
4 months ago
Status changed to Solved Railway • 4 months ago