Increased CPU usage with Metal Server
anujsilkroute
PROOP

6 months ago

I have migrated my encoding_server, which runs a sentence transformer to convert arrays into vectors. Since the switch to Metal servers, the speed has decreased and the CPU usage has gone up.

Solved

7 Replies

echohack
EMPLOYEE

6 months ago

Heya,

I don't think this is due to the Metal migration (as you can see, you migrated back to US-West-1 and continue to have spiky traffic). I think you just have increased demand for your application during this time.

Please note that your application will be migrated to Metal automatically if you do not do so yourself, as we are deprecating our GCP infrastructure soon. So I'd advise migrating back to Metal, as that will be more convenient if you happen to incur any downtime.


Status changed to Awaiting User Response Railway 7 months ago


anujsilkroute
PROOP

6 months ago

Hey,

Thanks for the response.

From what I have seen, responses take much longer on the Metal server: updating a 450ish-long vector database catalog takes 20 minutes, whereas when I switch the server back to US Oregon Legacy it takes 3 minutes.

This is also reflected in the CPU usage: on the legacy server it is a couple of spikes, whereas on the Metal server all CPU cores are in constant use.
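
For a sense of scale, the two run times can be turned into per-item figures. This is only a rough sketch: it assumes the catalog has about 450 entries and that they are processed one at a time, neither of which is confirmed in the post, and `per_item_seconds` is a hypothetical helper.

```python
def per_item_seconds(total_minutes: float, items: int) -> float:
    """Average seconds spent per item for a run lasting `total_minutes`."""
    return total_minutes * 60 / items


ITEMS = 450  # assumption: "450ish" catalog entries, processed one by one

metal = per_item_seconds(20, ITEMS)   # run time reported on the Metal region
legacy = per_item_seconds(3, ITEMS)   # run time reported on US Oregon Legacy
slowdown = metal / legacy

print(f"metal:    {metal:.2f} s/item")    # 2.67 s/item
print(f"legacy:   {legacy:.2f} s/item")   # 0.40 s/item
print(f"slowdown: {slowdown:.1f}x")       # 6.7x
```

Under those assumptions each item costs a bit over two extra seconds on Metal, so the slowdown is closer to 7x than 10x.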


Status changed to Awaiting Railway Response Railway 7 months ago


jake
EMPLOYEE

6 months ago

20 minutes vs 3 minutes sounds like you're probably round tripping something unnecessarily. That's roughly a 7x penalty; no shot that's CPU

Have you checked:
- That you're using the private network
- That all your databases/applications are in the same region
- That you're using the metal edge
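
One quick way to check the first item is to look at the hostnames your service is actually configured with. The sketch below assumes Railway private-network hostnames end in `.railway.internal`; the variable names `DATABASE_URL` and `ENCODING_SERVER_URL` are placeholders for whatever your services really use, and `audit_env` is a hypothetical helper.

```python
import os
from typing import Dict, Iterable, Mapping, Optional
from urllib.parse import urlparse

# Assumption: private-network hostnames end with this suffix.
PRIVATE_SUFFIX = ".railway.internal"


def is_private_url(url: str) -> bool:
    """True if the URL's hostname looks like a private-network hostname."""
    host = urlparse(url).hostname or ""
    return host.endswith(PRIVATE_SUFFIX)


def audit_env(
    names: Iterable[str], env: Mapping[str, str] = os.environ
) -> Dict[str, Optional[bool]]:
    """Map each variable name to True/False, or None if it is unset."""
    report: Dict[str, Optional[bool]] = {}
    for name in names:
        value = env.get(name)
        report[name] = None if value is None else is_private_url(value)
    return report


if __name__ == "__main__":
    # Placeholder variable names; replace with the ones your app reads.
    for name, private in audit_env(["DATABASE_URL", "ENCODING_SERVER_URL"]).items():
        label = {True: "private", False: "PUBLIC?", None: "unset"}[private]
        print(f"{name}: {label}")
```

Anything reported as `PUBLIC?` is worth a second look, since that traffic may be leaving the private network.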

I notice you're using Supabase. It'll likely be much faster if you move your Postgres Database to Railway. Can we help you with that?


Status changed to Awaiting User Response Railway 7 months ago


anujsilkroute
PROOP

6 months ago

Hey,

So simply changing the region of the server from Oregon Legacy to California Metal has caused this increase. I'm not sure if anything else is causing it.

I am using the internal private network, and my other server container is already in California Metal. I haven't tried switching to the Metal edge yet.

We are already using the Railway Postgres database; the Supabase code was just left over and needs to be removed during a refactor.


Status changed to Awaiting Railway Response Railway 7 months ago


jake
EMPLOYEE

6 months ago

Gotcha. How do you connect to the Railway Postgres? Like, which envvar are you using? I'm wondering if you're accidentally using the public network, which would cause this slowdown.

That would make sense because you probably haven't been migrated to the metal edge, so it's going ALL the way back to Oregon and then back through our network backbone.
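
If you want to see whether the connection really is taking a long detour, a rough check is to time plain TCP connects from inside the container to the database host. This is a minimal sketch using only the Python standard library; `tcp_connect_ms` is a hypothetical helper, and the host and port you pass in are whatever your connection string contains.

```python
import socket
import time
from typing import List


def tcp_connect_ms(
    host: str, port: int, attempts: int = 5, timeout: float = 5.0
) -> List[float]:
    """Return the time in milliseconds taken by each TCP connect to host:port."""
    samples: List[float] = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close the connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000)
    return samples
```

Running it once against the private hostname and once against the public one gives a direct comparison. Connects within the same region are typically very short, so tens of milliseconds per connect would suggest the traffic is crossing regions.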


Status changed to Awaiting User Response Railway 7 months ago


anujsilkroute
PROOP

6 months ago

Just for reference, the issue I'm having is that I have my main server container and a smaller server container that I run a sentence encoding LLM on. The encoding server runs faster when it is connected to Oregon and slower on the Metal California instance. The communication between my data_sync script and the encoding_server is the context behind the CPU usage and the response time, and the CPU usage is a byproduct of the response time taking so long.
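
To narrow down where the time goes, the calls from data_sync to the encoding_server can be timed individually in both regions. The sketch below is illustrative only: `timed_map` is a hypothetical helper, and `fake_encode` is a stand-in for the real request to the encoding_server, whose API is not shown in this thread.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def timed_map(fn: Callable[[T], R], items: Iterable[T]) -> Tuple[List[R], List[float]]:
    """Call fn on each item and return (results, seconds taken by each call)."""
    results: List[R] = []
    seconds: List[float] = []
    for item in items:
        start = time.perf_counter()
        results.append(fn(item))
        seconds.append(time.perf_counter() - start)
    return results, seconds


def fake_encode(text: str) -> List[float]:
    """Stand-in for the request to encoding_server (hypothetical)."""
    time.sleep(0.01)  # pretend the request takes about 10 ms
    return [float(len(text))]


results, seconds = timed_map(fake_encode, ["red shirt", "blue jeans", "green hat"])
print(f"calls={len(seconds)} mean={mean(seconds):.3f}s max={max(seconds):.3f}s")
```

Swapping `fake_encode` for the real call and comparing the per-call numbers between Oregon and California would show whether every request is uniformly slower or only a few are.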


Status changed to Awaiting Railway Response Railway 7 months ago


jake
EMPLOYEE

6 months ago

Hmm. Where are your external services located? Could you try moving them to, say, US East? It looks like that's where your GCP services might be connected.

I have a feeling this could potentially be upstream


Status changed to Awaiting User Response Railway 7 months ago


Railway
BOT

4 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 4 months ago

