Epic CPU usage spike

7 months ago

I noticed this massive spike in CPU usage which I've never seen before. Is there some way I can debug what caused this?

Attachments

Solved · $20 Bounty

Pinned Solution

7 months ago

it's entirely possible that your service detected extra cores and then spawned a copy of your process for each one. are you capping the number of workers the service uses?

otherwise, i do find it odd that it ran for months without spiking and then suddenly spiked, but without logs (as a person without access) i can't help much :(

8 Replies




Status changed to Open brody 7 months ago


7 months ago

hey, what type of service was this? (redis, kafka, db, the repo, etc). any sort of additional info you can give or logs related to it will help us point you in the right direction.


jphillips
HOBBY

7 months ago

Do you see any equivalent spike in http traffic or other variables?


yeeet

hey, what type of service was this? (redis, kafka, db, the repo, etc). any sort of additional info you can give or logs related to it will help us point you in the right direction.

7 months ago

It's an Express API, I've been running this service since February 2025 and this is the first time it has spiked so much.


jphillips

Do you see any equivalent spike in http traffic or other variables?

7 months ago

This was the first thing I checked, and yes, there is a spike in requests. But my service regularly sees request spikes, and the CPU increase doesn't seem proportional to the requests (most of the spike is 4xx responses, which shouldn't consume many resources). Here is the graph:

Attachments
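One way to test the assumption that 4xx responses are cheap is to record how long each request takes, grouped by status class. The following is a minimal sketch, not anything from the service in question: `timingMiddleware`, `summarizeByStatusClass`, `Sample`, and `ResLike` are hypothetical names, and the middleware only assumes a response object with `statusCode` and a `finish` event, which Express responses provide.

```typescript
// One timing sample per finished request.
type Sample = { status: number; durationMs: number };

// Minimal structural view of a response object (hypothetical name).
// It only needs a status code and a "finish" event.
interface ResLike {
  statusCode: number;
  on(event: "finish", cb: () => void): unknown;
}

// Middleware-shaped function: measures wall-clock time until the response finishes.
// Date.now() has millisecond resolution, which is coarse but enough to spot outliers.
function timingMiddleware(record: (s: Sample) => void) {
  return (_req: unknown, res: ResLike, next: () => void): void => {
    const start = Date.now();
    res.on("finish", () => {
      record({ status: res.statusCode, durationMs: Date.now() - start });
    });
    next();
  };
}

// Group samples into "2xx", "4xx", ... buckets with a count and total time.
function summarizeByStatusClass(
  samples: Sample[],
): Record<string, { count: number; totalMs: number }> {
  const out: Record<string, { count: number; totalMs: number }> = {};
  for (const s of samples) {
    const key = `${Math.floor(s.status / 100)}xx`;
    const bucket = out[key] ?? (out[key] = { count: 0, totalMs: 0 });
    bucket.count += 1;
    bucket.totalMs += s.durationMs;
  }
  return out;
}
```

If the 4xx bucket shows a large total time during a spike, the "cheap 4xx" assumption probably does not hold for that traffic. Note this measures wall-clock time, not CPU time, so it is only a first hint.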


arjunkomath

It's an Express API, I've been running this service since February 2025 and this is the first time it has spiked so much.

7 months ago

did you see a jump in worker count in your logs right before the spike? your app getting 3k requests is normal and shouldn't be anywhere near that much cpu usage. I've hit ~1m requests and my peak was 6 vCPU (and about 8gb memory).

i don't think you can go back and figure out what caused it retroactively (unless you've logged it), but you can add logging to your API to find the root cause if it happens again: log worker count and CPU usage, and add a CPU profile hook plus access and db pool logging.

if you have any logs that might be helpful, can you add them? you can see the logs for that deployment if you go to service -> deployment and view logs
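As a rough illustration of the CPU logging suggested above, here is a minimal sketch that samples `process.cpuUsage()` on a timer and logs how many cores the process kept busy. The function names, the 10 second interval, and the 2 core warning threshold are all assumptions chosen for the example, not values from this thread.

```typescript
type CpuTimes = { user: number; system: number };

// Convert a CPU time delta (microseconds, as reported by process.cpuUsage())
// into the average number of cores kept busy over the elapsed wall-clock time.
function coresUsed(delta: CpuTimes, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0;
  const busyMs = (delta.user + delta.system) / 1000;
  return busyMs / elapsedMs;
}

// Hypothetical helper: log one JSON line per interval for this process.
function startCpuLogger(intervalMs = 10_000, warnAboveCores = 2) {
  let lastUsage = process.cpuUsage();
  let lastTime = Date.now();
  const timer = setInterval(() => {
    const usage = process.cpuUsage();
    const now = Date.now();
    const cores = coresUsed(
      { user: usage.user - lastUsage.user, system: usage.system - lastUsage.system },
      now - lastTime,
    );
    lastUsage = usage;
    lastTime = now;
    const line = JSON.stringify({
      msg: "cpu-sample",
      pid: process.pid,
      cores: Number(cores.toFixed(2)),
      rssMb: Math.round(process.memoryUsage().rss / 1024 / 1024),
    });
    if (cores > warnAboveCores) console.warn(line);
    else console.log(line);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}
```

Calling `startCpuLogger()` once at startup in each worker would leave a trail in the deployment logs, so a future spike can at least be tied to a specific process and time. It only covers the current process; child processes would need their own logger.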

Attachments


yeeet

did you see a jump in worker count in your logs right before the spike? your app getting 3k requests is normal and shouldn't be anywhere near that much cpu usage. I've hit ~1m requests and my peak was 6 vCPU (and about 8gb memory). i don't think you can go back and figure out what caused it retroactively (unless you've logged it), but you can add logging to your API to find the root cause if it happens again: log worker count and CPU usage, and add a CPU profile hook plus access and db pool logging. if you have any logs that might be helpful, can you add them? you can see the logs for that deployment if you go to service -> deployment and view logs

7 months ago

I'm very doubtful it's my app that was actually consuming all these resources. It doesn't make any sense that my container would use 60 vCPU for 9 hours.

If it were just a random one-off spike I wouldn't have bothered creating this support ticket. 60 vCPUs for 9 hours continuously simply doesn't make sense.

It stopped at midnight because the container was redeployed; it didn't magically fix itself. I doubt that something triggered the container to go out of control.

Someone from the Railway team should look into this.


arjunkomath

I'm very doubtful it's my app that was actually consuming all these resources. It doesn't make any sense that my container would use 60 vCPU for **9 hours**. If it were just a random one-off spike I wouldn't have bothered creating this support ticket. **60 vCPUs for 9 hours continuously simply doesn't make sense**. It stopped at midnight because **the container was redeployed; it didn't magically fix itself**. I doubt that something triggered the container to go out of control. Someone from the Railway team should look into this.

7 months ago

it's entirely possible that your service detected extra cores and then spawned a copy of your process for each one. are you capping the number of workers the service uses?

otherwise, i do find it odd that it ran for months without spiking and then suddenly spiked, but without logs (as a person without access) i can't help much :(
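For readers wondering what capping the worker count could look like in a Node service, here is a minimal sketch. It assumes the app sizes its worker pool from the detected core count; `resolveWorkerCount`, the `WEB_CONCURRENCY` environment variable, and the default cap of 4 are illustrative assumptions, not details from this thread.

```typescript
import * as os from "node:os";

// Hypothetical helper: pick a worker count that never exceeds a hard cap,
// no matter how many cores the host reports.
function resolveWorkerCount(
  requested: string | undefined,
  detectedCores: number,
  hardCap = 4, // assumed default, tune for your plan
): number {
  const parsed = Number.parseInt(requested ?? "", 10);
  // Fall back to the detected core count when the setting is missing or invalid.
  const wanted = Number.isInteger(parsed) && parsed > 0 ? parsed : detectedCores;
  return Math.max(1, Math.min(wanted, hardCap));
}

// WEB_CONCURRENCY is an assumed variable name for the explicit override.
const workerCount = resolveWorkerCount(process.env.WEB_CONCURRENCY, os.cpus().length);
console.log(JSON.stringify({ msg: "worker-count", workerCount, detectedCores: os.cpus().length }));
```

The resulting number would then bound whatever loop starts the workers (for example, calls to `cluster.fork()`), and logging it at startup makes a sudden jump in workers visible in the deployment logs.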


Status changed to Solved ray-chen 6 months ago

