Epic CPU usage spike
arjunkomath
PRO · OP

a month ago

I noticed this massive spike in CPU usage which I've never seen before. Is there some way I can debug what caused this?

Attachments

Solved · $20 Bounty

8 Replies

Railway
BOT

a month ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!


brody
EMPLOYEE

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open by brody about 1 month ago


yeeet

a month ago

hey, what type of service was this? (redis, kafka, db, the repo, etc). any sort of additional info you can give or logs related to it will help us point you in the right direction.


jphillips
HOBBY

a month ago

Do you see any equivalent spike in http traffic or other variables?


yeeet

hey, what type of service was this? (redis, kafka, db, the repo, etc). any sort of additional info you can give or logs related to it will help us point you in the right direction.

arjunkomath
PRO · OP

a month ago

It's an Express API. I've been running this service since February 2025, and this is the first time it has spiked so much.


jphillips

Do you see any equivalent spike in http traffic or other variables?

arjunkomath
PRO · OP

a month ago

This is the first thing I checked, and yes, there is a spike. But my service regularly gets request spikes, and the CPU scale-up doesn't seem proportional to the requests (most of the spike is 4xx requests, which shouldn't consume many resources). Here is the graph:

Attachments


arjunkomath

It's an Express API. I've been running this service since February 2025, and this is the first time it has spiked so much.

yeeet

a month ago

Did you see a jump in worker count in your logs right before the spike? Your app getting 3k requests is normal and shouldn't remotely be hitting that high a CPU usage. I've hit ~1m requests and my peak was 6 vCPU (and about 8 GB memory).

I don't think you can retroactively go back and figure out what caused it (unless you've logged it), but you can add logging to your API to figure out the root cause if it happens again: likely logging workers/CPU, adding a CPU profile hook, plus access logs and DB pool stats.

If you have any logs that might be helpful, can you add them? You can see the logs for that deployment if you go to service -> deployment and view logs.
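To make that concrete, here is a minimal sketch (not from this thread) of that kind of instrumentation in an Express service, using Node's built-in process.cpuUsage(), os.loadavg() and inspector APIs; the /debug/cpu-profile route, the 15-second sampling interval and the 10-second profile window are arbitrary illustrative choices:

```ts
import express from "express";
import os from "node:os";
import { Session } from "node:inspector";

const app = express();

// Periodically log process CPU time, 1-minute load average and RSS so a
// future spike leaves a trail in the deployment logs.
let lastCpu = process.cpuUsage();
setInterval(() => {
  const delta = process.cpuUsage(lastCpu); // microseconds since the last sample
  lastCpu = process.cpuUsage();
  console.log(JSON.stringify({
    msg: "resource-sample",
    cpuUserMs: delta.user / 1000,
    cpuSystemMs: delta.system / 1000,
    load1m: os.loadavg()[0],
    rssMb: Math.round(process.memoryUsage().rss / (1024 * 1024)),
  }));
}, 15_000);

// On-demand CPU profile hook: call this while a spike is happening and open
// the downloaded profile in Chrome DevTools. Protect or remove it in production.
app.get("/debug/cpu-profile", async (_req, res) => {
  const session = new Session();
  session.connect();
  const post = (method: string, params?: object) =>
    new Promise<any>((resolve, reject) =>
      session.post(method, params, (err, result) => (err ? reject(err) : resolve(result))),
    );

  await post("Profiler.enable");
  await post("Profiler.start");
  await new Promise((r) => setTimeout(r, 10_000)); // sample for 10 seconds
  const { profile } = await post("Profiler.stop");
  session.disconnect();

  res.setHeader("Content-Disposition", "attachment; filename=app.cpuprofile");
  res.json(profile);
});

app.listen(Number(process.env.PORT) || 3000);
```

Opening the downloaded .cpuprofile in Chrome DevTools then shows which functions were burning CPU during that window.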

Attachments


yeeet

Did you see a jump in worker count in your logs right before the spike? Your app getting 3k requests is normal and shouldn't remotely be hitting that high a CPU usage. I've hit ~1m requests and my peak was 6 vCPU (and about 8 GB memory).

I don't think you can retroactively go back and figure out what caused it (unless you've logged it), but you can add logging to your API to figure out the root cause if it happens again: likely logging workers/CPU, adding a CPU profile hook, plus access logs and DB pool stats.

If you have any logs that might be helpful, can you add them? You can see the logs for that deployment if you go to service -> deployment and view logs.

arjunkomath
PRO · OP

a month ago

I'm very doubtful it's my app that was actually consuming all these resources; it doesn't make any sense that my container would use 60 vCPUs for 9 hours.

If it were just a random one-off spike I wouldn't bother with this support ticket, but 60 vCPUs for 9 hours continuously simply doesn't make sense.

It stopped at midnight because the container was redeployed; it didn't magically fix itself. I doubt that something triggered the container to go out of control.

Someone from the Railway team should look into this.


arjunkomath

I'm very doubtful it's my app that was actually consuming all these resources; it doesn't make any sense that my container would use 60 vCPUs for 9 hours.

If it were just a random one-off spike I wouldn't bother with this support ticket, but 60 vCPUs for 9 hours continuously simply doesn't make sense.

It stopped at midnight because the container was redeployed; it didn't magically fix itself. I doubt that something triggered the container to go out of control.

Someone from the Railway team should look into this.

yeeet

a month ago

It's entirely possible that your service spun up extra workers and multiplied your process across the available cores. Are you locking the number of workers the service is using?

Otherwise, I do find it odd that it ran for months without spiking and then suddenly spiked, but without logs (as a person without access) I can't help much :(
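For illustration, here is a minimal sketch of what locking the worker count could look like, assuming a cluster-based setup (nothing in this thread confirms the app actually forks workers); the WEB_CONCURRENCY variable and the default of 2 workers are arbitrary choices, not anything Railway sets automatically:

```ts
import cluster from "node:cluster";
import os from "node:os";
import express from "express";

// Cap the worker count explicitly instead of forking one process per host
// core; os.cpus().length on shared infrastructure can report far more cores
// than the plan's vCPU limit. WEB_CONCURRENCY is just a conventional name.
const workers = Math.min(Number(process.env.WEB_CONCURRENCY) || 2, os.cpus().length);

if (cluster.isPrimary) {
  console.log(`primary ${process.pid}: starting ${workers} worker(s)`);
  for (let i = 0; i < workers; i++) cluster.fork();

  // Log restarts so a crash loop shows up in the deployment logs instead of
  // silently multiplying CPU usage.
  cluster.on("exit", (worker, code) => {
    console.log(`worker ${worker.process.pid} exited with code ${code}, restarting`);
    cluster.fork();
  });
} else {
  const app = express();
  app.get("/health", (_req, res) => { res.send("ok"); });
  app.listen(Number(process.env.PORT) || 3000, () => {
    console.log(`worker ${process.pid} listening`);
  });
}
```

If the app instead uses a process manager such as PM2, the same idea applies: pin the instance count (`pm2 start app.js -i 2`) rather than using `-i max`.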


Status changed to Solved by arjunkomath about 1 month ago


Status changed to Solved by ray-chen 23 days ago

