Help Diagnosing Nightly Response Time Spike

pedrommcarrasco

PROOP

2 years ago

Good evening,

I’m reaching out with a somewhat vague issue in the hope that someone here has experienced something similar and can offer insights or advice.

To get straight to the point, we have a PostgreSQL database connected to a Node.js API, with Redis handling background task management. Every night, starting at 11:30 PM UTC and lasting for approximately 30 minutes, we encounter a significant response time spike. During this period, our API’s average response time jumps from milliseconds to over 60 seconds, often resulting in timeouts.

This recurring issue is severely impacting performance, and I’m wondering if anyone has encountered a similar scenario. Could this be related to a PostgreSQL default configuration or some overlooked process running in the background?

Any suggestions or shared experiences would be greatly appreciated!

Thank you!

Solved

18 Replies

pedrommcarrasco

PROOP

2 years ago

871738a3-42f1-4feb-89e8-cd361440240e

mikexie360

PRO

2 years ago

If it is happening at the same time every night. It could be a CRON job or something weird with the datetime. I would check there.

pedrommcarrasco

PROOP

2 years ago

As far as we can tell, there isn’t a CRON job scheduled to run at those times—at least, not one that we configured. Could you clarify what you mean by ‘something weird with the datetime’? A bit more detail would help us understand and investigate further.

maddsua

HOBBY

2 years ago

What probably was a typo and they meant a database

maddsua

HOBBY

2 years ago

Anyhow, have you noticed any usage metrics deviations during these slowdown periods?

pedrommcarrasco

PROOP

2 years ago

According to Railway’s dashboard (covering the last 7 days) and Axiom’s concrete user usage data, there doesn’t appear to be any correlation. I’ve also attached a report from Axiom showing the average elapsed time during these spikes for further reference.

p.s.: Red is the Postgres database on Railway

1325790928505409718

1325790928824434689

brody

EMPLOYEE

2 years ago

looks like there is definitely some kind of concentrated activity, on two kinds of metrics there

pedrommcarrasco

PROOP

2 years ago

Nothing unusual on Railway; however, for reference, it’s expected for our product to see increased activity during those hours, as the majority of our users are based in the US.

1325842233177411614

brody

EMPLOYEE

2 years ago

do you have any tracing for debugging in your app?

pedrommcarrasco

PROOP

2 years ago

It’s a mobile app designed specifically for iOS and macOS. From our logs, nothing unusual stands out. However, the fact that this issue consistently occurs around the same time suggests it’s unlikely to be user-driven—especially since activity spikes to extreme levels within just 60 seconds. For example, last night, everything was functioning as expected at 11:20, but by 11:30, things suddenly went haywire

brody

EMPLOYEE

2 years ago

I shall take that as a no.

I would highly recommend adding tracing to your backend application, it will give you far more insight and help you more than anything anyone here could say

pedrommcarrasco

PROOP

2 years ago

@rubenamorim might be in a better position to answer you that, I'm sorry but I'm mostly focused on the client side of it 😅

pedrommcarrasco

PROOP

a year ago

Closing this post as we believe we've found the source of this issue. Regardless, thank you for mentioning tracing, we'll add it soon.

brody

EMPLOYEE

a year ago

well please do tell us the source of the issue!?