How do the Monitoring Triggers Work?
jjao
PRO · OP

9 days ago

Hi all!

I have a few questions about how the monitoring triggers in the Observability dashboards work. I know they're new and still in Beta, so I wanted to clarify their inner workings, since I couldn't find the answers in the documentation. They work well for general alerting, but I was wondering:

* Is the current alerting (1) a single-breach alert, i.e. it triggers as soon as the threshold is surpassed at any time, or (2) does the default behaviour do some sort of "windowed aggregation" to reduce noise (e.g. take the average over a minute and check whether it surpasses the threshold, similar to what's outlined here: https://docs.datadoghq.com/monitors/types/metric/)? From my testing it seems like it might be (2), but I would appreciate knowing for sure.

* If it does (2), how long is the aggregation window?

* Is it possible (or are there plans) to tune the monitoring alert so that it only sends a notification for sustained breaches? E.g. trigger the alert only if CPU usage stays above the threshold for over 5 minutes. I looked at the roadmap for alerting, but didn't see anything on this.
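
To make the distinction concrete, here is a rough Python sketch of the three behaviours I mean. This is purely illustrative and not how Railway implements it (that is exactly what I'm asking about): the function names, the 60-second window, and the 300-second duration are all my own assumptions, and samples are assumed to be `(timestamp_seconds, value)` pairs.

```python
from collections import deque
from statistics import mean


def single_breach(samples, threshold):
    """(1) Fire as soon as any single sample exceeds the threshold."""
    return any(value > threshold for _, value in samples)


def windowed_average(samples, threshold, window_s=60):
    """(2) Fire when the mean over a trailing window exceeds the threshold.

    window_s is a hypothetical default, not a documented value.
    """
    window = deque()
    for t, value in samples:
        window.append((t, value))
        # Drop samples that have fallen out of the trailing window.
        while window and window[0][0] <= t - window_s:
            window.popleft()
        if mean(v for _, v in window) > threshold:
            return True
    return False


def sustained(samples, threshold, duration_s=300):
    """(3) Fire only once the value has stayed above the threshold
    continuously for at least duration_s (hypothetical default)."""
    breach_start = None
    for t, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = t
            if t - breach_start >= duration_s:
                return True
        else:
            # Any dip back under the threshold resets the timer.
            breach_start = None
    return False


# One sample every 10 s: a lone spike versus a long plateau.
spike = [(t, 95.0 if t == 100 else 50.0) for t in range(0, 200, 10)]
plateau = [(t, 90.0 if t >= 100 else 50.0) for t in range(0, 510, 10)]

for name, series in (("spike", spike), ("plateau", plateau)):
    print(name,
          single_breach(series, 80),
          windowed_average(series, 80),
          sustained(series, 80))
```

With a lone spike, only the single-breach check fires; with a long plateau, all three do. Knowing which of these the triggers correspond to (and with what window) is what I'm after.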

My apologies if any of these are already covered in the docs and I just wasn't able to find them.

I raised this in a comment on a previous thread (https://station.railway.com/feedback/configurable-monitoring-alerts-in-observ-573b62bb), but I think it may not have been picked up because the thread was already completed, so I wanted to bring it up again here.

Thanks again for all the work, this is great!

0 Replies