Redgate Monitor 14

Dynamic alerting thresholds

This feature is currently in preview and is only available for the Processor (CPU) utilization, Server waits and Query throughput alerts.

Starting with version 14.0.37, Redgate Monitor can use historical machine metrics to forecast expected behavior and dynamically adjust the thresholds of certain alerts. It does this using machine learning models, which are stored and executed locally on the Base Monitor machines.

What is dynamic alerting?

Traditionally, Redgate Monitor uses static thresholds to raise alerts. Alerts can be enabled or disabled, and their thresholds manually adjusted, after gaining a better understanding of the applications and activity patterns on the servers.

Dynamic alerting automates this adjustment of thresholds by using machine learning algorithms to understand and forecast the behavior patterns for each metric. As a result, your alert thresholds are tailored to each machine and each metric.

All data processing to generate these machine learning forecasts is performed locally on the Base Monitor machine. No data is sent to any third-party organizations or entities.

How does dynamic alerting work?

Once enabled, the machine learning models analyze historical data for each specific metric to produce a forecast for the next 24 hours. A small buffer is then added to this forecast to establish an 'expected normalcy.' If the current value of that metric exceeds the expected normalcy predicted for that time, Redgate Monitor will raise an alert.
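The comparison described above can be pictured with a minimal Python sketch. It is only an illustration, not Redgate Monitor's implementation: the hourly forecast resolution, the proportional 10% buffer, and the names expected_normalcy and exceeds_normalcy are all assumptions, since the documentation only states that "a small buffer" is added.

```python
from typing import List

# Hypothetical value: the documentation only says "a small buffer".
BUFFER_FRACTION = 0.10


def expected_normalcy(forecast: List[float],
                      buffer_fraction: float = BUFFER_FRACTION) -> List[float]:
    """Add a proportional buffer to every forecast point."""
    return [value * (1.0 + buffer_fraction) for value in forecast]


def exceeds_normalcy(current: float, hour: int, normalcy: List[float]) -> bool:
    """True when the current value is above the expected normalcy for that hour."""
    return current > normalcy[hour]


# Illustrative 24-hour CPU forecast (one point per hour, an assumption).
hourly_forecast = [20.0] * 8 + [60.0] * 10 + [30.0] * 6
normalcy = expected_normalcy(hourly_forecast)

print(exceeds_normalcy(70.0, 9, normalcy))  # busy hour, value above forecast + buffer
print(exceeds_normalcy(65.0, 9, normalcy))  # busy hour, value inside the buffer
```

In this sketch the same reading of 70 would be unremarkable against a static 80% threshold but is flagged because it is well above what was forecast for that hour.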

To generate the forecasts used for dynamic alerting, at least 14 days of past usage data is required. Forecasts are generated when you first enable dynamic alerting and then at 4 AM every day thereafter. Forecasts are only generated for the metrics and machines on which dynamic alerting is enabled.
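The timing rules above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that 4 AM means the local time of the Base Monitor machine, which the documentation does not state.

```python
from datetime import date, datetime, time, timedelta

MIN_HISTORY_DAYS = 14              # from the documentation
DAILY_FORECAST_TIME = time(hour=4)  # 4 AM; local time is an assumption


def has_enough_history(first_sample: date, today: date) -> bool:
    """True when at least 14 days of data separate the first sample from today."""
    return (today - first_sample).days >= MIN_HISTORY_DAYS


def next_scheduled_forecast(now: datetime) -> datetime:
    """Return the next 4 AM run strictly after `now`."""
    candidate = datetime.combine(now.date(), DAILY_FORECAST_TIME)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


print(has_enough_history(date(2024, 1, 1), date(2024, 1, 15)))
print(next_scheduled_forecast(datetime(2024, 1, 15, 10, 30)))
```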

After enabling dynamic alerts, a chart will be displayed showing the expected normalcy for the day (black line).

Alert Settings - Processor (CPU) Utilization and Server Waits

The hatched red region indicates the threshold for when an alert will be raised. This region can be configured by adjusting the minimum threshold option, which is set at 80% by default to avoid noisy alerts.

For server waits, the unit of the minimum threshold can also be adjusted, just like for the standard alert. The forecast chart will adjust accordingly.

You can also adjust how long the metric must stay in the red region before an alert is raised. For CPU Utilization, the default duration is 600 seconds, so that short spikes do not raise alerts.
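As a rough illustration of the duration setting, the Python sketch below finds the first sample at which a metric has stayed above its threshold for the configured duration. The function name, the fixed sampling interval, and the rule that any sample at or below the threshold resets the timer are assumptions, not documented behavior.

```python
from typing import Optional, Sequence


def first_alert_index(values: Sequence[float],
                      thresholds: Sequence[float],
                      interval_seconds: int,
                      duration_seconds: int = 600) -> Optional[int]:
    """Index of the first sample where the breach has lasted long enough, else None."""
    breach_start: Optional[int] = None
    for i, (value, threshold) in enumerate(zip(values, thresholds)):
        if value > threshold:
            if breach_start is None:
                breach_start = i  # the breach begins at this sample
            if (i - breach_start) * interval_seconds >= duration_seconds:
                return i
        else:
            breach_start = None  # assumed: dropping back resets the timer
    return None


# One sample per minute against a flat threshold of 80:
# a 5-minute spike, a dip, then a sustained breach.
cpu = [90.0] * 5 + [50.0] + [90.0] * 11
print(first_alert_index(cpu, [80.0] * len(cpu), interval_seconds=60))
```

With these sample values the short spike at the start never raises an alert; only the later run that lasts the full 600 seconds does.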

Checking the "Raise a low severity alert (...)" box raises a low severity alert when the values are still under the percentage threshold but above the predicted values.
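One way to read how the prediction, the minimum threshold, and the low severity option interact is sketched below in Python. This is an interpretation, not the product's logic: it assumes the "percentage threshold" is the minimum threshold, that an alert needs the value to be above both the predicted value and that threshold, and the labels "standard" and "low" are placeholders.

```python
from typing import Optional


def classify(value: float,
             predicted: float,
             minimum_threshold: float = 80.0,
             raise_low_below_minimum: bool = False) -> Optional[str]:
    """Return "standard", "low", or None for a single sample (hypothetical labels)."""
    if value <= predicted:
        return None                # within the predicted range: no alert
    if value >= minimum_threshold:
        return "standard"          # above the prediction and the minimum threshold
    # Above the prediction but under the minimum threshold.
    return "low" if raise_low_below_minimum else None


print(classify(90.0, predicted=60.0))
print(classify(70.0, predicted=60.0))
print(classify(70.0, predicted=60.0, raise_low_below_minimum=True))
```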


Alert Settings - Query Throughput

The hatched blue region indicates the threshold for when an alert will be raised. The forecasted query throughput is shown on the chart as the Expected normalcy. 

The Sensitivity slider adjusts the alert threshold. A higher sensitivity brings the threshold closer to the forecast, which makes an alert more likely; a lower sensitivity pushes the threshold away from the forecast, which makes an alert less likely.
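The effect of the slider can be illustrated with a small Python sketch. The slider range of 1 to 10, the linear mapping, and the margin values are all hypothetical, and the sketch only models a threshold above the forecast; the documentation does not describe the actual formula.

```python
def throughput_alert_threshold(forecast: float,
                               sensitivity: int,
                               levels: int = 10,
                               widest_margin: float = 1.0,
                               narrowest_margin: float = 0.1) -> float:
    """Map a sensitivity level to a threshold above the forecast (assumed mapping)."""
    if not 1 <= sensitivity <= levels:
        raise ValueError(f"sensitivity must be between 1 and {levels}")
    # 0.0 at the lowest sensitivity, 1.0 at the highest.
    position = (sensitivity - 1) / (levels - 1)
    # Higher sensitivity means a smaller margin, so the threshold sits nearer the forecast.
    margin = widest_margin - position * (widest_margin - narrowest_margin)
    return forecast * (1.0 + margin)


low = throughput_alert_threshold(1000.0, sensitivity=1)
high = throughput_alert_threshold(1000.0, sensitivity=10)
print(low, high)
```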

How to enable dynamic alerting?

  1. Go to Configuration > Alert Settings
  2. Select either the Processor (CPU) utilization, Server waits or Query throughput alert
  3. On the Alert Settings page, choose the host machine (SQL Server instance for Server waits) or the group you want to enable dynamic alerting for. The same hierarchy principles described in Customizing the alert settings apply here.
  4. Select the Customize settings for this level and all levels below radio button
  5. Turn on the Use dynamic alert thresholds toggle if you are configuring Processor (CPU) utilization or Server waits (the Query throughput alert automatically uses dynamic alerting)
  6. Once the forecast is generated, you can adjust the threshold and duration options using the input boxes
  7. Click Apply changes to save the settings

When an alert is triggered using dynamic thresholds, it is marked with a 'Dynamic alert' label in the Alert Inbox.



Dynamic thresholds are expected to generate fewer alerts than static thresholds, helping to reduce unnecessary alerts and highlight significant performance issues. To determine the effectiveness of these alerts, we have included a feedback button with each alert. Your feedback is valuable and will help us improve dynamic alerts. After submitting feedback, you will also see a link to a survey for more detailed input. We are especially interested in knowing whether enabling dynamic alerts has any negative impact on identifying performance issues, either by not raising alerts for actual issues or by raising alerts when there isn't a significant performance problem.


Considerations

Forecasts are generated when dynamic alerting is enabled and at 4 AM each day thereafter.

A minimum of 14 days of historical metric data is required for generating forecasts.

Generating forecasts should not significantly impact the Base Monitor's performance. It operates as a low-priority task on the Base Monitor machine. There is a two-hour timeout window for generating forecasts, after which the process is terminated until the next day.

If the historical data is unavailable, or if the forecast generation fails for any reason, the minimum threshold set in the dynamic threshold configuration will be used instead.
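The fallback described above might look like the following Python sketch. The function name is hypothetical, and taking the higher of the forecast value and the minimum threshold when a forecast exists is an assumption based on the minimum threshold description in the alert settings.

```python
from typing import Optional, Sequence


def effective_threshold(normalcy: Optional[Sequence[float]],
                        hour: int,
                        minimum_threshold: float) -> float:
    """Use the forecast when one exists, else fall back to the minimum threshold."""
    if not normalcy:
        # No history, or forecast generation failed: behave like a static threshold.
        return minimum_threshold
    # Assumed: the minimum threshold acts as a floor under the forecast.
    return max(normalcy[hour], minimum_threshold)


print(effective_threshold(None, hour=9, minimum_threshold=80.0))
print(effective_threshold([85.0] * 24, hour=9, minimum_threshold=80.0))
```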

While the feature is in preview, we recommend initially enabling dynamic alerts only for individual machines or SQL Server instances, rather than for entire groups.

