List of alerts
Published 28 M 2024
The Alert settings page lists all the alert types that Redgate Monitor can raise.
Go to the Configuration page. Under Alerts and metrics, select Alert settings.
For each type of alert, you can:
- disable it, so the alert will not be raised in future.
- change the level at which it is raised: Low, Medium or High.
- change the thresholds that trigger the alert to be raised.
You can edit the alert settings for a single instance or across a number of instances at once (by creating a group). For job-related, disk-related or database-related alerts, you can edit the alert for a specific job, disk or database.
When an alert is raised, you can quickly change its settings by clicking Configure alert on the Alert details page.
Alert types
Instance alerts
Alerts that can potentially be raised for any monitored instance. Not every alert applies to every platform: check the Raised by row for each alert.
Backup overdue
Raised by: | SQL Server only. |
---|---|
Raised when: | Either of the following conditions applies:
If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a full database backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica. |
Configurable thresholds: | Most recent backup is older than x seconds/minutes/hours/days. |
Default settings: | Raised as Medium when the last full backup is older than 7 days. |
Type: | Continuous. Continuous alerts can have multiple thresholds and are automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when a full backup is detected. |
Possible causes: |
|
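For databases in an availability group, the rule above compares the newest full backup across all copies with the threshold set on the primary copy. The following is a minimal Python sketch of that comparison. It assumes you already have each copy's most recent full backup time (for example, from backup history in msdb) and is not Redgate Monitor's implementation. The function name is hypothetical, and treating a database with no recorded full backup as overdue is an assumption:

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def full_backup_overdue(
    last_full_backups: Sequence[Optional[datetime]],
    primary_threshold: timedelta,
    now: datetime,
) -> bool:
    """Hypothetical check mirroring the Backup overdue rule described above.

    last_full_backups: most recent full backup time for each copy of the
    database (one entry per replica); None means no full backup is recorded.
    primary_threshold: threshold configured for the copy on the primary replica.
    """
    recorded = [t for t in last_full_backups if t is not None]
    if not recorded:
        # Assumption: no full backup on any copy counts as overdue.
        return True
    # The alert needs ALL copies to be stale, i.e. even the newest backup is too old.
    newest = max(recorded)
    return now - newest > primary_threshold


now = datetime(2024, 5, 28, 12, 0)
seven_days = timedelta(days=7)  # default threshold
print(full_backup_overdue([now - timedelta(days=9), now - timedelta(days=3)], seven_days, now))  # False
print(full_backup_overdue([now - timedelta(days=9), now - timedelta(days=8)], seven_days, now))  # True
```

A recent backup on any single replica is enough to keep the alert from being raised, which is why the sketch only looks at the newest timestamp.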
Blocking process
Raised by: | SQL Server only. |
---|---|
Raised when: | A SQL process has been blocking one or more other processes for longer than a specified duration. |
Configurable thresholds: | SQL process has been blocking one or more other processes for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as Low when a SQL process has been blocking one or more other processes for longer than a total of 60 seconds. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the block ends. |
Possible causes: |
|
Configuration change
Raised by: | SQL Server only. |
---|---|
Raised when: | Either:
|
Configurable thresholds: | None. |
Default settings: | Raised as Low for any monitored configuration option change. |
Type: | Event. Event alerts are raised for incidents that occur at a specific point in time; they do not change level or update their status to Ended. |
Possible causes: | A configuration option changed on the target instance. |
Limited sampling
Raised by: | SQL Server and PostgreSQL. |
---|---|
Raised when: | Samplers have been prevented from running. |
Configurable thresholds: | None. |
Default settings: | Raised as High when sampling is limited for any reason. |
Type: | Continuous Automatically updated from Active to Ended when all samplers can be run. |
Possible causes: |
|
Database file usage
Raised by: | SQL Server. |
---|---|
Raised when: | This alert has 3 configuration modes:
|
Configurable thresholds: |
|
Default settings: | Raised as Medium when the remaining time until the database file gets full is less than 14 days. This alert is disabled by default on the system databases (master, model, msdb and tempdb), because they are generally only a few MB in size and in most scenarios don't grow significantly. You can configure the alert to monitor tempdb and log file usage, but not in conjunction with alerting on remaining time, because tempdb and log file usage is typically unpredictable. |
Type: | Continuous |
Possible causes: | Database file size is increasing. |
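Alerting on remaining time implies projecting when a file will fill up. The sketch below assumes a simple linear projection from free space and an average daily growth rate. How Redgate Monitor actually estimates growth is not described on this page, and the function names are hypothetical:

```python
import math


def days_until_full(free_mb: float, growth_mb_per_day: float) -> float:
    """Hypothetical linear projection of the time left before a file fills up."""
    if growth_mb_per_day <= 0:
        return math.inf  # not growing, so never projected to fill
    return free_mb / growth_mb_per_day


def file_usage_alert(free_mb: float, growth_mb_per_day: float,
                     threshold_days: float = 14.0) -> bool:
    """True when the projected time to full is below the threshold (default 14 days)."""
    return days_until_full(free_mb, growth_mb_per_day) < threshold_days


print(file_usage_alert(free_mb=100, growth_mb_per_day=10))  # 10 days left: True
print(file_usage_alert(free_mb=100, growth_mb_per_day=5))   # 20 days left: False
```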
Database unavailable
Raised by: | SQL Server and PostgreSQL. |
---|---|
Raised when: | The database state is something other than Online, for longer than the time you specify. |
Configurable thresholds: | Database unavailable for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High when the database is unavailable for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the database state changes back to Online. |
Possible causes: | The database has been manually removed. On SQL Server, the database has encountered a problem that caused its state to change to Suspect, Emergency or Recovering. |
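Database unavailable illustrates the lifecycle shared by continuous alerts with a duration threshold. The alert becomes Active only after the condition has held for longer than the threshold, and it moves to Ended as soon as the condition clears. The following is a minimal sketch of that lifecycle. The class is hypothetical, the 15-second sample spacing is an assumption, and the state strings follow the `state_desc` values that SQL Server reports in `sys.databases` (such as `ONLINE` and `SUSPECT`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedConditionAlert:
    """Hypothetical model of a continuous alert with a duration threshold."""
    threshold_seconds: float
    since: Optional[float] = None  # when the condition was first seen
    active: bool = False

    def observe(self, condition: bool, t: float) -> str:
        """Feed one sample taken at time t (seconds); return the alert status."""
        if not condition:
            self.since = None
            if self.active:
                self.active = False
                return "Ended"
            return "Inactive"
        if self.since is None:
            self.since = t
        # "Longer than" the threshold, so equality is not enough.
        if not self.active and t - self.since > self.threshold_seconds:
            self.active = True
        return "Active" if self.active else "Inactive"


alert = SustainedConditionAlert(threshold_seconds=30)  # default: 30 seconds
samples = [(0, "SUSPECT"), (15, "SUSPECT"), (30, "SUSPECT"),
           (45, "SUSPECT"), (60, "ONLINE"), (75, "ONLINE")]
print([alert.observe(state != "ONLINE", t) for t, state in samples])
# ['Inactive', 'Inactive', 'Inactive', 'Active', 'Ended', 'Inactive']
```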
Deadlock
Raised by: | SQL Server. |
---|---|
Raised when: | A SQL Server deadlock is detected. |
Configurable thresholds: | None. |
Default settings: | Raised as Medium. |
Type: | Event. Event alerts are raised for incidents that occur at a specific point in time; they do not change level or update their status to Ended. |
Possible causes: |
|
Long-running query
Raised by: | SQL Server, PostgreSQL. |
---|---|
Raised when: | Query has been running for longer than a specified duration. |
Configurable thresholds: | Query duration is longer than x seconds/minutes/hours/days. Do not raise alerts for queries that contain certain strings (matching specified regular expressions). |
Default settings: | Raised as Low when a query runs longer than 10 minutes. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the query completes. |
Possible causes: |
|
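The two configurable parts of this alert combine as "slower than the threshold, and not excluded by any pattern". The following is a minimal sketch with a hypothetical function name. It assumes that exclusion patterns may match anywhere in the query text and are case-sensitive, which may differ from how Redgate Monitor applies them:

```python
import re
from typing import Iterable


def long_running_query_alert(query_text: str, duration_seconds: float,
                             threshold_seconds: float = 600,
                             exclude_patterns: Iterable[str] = ()) -> bool:
    """Hypothetical check: too slow, and not matched by any exclusion regex."""
    if duration_seconds <= threshold_seconds:  # must run *longer than* the threshold
        return False
    # Assumption: a pattern may match anywhere in the text (re.search, case-sensitive).
    return not any(re.search(pattern, query_text) for pattern in exclude_patterns)


print(long_running_query_alert("SELECT * FROM dbo.Orders", 700))  # True
print(long_running_query_alert("BACKUP DATABASE Sales TO DISK = 'x.bak'", 700,
                               exclude_patterns=[r"^BACKUP\b"]))  # False
```

Anchoring a pattern with `^` limits it to queries that start with that text, which avoids accidentally excluding queries that merely mention it.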
Monitoring error (data collection)
Raised by: | SQL Server, PostgreSQL. |
---|---|
Raised when: | One of the following conditions applies continuously for longer than a specified duration:
|
Configurable thresholds: | Redgate Monitor cannot collect instance data for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as Medium. |
Type: | Continuous Automatically updated from Active to Ended when data can be collected from the instance again. |
Possible causes: |
|
Monitoring stopped (credentials)
Raised by: | SQL Server, PostgreSQL. |
---|---|
Raised when: | Redgate Monitor is unable to collect monitoring data from the instance because the credentials supplied to connect to the instance are invalid or lack permissions. |
Configurable thresholds: | Instance cannot be contacted for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High. |
Type: | Continuous Automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed. |
Possible causes: |
|
Instance unreachable
Raised by: | SQL Server, PostgreSQL. |
---|---|
Raised when: | The instance cannot be reached by Redgate Monitor because it is not running, or because of some other error (other than permissions), for longer than the time you specify. |
Configurable thresholds: | Instance unreachable for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High when the instance is unavailable for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when Redgate Monitor can contact the instance. |
Possible causes: |
|
SQL Server specific alerts
Redgate Monitor raises the following types of alerts for problems on a SQL Server instance or database:
Availability group – database not healthy
Raised when: | Any of the following conditions apply, for longer than the time you specify:
|
---|---|
Configurable thresholds: | Availability group database is not healthy for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High when a monitored availability group database is not healthy for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the database is healthy again. |
Possible causes: |
|
Availability group – failover
Raised when: | The availability group fails over from a primary replica to a secondary replica. |
---|---|
Configurable thresholds: | None. |
Default settings: | Raised as Medium. |
Type: | Event. Event alerts are raised for incidents that occur at a specific point in time; they do not change level or update their status to Ended. |
Possible causes: | Automatic failover occurs when the primary replica fails or goes offline for some reason. Planned manual failover and forced failover require initiation by a database administrator. All three types of failover will cause this alert to be raised. |
Availability group – listener offline
Raised when: | The availability group listener is offline for longer than the time you specify. |
---|---|
Configurable thresholds: | Listener is offline for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High when a monitored availability group's listener is offline for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the listener is online again. |
Possible causes: |
|
Availability group – not ready for automatic failover
Raised when: | The failover mode of the primary replica is automatic, but no secondary replicas are configured for automatic failover, for longer than the time you specify. |
---|---|
Configurable thresholds: | Availability group is not healthy for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as Medium when a monitored availability group is not healthy for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the availability group replica is ready for automatic failover again. |
Possible causes: | The availability group is incorrectly configured. In order for a particular primary replica to fail over automatically when it fails or goes offline, the primary replica and target secondary replica must both be set to automatic failover mode. |
Availability group – query slowdown on primary due to synchronous replication
Raised when: | Synchronous replication causes query slowdown on the primary replica, resulting in the transaction delay exceeding a given threshold. |
---|---|
Configurable thresholds: | Transaction delay is greater than a specified value for longer than a specified time. |
Default settings: | Raised as Medium when the transaction delay is more than 10000 ms for longer than 120 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the transaction delay falls below the given threshold. |
Possible causes: | In synchronous-commit mode, the primary replica doesn’t commit any transactions until it receives acknowledgement that all synchronous secondary replicas have finished hardening the commit in their log, resulting in transaction delay. Several factors can affect transaction delay:
|
Availability group – replica not healthy
Raised when: | Any of the following conditions apply, for longer than the time you specify:
|
---|---|
Configurable thresholds: | Replica not healthy for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High when a monitored availability replica is not healthy for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the availability group replica is healthy again. |
Possible causes: |
|
Availability group – replication falling behind
Raised when: | The sum of the log send queue and the redo queue for a particular secondary replica has been exceeding a given threshold for longer than the time you specify. |
---|---|
Configurable thresholds: | Sum of log send queue and redo queue is greater than a specified value for longer than a specified time. |
Default settings: | Raised as Medium when the sum of the log send queue and the redo queue is more than 100 MB for longer than 120 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the sum of the log send queue and the redo queue falls below the given threshold. |
Possible causes: |
|
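SQL Server reports both queues per database replica in `sys.dm_hadr_database_replica_states`, as `log_send_queue_size` and `redo_queue_size`, in kilobytes. Whether Redgate Monitor reads them from there is an assumption. The following sketch checks whether their sum stayed above the default 100 MB for longer than 120 seconds. The function name is hypothetical, and it assumes 1 MB = 1024 KB:

```python
from typing import Iterable, Optional, Tuple


def replication_falling_behind(samples: Iterable[Tuple[float, int, int]],
                               threshold_mb: float = 100.0,
                               duration_seconds: float = 120.0) -> bool:
    """samples: (timestamp_s, log_send_queue_kb, redo_queue_kb), oldest first.

    Returns True if the combined queue stayed above the threshold for longer
    than duration_seconds at some point in the samples.
    """
    threshold_kb = threshold_mb * 1024  # assumption: 1 MB = 1024 KB
    breach_start: Optional[float] = None
    for t, send_kb, redo_kb in samples:
        if send_kb + redo_kb > threshold_kb:
            if breach_start is None:
                breach_start = t
            if t - breach_start > duration_seconds:
                return True
        else:
            breach_start = None  # the breach must be continuous
    return False


behind = [(t, 60_000, 50_000) for t in range(0, 151, 15)]  # 110,000 KB, about 107 MB
print(replication_falling_behind(behind))  # True
```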
Cluster failover
Raised when: | The active node of a cluster changes to a different node. |
---|---|
Configurable thresholds: | None. |
Default settings: | Raised as Medium. |
Type: | Event. Event alerts are raised for incidents that occur at a specific point in time; they do not change level or update their status to Ended. |
Possible causes: | The previously active node has failed, or the cluster has been manually switched over to a different node. |
Deadlock trace flag disabled
Raised when: | Redgate Monitor is unable to turn on the deadlock trace flag on a SQL Server instance (1204 for SQL Server 2000, 1222 for SQL Server 2005 and later). This means deadlock alerts can't be raised. |
---|---|
Configurable thresholds: | None. |
Default settings: | Raised as High. |
Type: | Continuous Automatically updated from Active to Ended when Redgate Monitor can enable the trace flag. |
Possible causes: | Insufficient privileges for the account used to connect to the SQL Server instance. Redgate Monitor requires sysadmin permissions on this account to turn on the deadlock trace flag. |
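The version-to-flag mapping above can be expressed as a small lookup. The following sketch maps a SQL Server major version number (8 is SQL Server 2000, 9 is SQL Server 2005) to the trace flag and builds a `DBCC TRACEON` statement. `DBCC TRACEON (flag, -1)` enables a flag globally and requires sysadmin. Whether Redgate Monitor issues exactly this statement is an assumption, and the function names are hypothetical:

```python
def deadlock_trace_flag(major_version: int) -> int:
    """Trace flag named above for a given SQL Server major version number.

    Major version 8 is SQL Server 2000; 9 is SQL Server 2005; later releases are higher.
    """
    if major_version < 8:
        raise ValueError("versions before SQL Server 2000 are not covered")
    return 1204 if major_version == 8 else 1222


def enable_statement(major_version: int) -> str:
    # DBCC TRACEON with -1 sets the flag globally; it requires sysadmin.
    return f"DBCC TRACEON ({deadlock_trace_flag(major_version)}, -1);"


print(enable_statement(8))   # DBCC TRACEON (1204, -1);
print(enable_statement(16))  # DBCC TRACEON (1222, -1);
```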
Differential backup overdue
Raised when: | Either of the following conditions applies:
If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a differential backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica. |
---|---|
Configurable thresholds: | Most recent backup is older than x seconds/minutes/hours/days. |
Default settings: | Alert is disabled by default. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when a differential backup is detected. |
Possible causes: |
|
Fragmented indexes
Raised when: | Both of the following conditions apply:
|
---|---|
Configurable thresholds: | Percentage fragmentation level. Indexes contain more than x pages. |
Default settings: | Raised as Medium when index fragmentation is above 60% for indexes with more than 1000 pages. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when there are no fragmented indexes (based on the thresholds defined) for the database. |
Possible causes: | Regular deletes or updates of existing rows or values in a table. |
Notes: | Checking for index fragmentation is a very resource-intensive activity. For this reason, Redgate Monitor only checks for fragmented indexes once a week: on Sunday at 01:00. This means that the alert may remain Active for some time after you fix the issue, and will only be updated after the next scheduled weekly check. |
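Both conditions must hold before an index counts as fragmented. The following sketch applies the default filter (above 60% fragmentation and more than 1000 pages) to rows shaped like the output of SQL Server's `sys.dm_db_index_physical_stats`, which reports `avg_fragmentation_in_percent` and `page_count`. The `name` field, the record type and the function are hypothetical additions for readability:

```python
from typing import Iterable, List, NamedTuple


class IndexStats(NamedTuple):
    """Fields modeled on sys.dm_db_index_physical_stats; name would come from sys.indexes."""
    name: str
    avg_fragmentation_in_percent: float
    page_count: int


def fragmented_indexes(stats: Iterable[IndexStats],
                       fragmentation_percent: float = 60.0,
                       min_pages: int = 1000) -> List[str]:
    """Names of indexes meeting BOTH conditions (defaults: above 60%, more than 1000 pages)."""
    return [s.name for s in stats
            if s.avg_fragmentation_in_percent > fragmentation_percent
            and s.page_count > min_pages]


stats = [
    IndexStats("IX_Orders_Date", 75.0, 5000),  # fragmented and large: reported
    IndexStats("IX_Small", 75.0, 500),         # too few pages: ignored
    IndexStats("IX_Tidy", 30.0, 5000),         # low fragmentation: ignored
]
print(fragmented_indexes(stats))  # ['IX_Orders_Date']
```

The page-count filter keeps small indexes out of the results, because their fragmentation figures tend to be high but rarely matter for performance.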
Integrity check overdue
Raised when: | Either of the following conditions applies:
|
---|---|
Configurable thresholds: | Most recent integrity check is older than x seconds/minutes/hours/days. |
Default settings: | Alert is disabled by default. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when an integrity check of the database (DBCC CHECKDB) is detected. |
Possible causes: | No integrity check has been carried out on a database, or the most recent integrity check was too long ago. |
Job duration unusual
Raised when: | The job execution time is different from the baseline duration (the median of the last ten runs) by a specified percentage. |
---|---|
Configurable thresholds: | Percentage difference from baseline duration (either slower or quicker). Ignore jobs with run times less than x seconds. |
Default settings: | These defaults assume that once the baseline for a job is established, that job should not start running significantly more slowly or more quickly.
|
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the job next completes with a duration that is within the allowed percentage of the baseline duration. |
Possible causes: |
|
Notes: | Redgate Monitor calculates the baseline duration by using the job history to find the last ten run times. Redgate Monitor will not raise the Job duration unusual alert until the job history contains at least ten runs. |
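The baseline rule above (median of the last ten runs, a percentage difference in either direction, and no alert until ten runs exist) can be sketched as follows. The function name is hypothetical. The sketch also assumes that the "ignore jobs with run times less than x seconds" setting applies to the current run, and that the current run is not part of its own baseline. The page does not state either point:

```python
from statistics import median
from typing import Sequence


def job_duration_unusual(previous_runs_s: Sequence[float], current_run_s: float,
                         percent_threshold: float, ignore_below_s: float) -> bool:
    """Hypothetical check: compare a run with the median of the last ten runs.

    previous_runs_s holds earlier run durations in seconds, most recent last.
    """
    if len(previous_runs_s) < 10:
        return False  # baseline not established yet
    if current_run_s < ignore_below_s:
        return False  # short jobs are ignored
    baseline = median(previous_runs_s[-10:])
    if baseline <= 0:
        return False
    # Slower and quicker runs both count, hence abs().
    percent_difference = abs(current_run_s - baseline) / baseline * 100
    return percent_difference > percent_threshold


history = [100.0] * 9 + [110.0]  # median = 100 s
print(job_duration_unusual(history, 150.0, percent_threshold=40, ignore_below_s=10))  # True (50% slower)
print(job_duration_unusual(history, 120.0, percent_threshold=40, ignore_below_s=10))  # False (20%)
```

Using the median rather than the mean keeps one unusually slow run in the history from shifting the baseline much.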
Job failing
Raised when: | The job does not complete successfully and returns an error code. Optional: Raise alert on job step failure. Use this option to raise the Job failing alert when the job completes successfully but one or more of its steps fail. |
---|---|
Configurable thresholds: | None. |
Default settings: | Raised as Medium. |
Type: | Continuous Automatically updated from Active to Ended when the job runs successfully again. |
Possible causes: | Check the Job outcome message for a raised alert to help determine the problem. |
Log backup overdue
Raised when: | Either of the following conditions applies:
If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a transaction log backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica. |
---|---|
Configurable thresholds: | Most recent backup is older than x seconds/minutes/hours/days. |
Default settings: | Raised as Medium when the last log backup is older than 7 days. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when a transaction log backup is detected. |
Possible causes: |
|
Page Life Expectancy
Raised when: | Page Life Expectancy (PLE) drops below the user-configured threshold for longer than a specified duration. |
---|---|
Configurable thresholds: | PLE is lower than the specified value for longer than a specified time. |
Default settings: | Alert is disabled by default. |
Type: | Continuous Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the PLE rises above the highest defined threshold. |
Possible causes: | A drop in PLE indicates that the buffer pool has been flushed, which isn’t necessarily because of an issue. Big or persistent PLE drops can be correlated with queries that do a lot of physical reads or take big memory grants, or with scheduled SQL Server Agent jobs that perform database maintenance. |
Page verification
Raised when: | PAGE_VERIFY is set to NONE (SQL Server 2005 or SQL Server 2008) or TORN_PAGE_DETECTION is set to FALSE (SQL Server 2000) for a database. |
---|---|
Configurable thresholds: | None. |
Default settings: | Alert is disabled by default. |
Type: | Continuous Automatically updated from Active to Ended when Redgate Monitor detects that Page verification has been turned on. |
Possible causes: | New databases inherit this setting from the model database. If required, check that page verification is turned on for the model database. |
SQL Server Agent Service status
Raised when: | SQL Server Agent Service status matches the status specified in the alert configuration. |
---|---|
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped. |
Type: | Continuous Automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server Analysis Service status
Raised when: | SQL Server Analysis Service status matches the status specified in the alert configuration. |
---|---|
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped. |
Type: | Continuous Automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server error log entry
Raised when: | An error message has been written to the SQL Server error log with a severity level equal to or higher than a specified value. |
---|---|
Configurable thresholds: | Error severity equal to or higher than x. |
Default settings: | Raised as Medium for errors with severity level equal to or greater than 20. |
Type: | Event. Event alerts are raised for incidents that occur at a specific point in time; they do not change level or update their status to Ended. |
Possible causes: | Various. Check the SQL Server error log entry area of the alert to see the error message text. |
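Error entries in the SQL Server error log carry their severity in text such as `Error: 823, Severity: 24, State: 2.`. The following sketch applies the default rule (severity equal to or greater than 20) to a single log line. Reading the severity by parsing the log text is an assumption made for illustration, and Redgate Monitor may obtain it differently. The names are hypothetical:

```python
import re

# Matches the "Severity: NN" part of an error log entry.
SEVERITY_PATTERN = re.compile(r"Severity:\s*(\d+)")


def error_log_alert(log_line: str, min_severity: int = 20) -> bool:
    """True if the line reports a severity equal to or higher than min_severity."""
    match = SEVERITY_PATTERN.search(log_line)
    return match is not None and int(match.group(1)) >= min_severity


print(error_log_alert("Error: 823, Severity: 24, State: 2."))    # True
print(error_log_alert("Error: 18456, Severity: 14, State: 8."))  # False
```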
SQL Server log entry
Raised when: | A message that matches a specified regex pattern has been written to the SQL Server error log. |
---|---|
Configurable thresholds: | A message matches the regex pattern specified for the relevant threshold. |
Default settings: | Alert is disabled by default. |
Type: | Event. Event alerts are raised for incidents that occur at a specific point in time; they do not change level or update their status to Ended. |
Possible causes: | Various. Check the SQL Server log entry area of the alert to see the error message text. |
SQL Server Full Text Search Service status
Raised when: | SQL Server Full Text Search Service status matches the status specified in the alert configuration. |
---|---|
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped. |
Type: | Continuous Automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server Reporting Service status
Raised when: | SQL Server Reporting Service status matches the status specified in the alert configuration. |
---|---|
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped. |
Type: | Continuous Automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
Version store usage
Raised when: | The percentage of used space in version store is above the threshold for longer than a specified duration. |
---|---|
Configurable thresholds: | Percentage of version store space used. Usage above this percentage has lasted longer than x seconds. |
Default settings: | Alert is disabled by default. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when the version store usage goes below the threshold. |
Possible causes: |
|
Host machine alerts
Redgate Monitor raises the following types of alerts for problems on a host machine (Windows or Linux server):
Clock skew
Raised when: | The difference between the Base Monitor clock time and the monitored server clock time is greater than 15 seconds. (The Base Monitor is the server which is running the monitoring service.) |
---|---|
Configurable thresholds: | None. Note: This alert is always raised as Medium. |
Default settings: | Time difference greater than 15 seconds. |
Type: | Continuous Automatically updated from Active to Ended when the times are synchronized (within 15 seconds of each other). |
Possible causes: | Server times across your network have not been fully synchronized. |
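The skew test is an absolute difference compared with 15 seconds, so it applies whichever clock is ahead. The following is a minimal sketch. It assumes both times are timezone-aware UTC values, so that servers in different time zones are not reported as skewed. How Redgate Monitor reads the remote clock, and whether it allows for network latency, is not covered here. The function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(seconds=15)


def clock_skew_alert(base_monitor_utc: datetime, monitored_utc: datetime) -> bool:
    """True if the two clocks differ by more than 15 seconds, in either direction."""
    return abs(base_monitor_utc - monitored_utc) > MAX_SKEW


base = datetime(2024, 5, 28, 12, 0, 0, tzinfo=timezone.utc)
print(clock_skew_alert(base, base + timedelta(seconds=16)))  # True
print(clock_skew_alert(base, base - timedelta(seconds=10)))  # False
```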
Custom metric collection error
Raised when: | Redgate Monitor has problems collecting custom metric data. |
---|---|
Configurable thresholds: | None. |
Default settings: | Raised as Medium. |
Type: | Continuous Automatically updated from Active to Ended when data can be collected from the database again. |
Possible causes: | The most likely cause is a problem with the custom metric's T-SQL query. |
Disk space
Raised when: | One of the following conditions applies, depending on how you configure the alert:
|
---|---|
Configurable thresholds: | You can configure this alert in either one of two ways:
|
Default settings: | Disk space available:
|
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when disk space is above the lowest defined threshold for at least the specified duration. |
Possible causes: |
|
Note: | Redgate Monitor only collects disk space data at 15-second intervals:
|
Machine unreachable
Raised when: | The Windows or Linux server (host machine) does not respond to a WMI request from Redgate Monitor for longer than the time you specify. |
---|---|
Configurable thresholds: | Machine is unreachable for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High when the host Windows or Linux machine is unavailable for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when Redgate Monitor can contact the host machine. |
Possible causes: |
|
Monitoring error (host machine data collection)
Raised when: | One of the following conditions applies continuously for longer than the time you specify:
|
---|---|
Configurable thresholds: | Redgate Monitor cannot collect data from the host machine for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as Medium when Redgate Monitor cannot collect data from the host machine for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended when data can be collected from the host machine again. |
Possible causes: |
|
Monitoring stopped (host machine credentials)
Raised when: | Redgate Monitor is unable to collect monitoring data from the host machine for longer than the time you specify, because the credentials supplied to connect to the machine are invalid or lack permissions. |
---|---|
Configurable thresholds: | Host machine cannot be contacted for longer than x seconds/minutes/hours/days. |
Default settings: | Raised as High when the host machine cannot be contacted due to an authentication failure for longer than 30 seconds. |
Type: | Continuous Automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed. |
Possible causes: |
|
Physical memory
Raised when: | One of the following conditions applies, depending on how you configure the alert:
|
---|---|
Configurable thresholds: | You can configure this alert in either one of two ways:
|
Default settings: | Alert is disabled by default. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when physical memory is above the lowest defined threshold for at least the specified duration. |
Possible causes: |
|
Note: | Redgate Monitor only collects physical memory data at 15-second intervals:
|
Processor under-utilization
Raised when: | Total processor utilization, averaged across all CPUs, is below a percentage threshold for longer than a specified duration. |
---|---|
Configurable thresholds: | Processor utilization is less than a specified percentage. Under-utilization has lasted longer than x seconds. |
Default settings: | Alert is disabled by default. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is above the lowest defined threshold for at least the specified duration. |
Possible causes: | Processor utilization is lower than expected under normal operations. This may indicate that SQL Server is not running normally or is not processing data, leaving the CPU idle. |
Note: | Redgate Monitor only collects processor utilization data at 15-second intervals:
|
Processor utilization
Raised when: | Total processor utilization, averaged across all CPUs, is above a percentage threshold for longer than a specified duration. |
---|---|
Configurable thresholds: | Processor utilization above a specified percentage. Utilization above this percentage has lasted longer than x seconds. |
Default settings: | Raised as Medium when processor utilization is more than 90% for longer than 600 seconds. |
Type: | Continuous Can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is below the lowest defined threshold for at least the specified duration. |
Possible causes: |
|
Note: | Redgate Monitor only collects processor utilization data at 15-second intervals:
|
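Because processor data is collected only every 15 seconds, a duration threshold is effectively measured in samples. With the default of 600 seconds, that is 600 / 15 = 40 consecutive samples. The following sketch assumes that each sample represents one 15-second interval and that a single sample at or below the threshold resets the count. These are simplifying assumptions, not documented behavior, and the names are hypothetical:

```python
import math
from typing import Iterable

SAMPLE_INTERVAL_S = 15  # collection interval stated above


def samples_needed(duration_s: float, interval_s: float = SAMPLE_INTERVAL_S) -> int:
    """Consecutive samples needed to cover duration_s, assuming each sample stands for one interval."""
    return math.ceil(duration_s / interval_s)


def processor_alert(samples_percent: Iterable[float], threshold_percent: float = 90.0,
                    duration_s: float = 600.0) -> bool:
    """True once enough consecutive samples are above the threshold (defaults: 90%, 600 s)."""
    needed = samples_needed(duration_s)
    run = 0
    for value in samples_percent:
        run = run + 1 if value > threshold_percent else 0  # any dip resets the run
        if run >= needed:
            return True
    return False


print(samples_needed(600))           # 40
print(processor_alert([95.0] * 40))  # True
print(processor_alert([95.0] * 39))  # False
```

When a duration is not a multiple of 15 seconds, rounding up (for example, 100 seconds becomes 7 samples, or 105 seconds) means the alert can only react at the next collection point.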
SQL Server Browser Service status
Raised when: | SQL Server Browser Service status matches the status specified in the alert configuration. |
---|---|
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped. |
Type: | Continuous Automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server Integration Service status
Raised when: | SQL Server Integration Service status matches the status specified in the alert configuration. |
---|---|
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped. |
Type: | Continuous Automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server VSS Service status
Raised when: | SQL Server VSS Service status matches the status specified in the alert configuration. |
---|---|
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped. |
Type: | Continuous Automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|