List of alerts
Published 08 July 2015
The Alert settings page lists all the alert types that SQL Monitor can raise.
To open it, go to the Configuration tab and, under Alerts and metrics, select Alert settings.
Managing alerts
For each type of alert, you can:
- disable it, so the alert will not be raised in future
- change the level at which it is raised: Low, Medium or High
- change the thresholds that trigger the alert to be raised
You can edit the alert settings for a single SQL Server instance or across a number of instances at once (by creating a group). For job-related, disk-related or database-related alerts, you can edit the alert for a specific job, disk or database.
When an alert is raised, you can quickly change its settings by clicking Configure alert on the Alert details page.
SQL Server specific alerts
SQL Monitor raises the following types of alerts for problems on a SQL Server instance or database:
Availability group - database not healthy
Raised when: | Any of the following conditions apply, for longer than the time you specify:
|
Configurable thresholds: | Availability group database is not healthy for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High when a monitored availability group database is not healthy for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the database is healthy again. |
Possible causes: |
|
Availability group - failover
Raised when: | The availability group fails over from a primary replica to a secondary replica |
Configurable thresholds: | None |
Default settings: | Raised as Medium |
Type: | Event: raised for an incident that occurs at a specific point in time. Event alerts do not change level or update their status to Ended. |
Possible causes: | Automatic failover occurs when the primary replica fails or goes offline for some reason. Planned manual failover and forced failover require initiation by a database administrator. All three types of failover will cause this alert to be raised. |
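The practical difference between the Continuous and Event types described above can be sketched as a tiny status model. This is a hypothetical Python illustration, not SQL Monitor's implementation: the `Alert` class and its `condition_cleared` method are invented here, and the sketch assumes every alert starts with the status Active.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; not part of SQL Monitor."""
    kind: str               # "continuous" or "event"
    level: str              # "Low", "Medium" or "High"
    status: str = "Active"  # assumption: all alerts start as Active

    def condition_cleared(self) -> None:
        # A Continuous alert ends once its triggering condition no longer applies.
        # An Event alert records a point in time, so its status never changes.
        if self.kind == "continuous":
            self.status = "Ended"


listener_offline = Alert(kind="continuous", level="High")
failover = Alert(kind="event", level="Medium")
for alert in (listener_offline, failover):
    alert.condition_cleared()
print(listener_offline.status, failover.status)  # Ended Active
```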
Availability group - listener offline
Raised when: | The availability group listener is offline for longer than the time you specify |
Configurable thresholds: | Listener is offline for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High when a monitored availability group's listener is offline for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the listener is online again. |
Possible causes: |
|
Availability group - not ready for automatic failover
Raised when: | The failover mode of the primary replica is automatic, but no secondary replicas are configured for automatic failover, for longer than the time you specify |
Configurable thresholds: | Availability group is not healthy for longer than x seconds/minutes/hours/days |
Default settings: | Raised as Medium when a monitored availability group is not healthy for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the availability group replica is ready for automatic failover again. |
Possible causes: | The availability group is incorrectly configured. For a primary replica to fail over automatically when it fails or goes offline, both the primary replica and the target secondary replica must be set to automatic failover mode. |
Availability group - query slowdown on primary due to synchronous replication
Raised when: | Synchronous replication causes query slowdown on the primary replica, resulting in the transaction delay exceeding a given threshold |
Configurable thresholds: | Transaction delay is greater than a specified value for longer than a specified time |
Default settings: | Raised as Medium when the transaction delay is more than 10,000 ms for longer than 120 seconds |
Type: | Continuous: automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the transaction delay falls below the given threshold. |
Possible causes: | In synchronous-commit mode, the primary replica doesn’t commit any transactions until it receives acknowledgement that all synchronous secondary replicas have finished hardening the commit in their log, resulting in transaction delay. Several factors can affect transaction delay:
|
Availability group - replica not healthy
Raised when: | Any of the following conditions apply, for longer than the time you specify:
|
Configurable thresholds: | Replica not healthy for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High when a monitored availability replica is not healthy for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the availability group replica is healthy again. |
Possible causes: |
|
Availability group - replication falling behind
Raised when: | The sum of the log send queue and the redo queue for a particular secondary replica has been exceeding a given threshold for longer than the time you specify |
Configurable thresholds: | Sum of log send queue and redo queue is greater than a specified value for longer than a specified time |
Default settings: | Raised as Medium when the sum of the log send queue and the redo queue is more than 100 MB for longer than 120 seconds |
Type: | Continuous: automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the sum of the log send queue and the redo queue falls below the given threshold. |
Possible causes: |
|
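The replication falling behind rule combines a sum, a threshold and a duration. The sketch below is a hypothetical Python illustration of that logic using the default settings (100 MB, 120 seconds). The `QueueSample` type, the 30-second sample spacing in the example, and measuring the duration from the first qualifying sample to the latest one are all assumptions, not a description of how SQL Monitor samples the queues.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    seconds: float      # time of the sample, in seconds from an arbitrary start
    log_send_mb: float  # log send queue for one secondary replica, in MB
    redo_mb: float      # redo queue for the same secondary replica, in MB


def falling_behind(samples, threshold_mb=100.0, duration_s=120.0):
    """Return True if log send queue + redo queue has stayed above
    threshold_mb for longer than duration_s, up to the latest sample."""
    breach_start = None
    for s in samples:
        if s.log_send_mb + s.redo_mb > threshold_mb:
            if breach_start is None:
                breach_start = s.seconds  # first sample of the current breach
        else:
            breach_start = None           # condition cleared; reset the timer
    if breach_start is None:
        return False
    return samples[-1].seconds - breach_start > duration_s


steady = [QueueSample(t, 80.0, 40.0) for t in range(0, 151, 30)]
print(falling_behind(steady))     # True: 120 MB sustained for 150 s
recovered = steady + [QueueSample(180, 50.0, 10.0)]
print(falling_behind(recovered))  # False: back under 100 MB
```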
Backup overdue
Raised when: | Either of the following conditions applies:
If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a full database backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica. |
Configurable thresholds: | Most recent backup is older than x seconds/minutes/hours/days |
Default settings: | Raised as Medium when the last full backup is older than 7 days |
Type: | Continuous: can have multiple thresholds and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when a full backup is detected. |
Possible causes: |
|
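For availability group databases, the rule above means that a recent enough full backup of any copy keeps the alert from being raised. A hypothetical Python sketch of that check with the 7-day default; the function name, the replica names and the idea of passing one latest-backup time per replica are illustrative assumptions.

```python
from datetime import datetime, timedelta


def ag_backup_overdue(last_full_backup_by_replica, threshold, now):
    """Hypothetical check for one availability group database.

    last_full_backup_by_replica maps each replica to the time of the most
    recent full backup of its copy (None if never backed up). threshold is the
    setting for the copy on the primary replica. The backup is overdue only if
    every copy's latest backup is older than the threshold.
    """
    cutoff = now - threshold
    return all(ts is None or ts < cutoff
               for ts in last_full_backup_by_replica.values())


now = datetime(2015, 7, 8, 12, 0)
backups = {
    "PRIMARY01": datetime(2015, 6, 20),   # 18 days old
    "SECONDARY01": datetime(2015, 7, 5),  # 3 days old: backup taken on the secondary
}
print(ag_backup_overdue(backups, timedelta(days=7), now))  # False
backups["SECONDARY01"] = datetime(2015, 6, 25)
print(ag_backup_overdue(backups, timedelta(days=7), now))  # True
```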
Blocking process
Raised when: | A SQL process has been blocking one or more other processes for longer than a specified duration. |
Configurable thresholds: | SQL process has been blocking one or more other processes for longer than x seconds/minutes/hours/days |
Default settings: | Raised as Low when a SQL process has been blocking one or more other processes for longer than a total of 60 seconds |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when the block ends. |
Possible causes: |
|
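Because Continuous alerts can have several thresholds, the level of a Continuous alert can depend on which threshold has been crossed. The hypothetical Python sketch below maps a blocking duration to a level. Only the Low threshold at 60 seconds is the documented default; the Medium (300 s) and High (900 s) thresholds are invented for illustration.

```python
# (minimum blocking duration in seconds, level)
# Only (60, "Low") is the documented default; the other two are hypothetical.
THRESHOLDS = [(60, "Low"), (300, "Medium"), (900, "High")]


def blocking_level(blocking_seconds, thresholds=THRESHOLDS):
    """Return the highest level whose threshold is exceeded, or None."""
    level = None
    for limit, name in sorted(thresholds):
        if blocking_seconds > limit:
            level = name
    return level


for duration in (45, 61, 400, 1200):
    print(duration, blocking_level(duration))
# 45 None
# 61 Low
# 400 Medium
# 1200 High
```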
Cluster failover
Raised when: | The active node of a cluster changes to a different node. |
Configurable thresholds: | None |
Default settings: | Raised as Medium |
Type: | Event: raised for an incident that occurs at a specific point in time. Event alerts do not change level or update their status to Ended. |
Possible causes: | The previously active node has failed or been manually switched to a different node. |
Database unavailable
Raised when: | The database state is something other than Online, for longer than the time you specify |
Configurable thresholds: | Database unavailable for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High when the database is unavailable for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when the database state changes back to Online. |
Possible causes: | Database has been manually removed, or has encountered a problem causing its state to change to Suspect, Emergency, Recovering or Restoring. |
Deadlock
Raised when: | SQL deadlock is detected. |
Configurable thresholds: | None |
Default settings: | Raised as Medium |
Type: | Event: raised for an incident that occurs at a specific point in time. Event alerts do not change level or update their status to Ended. |
Possible causes: |
|
Deadlock trace flag disabled
Raised when: | SQL Monitor is unable to turn on the deadlock trace flag on a SQL Server instance (1204 for SQL Server 2000, 1222 for SQL Server 2005 and later). This means deadlock alerts can't be raised. |
Configurable thresholds: | None |
Default settings: | Raised as High |
Type: | Continuous: automatically updated from Active to Ended when SQL Monitor can enable the trace flag. |
Possible causes: | Insufficient privileges for the account used to connect to the SQL Server instance. SQL Monitor requires sysadmin permissions on this account to turn on the deadlock trace flag. |
Fragmented indexes
Raised when: | Both of the following conditions apply:
|
Configurable thresholds: | Percentage fragmentation level; indexes contain more than x pages |
Default settings: | Raised as Medium when index fragmentation is above 60% for indexes with more than 1000 pages |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when the database has no fragmented indexes (based on the thresholds defined). |
Possible causes: | Frequent deletes or updates of existing rows or values in a table. |
Notes: | Checking for index fragmentation is very resource-intensive. For this reason, SQL Monitor checks for fragmented indexes only once a week, at 02:00 on Sunday. This means the alert may remain Active for some time after you fix the issue; it is only updated after the next scheduled weekly check. |
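Both conditions must hold for an index to count as fragmented. A hypothetical Python sketch of that filter with the default thresholds (more than 60% fragmentation, more than 1000 pages). The `IndexStats` type and index names are invented, and the figures are assumed to be of the kind SQL Server reports in `sys.dm_db_index_physical_stats` (`avg_fragmentation_in_percent`, `page_count`).

```python
from typing import NamedTuple


class IndexStats(NamedTuple):
    name: str
    fragmentation_pct: float  # e.g. avg_fragmentation_in_percent
    page_count: int           # e.g. page_count


def fragmented(indexes, min_fragmentation_pct=60.0, min_pages=1000):
    """Names of indexes that meet BOTH conditions."""
    return [ix.name for ix in indexes
            if ix.fragmentation_pct > min_fragmentation_pct
            and ix.page_count > min_pages]


stats = [
    IndexStats("IX_Orders_Date", 72.5, 5400),  # both conditions met
    IndexStats("IX_Small_Lookup", 95.0, 40),   # too few pages to matter
    IndexStats("PK_Customers", 12.0, 25000),   # large but not fragmented
]
print(fragmented(stats))  # ['IX_Orders_Date']
```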
Integrity check overdue
Raised when: | Either of the following conditions applies:
|
Configurable thresholds: | Most recent integrity check is older than x seconds/minutes/hours/days |
Default settings: | Alert is disabled by default |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when an integrity check of the database (DBCC CHECKDB) is detected. |
Possible causes: | No integrity check has been carried out on a database, or the most recent integrity check was too long ago. |
Job duration unusual
Raised when: | The job execution time differs from the baseline duration (the median of the last ten runs) by more than a specified percentage. |
Configurable thresholds: | Percentage difference from baseline duration (either slower or quicker). Ignore jobs with run times less than x seconds. |
Default settings: | These defaults assume that once the baseline for a job is established, that job should not start running significantly more slowly or more quickly.
|
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when the job next completes with a duration that is within the allowed percentage of the baseline duration. |
Possible causes: |
|
Notes: | SQL Monitor calculates the baseline duration by using the job history to find the last ten run times. SQL Monitor will not raise the Job duration unusual alert until the job history contains at least ten runs. |
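The baseline calculation can be sketched in Python. This is an illustration under stated assumptions, not SQL Monitor's code: the function name is hypothetical, the 50% and 10-second thresholds in the example are invented (the default values are not listed above), the baseline is taken from the ten runs before the latest one, and the minimum run time is applied to the latest run.

```python
from statistics import median


def duration_unusual(history_s, latest_s, pct_threshold, min_runtime_s):
    """Return True if latest_s differs from the baseline by more than
    pct_threshold percent, in either direction.

    history_s: earlier run times in seconds, oldest first (hypothetical input).
    """
    if len(history_s) < 10:
        return False                    # no baseline until ten runs are recorded
    if latest_s < min_runtime_s:
        return False                    # ignore very short runs
    baseline = median(history_s[-10:])  # median of the last ten runs
    if baseline == 0:
        return False                    # avoid dividing by zero
    difference_pct = abs(latest_s - baseline) / baseline * 100
    return difference_pct > pct_threshold


history = [100, 110, 90, 105, 95, 100, 98, 102, 101, 99]
print(duration_unusual(history, 180, 50, 10))      # True: 80% slower
print(duration_unusual(history, 120, 50, 10))      # False: 20% slower
print(duration_unusual(history[:9], 180, 50, 10))  # False: only nine runs
```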
Job failing
Raised when: | A job does not complete successfully and returns an error code. |
Configurable thresholds: | None |
Default settings: | Raised as Medium |
Type: | Continuous: automatically updated from Active to Ended when the job runs successfully again, or when the job runs again and fails (in which case the original alert is marked as Ended and a new alert is raised). |
Possible causes: | Check the Job outcome message for a raised alert to help determine the problem. |
Log backup overdue
Raised when: | Either of the following conditions applies:
If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a transaction log backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica. |
Configurable thresholds: | Most recent backup is older than x seconds/minutes/hours/days |
Default settings: | Raised as Medium when the last log backup is older than 7 days |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when a transaction log backup is detected. |
Possible causes: |
|
Long-running query
Raised when: | Query has been running for longer than a specified duration. |
Configurable thresholds: | Query duration is longer than x seconds/minutes/hours/days. Do not raise alerts for queries that contain certain strings (matching specified regular expressions). |
Default settings: | Raised as Low when a query runs longer than 10 minutes |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when the query completes. |
Possible causes: |
|
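The duration threshold and the exclusion list work together: a query is reported only if it runs too long and matches none of the regular expressions. A hypothetical Python sketch with the 10-minute default; the exclusion patterns are invented examples, and the sketch assumes a pattern excludes a query if it matches anywhere in the query text (`re.search`).

```python
import re


def long_running_alert(query_text, duration_s, threshold_s=600,
                       exclude_patterns=()):
    """Return True if the query has run longer than threshold_s and its
    text matches none of the exclusion regular expressions."""
    if duration_s <= threshold_s:
        return False
    return not any(re.search(p, query_text) for p in exclude_patterns)


# Hypothetical exclusions: ignore backups and integrity checks.
exclusions = [r"(?i)\bBACKUP\s+DATABASE\b", r"(?i)\bDBCC\s+CHECKDB\b"]
print(long_running_alert("SELECT * FROM Sales.Orders", 900,
                         exclude_patterns=exclusions))  # True
print(long_running_alert("BACKUP DATABASE Sales TO DISK = N'S.bak'", 900,
                         exclude_patterns=exclusions))  # False
```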
Monitoring error (SQL Server data collection)
Raised when: | One of the following conditions applies continuously for longer than a specified duration:
|
Configurable thresholds: | SQL Monitor cannot collect SQL Server data for longer than x seconds/minutes/hours/days |
Default settings: | Raised as Medium |
Type: | Continuous: automatically updated from Active to Ended when data can be collected from the instance again. |
Possible causes: |
|
Monitoring stopped (SQL Server credentials)
Raised when: | SQL Monitor is unable to collect monitoring data from the SQL Server instance because the credentials supplied to connect to the instance are invalid or lack permissions. |
Configurable thresholds: | SQL Server instance cannot be contacted for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High |
Type: | Continuous: automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed. |
Possible causes: |
|
Page verification
Raised when: | PAGE_VERIFY is set to NONE (SQL Server 2005 or SQL Server 2008) or TORN_PAGE_DETECTION is set to FALSE (SQL Server 2000) for a database. |
Configurable thresholds: | None |
Default settings: | Alert is disabled by default |
Type: | Continuous: automatically updated from Active to Ended when SQL Monitor detects that page verification has been turned on. |
Possible causes: | New databases inherit this setting from the model database. If required, check that page verification is turned on for the model database. |
SQL Server Agent Service status
Raised when: | SQL Server Agent Service status matches the status specified in the alert configuration. |
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped |
Type: | Continuous: automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server Analysis Service status
Raised when: | SQL Server Analysis Service status matches the status specified in the alert configuration. |
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped |
Type: | Continuous: automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server error log entry
Raised when: | An error message has been written to the SQL Server error log with a severity level equal to or above a specified value. |
Configurable thresholds: | Error severity equal to or higher than x |
Default settings: | Raised as Medium for errors with severity level equal to or greater than 20 |
Type: | Event: raised for an incident that occurs at a specific point in time. Event alerts do not change level or update their status to Ended. |
Possible causes: | Various. Check the SQL Server error log entry area of the alert to see the error message text. |
SQL Server Full Text Search Service status
Raised when: | SQL Server Full Text Search Service status matches the status specified in the alert configuration. |
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped |
Type: | Continuous: automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server instance unreachable
Raised when: | The SQL Server instance cannot be reached by SQL Monitor because it is not running, or because of some other error (other than permissions), for longer than the time you specify. |
Configurable thresholds: | SQL Server instance unreachable for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High when the SQL Server instance is unavailable for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when SQL Monitor can contact the SQL Server instance. |
Possible causes: |
|
SQL Server Reporting Service status
Raised when: | SQL Server Reporting Service status matches the status specified in the alert configuration. |
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped |
Type: | Continuous: automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
Host machine alerts
SQL Monitor raises the following types of alerts for problems on a host machine (Windows server):
Clock skew
Raised when: | The difference between the Base Monitor clock time and the monitored server clock time is greater than 15 seconds. (The Base Monitor is the server that runs the monitoring service.) |
Configurable thresholds: | None. Note: this alert is always raised as Medium. |
Default settings: | Time difference greater than 15 seconds. |
Type: | Continuous: automatically updated from Active to Ended when the times are synchronized (within 15 seconds of each other). |
Possible causes: | Server times across your network have not been fully synchronized. |
Custom metric collection error
Raised when: | SQL Monitor has problems collecting custom metric data. |
Configurable thresholds: | None |
Default settings: | Raised as Medium |
Type: | Continuous: automatically updated from Active to Ended when data can be collected from the database again. |
Possible causes: | The most likely cause is a problem with the custom metric's T-SQL query. |
Disk space
Raised when: | One of the following conditions applies, depending on how you configure the alert:
|
Configurable thresholds: | You can configure this alert in two ways:
|
Default settings: | Disk space available:
|
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when disk space is above the lowest defined threshold for at least the specified duration. |
Possible causes: |
|
Note: | SQL Monitor only collects disk space data at 15-second intervals:
|
Machine unreachable
Raised when: | The Windows server (host machine) does not respond to a Ping request from SQL Monitor for longer than the time you specify. |
Configurable thresholds: | Machine is unreachable for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High when the host Windows machine is unavailable for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when SQL Monitor can contact the host machine. |
Possible causes: |
|
Monitoring error (host machine data collection)
Raised when: | One of the following conditions applies continuously for longer than the time you specify:
|
Configurable thresholds: | SQL Monitor cannot collect data from the host machine for longer than x seconds/minutes/hours/days |
Default settings: | Raised as Medium when SQL Monitor cannot collect data from the host machine for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended when data can be collected from the host machine again. |
Possible causes: |
|
Monitoring stopped (host machine credentials)
Raised when: | SQL Monitor is unable to collect monitoring data from the host machine for longer than the time you specify, because the credentials supplied to connect to the machine are invalid or lack permissions. |
Configurable thresholds: | Host machine cannot be contacted for longer than x seconds/minutes/hours/days |
Default settings: | Raised as High when the host machine cannot be contacted due to an authentication failure for longer than 30 seconds |
Type: | Continuous: automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed. |
Possible causes: |
|
Physical memory
Raised when: | One of the following conditions applies, depending on how you configure the alert:
|
Configurable thresholds: | You can configure this alert in two ways:
|
Default settings: | Alert is disabled by default |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when physical memory is above the lowest defined threshold for at least the specified duration. |
Possible causes: |
|
Note: | SQL Monitor only collects physical memory data at 15-second intervals:
|
Processor under-utilization
Raised when: | Total processor utilization, averaged across all CPUs, is below a percentage threshold for longer than a specified duration. |
Configurable thresholds: | Processor utilization less than a specified percentage. Under-utilization has lasted longer than x seconds. |
Default settings: | Alert is disabled by default |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is above the lowest defined threshold for at least the specified duration. |
Possible causes: | Processor utilization is lower than expected under normal operations. This may indicate that SQL Server is not running normally or is not processing data, leaving the CPU idle. |
Note: | SQL Monitor only collects processor utilization data at 15-second intervals:
|
Processor utilization
Raised when: | Total processor utilization, averaged across all CPUs, is above a percentage threshold for longer than a specified duration. |
Configurable thresholds: | Processor utilization above a specified percentage. Utilization above this percentage has lasted longer than x seconds. |
Default settings: | Raised as Medium when processor utilization is more than 90% for longer than 600 seconds |
Type: | Continuous: can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is below the lowest defined threshold for at least the specified duration. |
Possible causes: |
|
Note: | SQL Monitor only collects processor utilization data at 15-second intervals:
|
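Because processor data arrives every 15 seconds, the duration condition amounts to a run of consecutive samples. A hypothetical Python sketch using the default settings (above 90% for longer than 600 seconds); the input format (one list of per-CPU percentages per sample) and treating each sample as covering one full 15-second interval are simplifying assumptions.

```python
SAMPLE_INTERVAL_S = 15  # processor data is collected every 15 seconds


def utilization_alert(samples, threshold_pct=90.0, duration_s=600):
    """samples: one list of per-CPU utilization percentages per collection,
    oldest first. Returns True if the average across all CPUs has stayed above
    threshold_pct for longer than duration_s, ending at the latest sample."""
    consecutive = 0
    for per_cpu in samples:
        average = sum(per_cpu) / len(per_cpu)
        consecutive = consecutive + 1 if average > threshold_pct else 0
    # Assumption: each qualifying sample stands for one collection interval.
    return consecutive * SAMPLE_INTERVAL_S > duration_s


busy = [[95, 92, 88, 97]] * 41       # average 93%, 41 samples = 615 s
print(utilization_alert(busy))       # True
print(utilization_alert(busy[:40]))  # False: exactly 600 s is not longer
print(utilization_alert(busy + [[50, 40, 30, 20]]))  # False: load dropped
```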
SQL Server Browser Service status
Raised when: | SQL Server Browser Service status matches the status specified in the alert configuration. |
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped |
Type: | Continuous: automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server Integration Service status
Raised when: | SQL Server Integration Service status matches the status specified in the alert configuration. |
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped |
Type: | Continuous: automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|
SQL Server VSS Service status
Raised when: | SQL Server VSS Service status matches the status specified in the alert configuration. |
Configurable thresholds: | Service status is one of the following:
|
Default settings: | Raised as Medium when service is Stopped |
Type: | Continuous: automatically updated from Active to Ended when the service status changes to a status other than that specified. |
Possible causes: |
|