SQL Monitor 3

List of alerts

The Alert settings page lists all the alert types that SQL Monitor can raise.

Go to the Configuration tab. Under Alerts and metrics, select Alert settings:

Managing alerts

For each type of alert, you can:

  • disable it, so the alert will not be raised in future
  • change the level at which it is raised, to either low, medium, or high
  • change the thresholds that trigger the alert to be raised

You can edit the alert settings for a single SQL Server instance or across a number of instances at once (by creating a group). For job-related, disk-related or database-related alerts, you can edit the alert for a specific job, disk or database. 

When an alert is raised, you can quickly change its settings by clicking Configure alert in the Alert details page.

SQL Server specific alerts

SQL Monitor raises the following types of alerts for problems on a SQL Server instance or database:

Backup overdue

Raised when:

Either of the following conditions apply:

  • No entry for a full database backup of this database in the [msdb].[dbo].[backupset] system table.
  • The most recent entry for a full database backup of this database in the [msdb].[dbo].[backupset] system table is older than a specified time.

Configurable thresholds:

Most recent backup is older than x seconds/minutes/hours/days

Default settings:

  • Raised as Low after 7 days
  • Medium: not enabled
  • High: not enabled

Type:

Continuous

Continuous alerts can have multiple thresholds and are automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when a full backup is detected.

Possible causes:

  • No backup job scheduled.
  • Backup jobs not running or not completing. Check Job failed alerts on this SQL Server instance.
  • SQL Server Agent Service is not started - check for any SQL Server Agent Service status alerts.

Blocked process

Raised when:

A SQL connection has been waiting for another process to release its blocking lock for longer than a specified duration.

Configurable thresholds:

SQL process blocked for longer than x seconds/minutes/hours/days

Default settings:

  • Raised as Low after 20 seconds
  • Escalated to Medium after 40 seconds
  • Escalated to High after 1 minute

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when the block ends.

Possible causes:

  • Long-running queries.
  • Using Insert, Update or Delete on large numbers of records in a single transaction.
  • Canceling queries, but not rolling them back.

Cluster failover

Raised when:

The active node of a cluster changes to a different node.

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

The previously active node has failed or been manually switched to a different node.

Database unavailable

Raised when:

The database state is something other than Online.

Configurable thresholds:

None

Default settings:

Raised as Medium

Type:

Continuous

Automatically updated from Active to Ended when the database state changes back to Online.

Possible causes:

Database has been manually removed, or has encountered a problem causing its state to change to Suspect, Emergency, Recovering or Restoring.

Deadlock

Raised when:

SQL deadlock is detected.

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

  • Inefficient application code.
  • Application accesses objects in a different order each time.
  • User input during transactions.
  • Lengthy transactions.
  • Locks not being released as early as possible.

Deadlock trace flag disabled

Raised when:

SQL Monitor is unable to turn on the deadlock trace flag on a SQL Server instance; this means no deadlock alerts can be raised.

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor can enable the trace flag.

Possible causes:

Insufficient privileges for the account used to connect to the SQL Server instance.

SQL Monitor requires sysadmin permissions on this account to turn on the deadlock trace flag.

Fragmented indexes

Raised when:

Both of the following conditions apply:

  • Fragmentation of one or more indexes in a database exceeds a percentage threshold.
  • The fragmented indexes contain more than a specified number of pages.

Configurable thresholds:

Percentage fragmentation level

Indexes contain more than x pages

Default settings:

  • Raised as Low when fragmentation is above 15%
  • Medium: not enabled
  • High: not enabled

    Raised for indexes with more than 1000 pages

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when there are no fragmented indexes (based on the thresholds defined) for the database.

Possible causes:

Regular deleting or updating existing rows or values in a table.

Notes:

Checking for index fragmentation is a very resource-intensive activity.

For this reason, SQL Monitor only checks for fragmented indexes once a week: on Sunday 02:00.

This means that the alert may remain Active for some time after you fix the issue, and will only be updated after the next scheduled weekly check.

Integrity check overdue

Raised when:

Either of the following conditions apply:

  • No entry for an integrity check found following DBCC DBINFO WITH TABLERESULTS.
  • The most recent entry for an integrity check is older than a specified time.

Configurable thresholds:

Most recent integrity check is older than x seconds/minutes/hours/days

Default settings:

  • Raised as Low after 7 days
  • Medium: not enabled
  • High: not enabled

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when an integrity check of the database (DBCC CHECKDB) is detected.

Possible causes:

No integrity check has been carried out on a database, or the most recent integrity check was too long ago.

Job duration unusual

Raised when:

The job execution time is different from the baseline duration (the median of the last ten runs) by a specified percentage.

Configurable thresholds:

Percentage difference from baseline duration (either slower or quicker).

Ignore jobs with run times less than x seconds.

Default settings:

These defaults assume that once the baseline for a job is established, that job should not start running significantly more slowly or more quickly.

  • Raised as Low when duration is 50% different to baseline
  • Escalated to Medium when duration is 60% different to baseline
  • Escalated to High when duration is 70% different to baseline
  • Ignore jobs that run for less than 2 seconds.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the job next completes with a duration that is within the allowed percentage of the baseline duration.

Possible causes:

  • High system load
  • Sub-optimal SQL execution plan
  • Job is waiting on an event or blocked

Notes:

SQL Monitor calculates the baseline duration by using the job history to find the last ten run times.

SQL Monitor will not raise the Job duration unusual alert until the job history contains at least ten runs.

Job failed

Raised when:

Job does not complete successfully, and returns error code.

Configurable thresholds:

None

Default settings:

Raised as Medium

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level or update their status to Ended.

Possible causes:

Check the Job outcome message for a raised alert to help determine the problem.

Log backup overdue

Raised when:

Either of the following conditions apply:

  • No entry for a transaction log backup or a full backup of this database in the [msdb].[dbo].[backupset] system table.
  • The most recent entry for a transaction log backup or full backup of this database in the [msdb].[dbo].[backupset] system table is older than a specified time.

Configurable thresholds:

Most recent backup is older than x seconds/minutes/hours/days

Default settings:

  • Raised as Low after 1 day
  • Medium: not enabled
  • High: not enabled

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies - in this case, when a transaction log backup is detected.

Possible causes:

  • No log backup job scheduled.
  • Log backup jobs not running or not completing. Check Job failed alerts on this SQL Server instance.
  • SQL Server Agent Service is not started - check for any SQL Server Agent Service status alerts.

Long-running query

Raised when:

Query has been running for longer than a specified duration.

Configurable thresholds:

Query duration is longer than x seconds/minutes/hours/days

Do not raise alerts for queries that contain certain strings (matching specified regular expressions).

Default settings:

  • Raised as Low after 1 minute
  • Escalated to Medium after 2 minutes
  • Escalated to High after 2 minutes 10 seconds

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the query completes.

Possible causes:

  • Complex query
  • Insufficient physical memory
  • CPU over-utilized

Monitoring error (SQL Server data collection)

Raised when:

One of the following conditions applies continuously for 2 minutes:

  • Problems with WMI
  • Problems with the remote registry
  • File sharing issues
  • SQL connectivity issues

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended when data can be collected from the instance again.

Possible causes:

  • One of the required services (WMI or remote registry) has been stopped.
  • Remote file access permissions have changed or hidden administrative shares have been disabled.
  • SQL Server Service has been stopped.

Monitoring stopped (SQL server credentials)

Raised when:

SQL Monitor is unable to collect monitoring data from the SQL Server because the credentials supplied to connect to the instance are invalid or lack permissions.

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed.

Possible causes:

  • Your user name or password has been changed.
  • Your permissions have changed and are no longer sufficient.

Page verification

Raised when:

PAGE_VERIFY is set to NONE (SQL Server 2005 or SQL Server 2008) or TORN_PAGE_DETECTION is set to FALSE (SQL Server 2000) for a database.

Configurable thresholds:

None

Default settings:

Raised as Low

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor detects that Page verification has been turned on.

Possible causes:

New databases inherit this setting from the Model database. Check that Page Verify is turned on for the Model databases, if required.

SQL Server Agent Service status

Raised when:

SQL Server Agent Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped
  • Stopped or paused
  • Started
  • Started or paused

Default settings:

Raised as Medium when service is Stopped or Paused.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed
  • Service manually stopped or started

    Check the list of component services on the relevant Windows machine.

SQL Server Analysis Service status

Raised when:

SQL Server Analysis Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped
  • Stopped or paused
  • Started
  • Started or paused

Default settings:

Raised as Medium when service is Stopped or Paused.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed
  • Service manually stopped or started

    Check the list of component services on the relevant Windows machine.

SQL Server error log entry

Raised when:

An error message has been written to the SQL Server error log with a severity level above a specified value.

Configurable thresholds:

Error severity equal to or higher than x

Default settings:

  • Raised as Low for severity level 17 or 18
  • Raised as Medium for severity level 19
  • Raised as High for severity level 20 or higher

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

Various

Check the SQL Server error log entry area of the alert to see the error message text.

SQL Server Full Text Search Service status

Raised when:

SQL Server Full Text Search Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped
  • Stopped or paused
  • Started
  • Started or paused

Default settings:

Raised as Medium when service is Started or Paused.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed
  • Service manually stopped or started

    Check the list of component services on the relevant Windows machine.

SQL Server instance unreachable

Raised when:

The SQL Server instance cannot be reached by SQL Monitor because it is not running, or because of some other error (other than permissions).

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor can contact the SQL Server instance.

Possible causes:

  • Host machine unreachable
  • SQL Server service failed
  • SQL Server service manually stopped

    Check the list of component services on the relevant Windows machine. Check for Machine unreachable alerts.

SQL Server Reporting Service status

Raised when:

SQL Server Reporting Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped
  • Stopped or paused
  • Started
  • Started or paused

Default settings:

Raised as Medium when service is Stopped or Paused.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed
  • Service manually stopped or started

    Check the list of component services on the relevant Windows machine.


Host machine alerts

SQL Monitor raises the following types of alerts for problems on a host machine (Windows server):

Clock skew

Raised when:

The difference between the Base Monitor clock time and the monitored server clock time is greater than 15 seconds.

(The Base Monitor is the server which is running the monitoring service)

Configurable thresholds:

None

Note: This alert is always raised as High.

Default settings:

Time difference greater than 15 seconds. 

Type:

Continuous

Automatically updated from Active to Ended when the times are synchronized (within 15 seconds of each other).

Possible causes:

Server times across your network have not been fully synchronized.

Custom metric collection error

Raised when:

SQL Monitor has problems collecting custom metric data.

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended when data can be collected from the database again.

Possible causes:

The most likely cause is a problem with the custom metric's T-SQL query.

Disk space

Raised when:

One of the following conditions apply, depending on how you configure the alert:

  • logical disk space used is above a percentage threshold, OR
  • logical disk space available is less than a fixed value.

Configurable thresholds:

You can configure this alert in two ways:

  • Used disk space percentage, or
  • Disk space available (in MB or GB).

    Low disk space has lasted longer than x seconds .

Default settings:

Disk space available:

  • Raised as Low when less than 1GB
  • Escalated to Medium when less than 800MB
  • Escalated to High when less than 400MB

    For longer than: 120 seconds

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when disk space is above the lowest defined threshold for at least the specified duration.

Possible causes:

  • Database and log files may be growing too large without frequent backups.
  • Other applications may be using the disk drive for file storage.

Note:

SQL Monitor only collects disk space data at 15-second intervals:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15 second increments.

Machine unreachable

Raised when:

The Windows server (host machine) does not respond to a Ping request from SQL Monitor.

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor can contact the host machine.

Possible causes:

  • Host machine turned off or has suffered a problem.
  • Ping request blocked by machine or a firewall.

    Check the log for the machine on the Monitored servers page.

Monitoring error (host machine data collection)

Raised when:

One of the following conditions applies continuously for 2 minutes:

  • Problems with WMI
  • Problems with the remote registry
  • File sharing issues

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended when data can be collected from the host machine again.

Possible causes:

  • One of the required services (WMI or remote registry) has been stopped.
  • Remote file access permissions have changed or hidden administrative shares have been disabled.

Monitoring stopped (host machine credentials)

Raised when:

SQL Monitor is unable to collect monitoring data from the host machine because the credentials supplied to connect to the machine are invalid or lack permissions.

Configurable thresholds:

None

Default settings:

Raised as High

Type:

Continuous

Automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed.

Possible causes:

  • Your user name or password has been changed.
  • Your permissions have changed and are no longer sufficient.

Physical memory

Raised when:

One of the following conditions apply, depending on how you configure the alert:

  • physical memory used is above a percentage threshold, OR
  • physical memory available is less than a fixed value.

Configurable thresholds:

You can configure this alert in two ways:

  • Used physical memory percentage, or
  • Physical memory available (in MB or GB).

    Low physical memory has lasted longer than x seconds.

Default settings:

Physical memory available:

  • Raised as Low when less than 400MB available
  • Escalated to Medium when less than 200MB available
  • Escalated to High when less than 100MB available

    For longer than: 60 seconds

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when physical memory is above the lowest defined threshold for at least the specified duration.

Possible causes:

  • SQL Server has been configured with insufficient memory.
  • Page file space running low.
  • Other processes consuming physical memory.
  • Not enough RAM on server.

Note:

SQL Monitor only collects physical memory data at 15-second intervals:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15 second increments.

Processor under-utilization

Raised when:

Total processor utilization, averaged across all CPUs, is below a percentage threshold for longer than a specified duration.

Configurable thresholds:

Processor utilization less than a specified percentage.

Under-utilization has lasted longer than x seconds.

Default settings:

Alert is disabled by default

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is above the lowest defined threshold for at least the specified duration.

Possible causes:

Processor utilization is not as high as expected under normal operations: may indicate that SQL Server is not running normally or processing data - freeing up CPU.

Note:

SQL Monitor only collects processor utilization data at 15-second intervals:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15 second increments.

Processor utilization

Raised when:

Total processor utilization, averaged across all CPUs, is above a percentage threshold for longer than a specified duration.

Configurable thresholds:

Processor utilization above a specified percentage.

Utilization above this percentage has lasted longer than x seconds.

Default settings:

  • Raised as Low when utilization is above 85%
  • Escalated to Medium when above 90%
  • Escalated to High when above 95%

    For longer than: 60 seconds

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is above the lowest defined threshold for at least the specified duration.

Possible causes:

  • Other processes running on the server - check the System processes area of the raised alert.
  • CPU-intensive SQL queries - if Profiler trace is turned on for the SQL Server, check the SQL statements in the SQL processes/Profiler trace area of the raised alert.

Note:

SQL Monitor only collects processor utilization data at 15-second intervals:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15 second increments.

     

SQL Server Browser Service status

Raised when:

SQL Server Browser Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped
  • Stopped or paused
  • Started
  • Started or paused

Default settings:

Raised as Medium when service is Stopped or Paused.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed
  • Service manually stopped or started

    Check the list of component services on the relevant Windows machine.

     

SQL Server Integration Service status

Raised when:

SQL Server Integration Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped
  • Stopped or paused
  • Started
  • Started or paused

Default settings:

Raised as Medium when service is Stopped or Paused.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed
  • Service manually stopped or started

    Check the list of component services on the relevant Windows machine.

     

SQL Server VSS Service status

Raised when:

SQL Server VSS Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped
  • Stopped or paused
  • Started
  • Started or paused

Default settings:

Raised as Medium when service is Stopped or Paused.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed
  • Service manually stopped or started

    Check the list of component services on the relevant Windows machine.


Do you have any feedback on this documentation?

Let us know at sqlmonitorfeedback@red-gate.com


Didn't find what you were looking for?