List of alerts

The Alert settings page lists all the alert types that SQL Monitor can raise.

To open it, go to the Configuration page and, under Alerts and metrics, select Alert settings.

For each type of alert, you can:

  • disable it, so the alert will not be raised in future.
  • change the level at which it is raised, to either low, medium or high.
  • change the thresholds that trigger the alert to be raised.

You can edit the alert settings for a single instance or across a number of instances at once (by creating a group). For job-related, disk-related or database-related alerts, you can edit the alert for a specific job, disk or database. 

When an alert is raised, you can quickly change its settings by clicking Configure alert in the Alert details page.

Alert types

Instance alerts

Alerts that monitored instances can raise. Not every alert applies to every platform; the Raised by entry for each alert lists the platforms it covers.

Backup overdue

Raised by: 

SQL Server only.

Raised when:

Either of the following conditions applies:

  • No entry for a full database backup of this database in the [msdb].[dbo].[backupset] system table.
  • The most recent entry for a full database backup of this database in the [msdb].[dbo].[backupset] system table is older than a specified time.

If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a full database backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica.
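The two conditions above can be sketched as a small check. This is only an illustration of the rule as described, not SQL Monitor's implementation; the function name and parameters are hypothetical, and the input is assumed to be the finish times of full backups already read from [msdb].[dbo].[backupset] (for an availability group, from all copies of the database).

```python
from datetime import datetime, timedelta
from typing import Iterable


def full_backup_overdue(
    backup_finish_times: Iterable[datetime],
    now: datetime,
    threshold: timedelta = timedelta(days=7),  # mirrors the documented default
) -> bool:
    """Return True when no backup is recorded or the newest one is too old."""
    newest = max(backup_finish_times, default=None)
    if newest is None:
        # Condition 1: there is no full backup entry at all.
        return True
    # Condition 2: the most recent entry is older than the threshold.
    return now - newest > threshold
```

Because only the newest entry is compared, passing the entries for every replica copy reproduces the availability group behavior: the alert condition holds only when all copies are older than the threshold.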

Configurable thresholds:

Most recent backup is older than x seconds/minutes/hours/days.

Default settings:

Raised as Medium when the last full backup is older than 7 days.

Type:

Continuous

Continuous alerts can have multiple thresholds and are automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when a full backup is detected.

Possible causes:

  • No backup job scheduled.
  • Backup jobs not running or not completing. Check Job failing alerts on this SQL Server instance.
  • SQL Server Agent Service is not started. Check for any SQL Server Agent Service status alerts.


Blocking process

Raised by: 

SQL Server only.

Raised when:

A SQL process has been blocking one or more other processes for longer than a specified duration.

Configurable thresholds:

SQL process has been blocking one or more other processes for longer than x seconds/minutes/hours/days.

Default settings:

Raised as Low when a SQL process has been blocking one or more other processes for longer than a total of 60 seconds.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the block ends.

Possible causes:

  • Long-running queries.
  • Using INSERT, UPDATE or DELETE on large numbers of records in a single transaction.
  • Canceling queries, but not rolling them back.

Configuration change

Raised by: 

SQL Server only.

Raised when:

Any configuration option monitored by SQL Monitor changes, unless an override suppresses the alert for that option. For example, if the only configuration change is to an option with an override set to "Don't alert", the alert is not raised.
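As a rough sketch of how overrides interact with this alert: the alert is raised if at least one changed option is not suppressed. The function, the dictionary shape, and the option names used in the test are hypothetical; only the "Don't alert" override value comes from the text above.

```python
from typing import Iterable, Mapping

DONT_ALERT = "Don't alert"


def configuration_change_alert(
    changed_options: Iterable[str],
    overrides: Mapping[str, str],
) -> bool:
    """Return True if at least one changed option is not suppressed."""
    # An option with no override entry counts as alertable.
    return any(overrides.get(option) != DONT_ALERT for option in changed_options)
```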

Configurable thresholds:

None.

Default settings:

Raised as Low for any monitored configuration option change.

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

A configuration option changed on the target instance.

Limited sampling

Raised by:

SQL Server and PostgreSQL.

Raised when:

Samplers have been prevented from running.

Configurable thresholds:

None.

Default settings:

Raised as High when sampling is limited for any reason.

Type:

Continuous

Automatically updated from Active to Ended when all samplers can be run.

Possible causes:

  • SQL Server: deadlock trace flag is disabled and extended events support has not been enabled.
  • SQL Server: you're running a version of SQL Server 2008 susceptible to a bug that causes high CPU usage when reading error logs.
  • PostgreSQL: track_timing, pg_stat_statements or another required extension is turned off or not installed.

Database file usage

Raised by:

SQL Server.

Raised when:

This alert has 3 configuration modes:

  • Percent Full: raised when space used in a database file is above a specified percentage.
  • Space Remaining: raised when the available space is less than a fixed amount.
  • Time Remaining: raised when the remaining time until the database file fills is less than a specified time.

Configurable thresholds:

  • Used database file space is more than x percent of the total file size.
  • Remaining database file space is less than x MB/GB.
  • Remaining time until the database file gets full is less than x days.

Default settings:

Raised as Medium when the remaining time until the database file gets full is less than 14 days.

This alert is disabled by default on system databases master, model, msdb and tempdb as they are generally MBs in size and in most scenarios don't grow significantly. The alert can be configured to alert on tempdb and log file usage, but not in conjunction with alerting on remaining time as tempdb and log file usage is typically unpredictable.

Type:

Continuous

Possible causes:

Database file size is increasing.

Database unavailable

Raised by:

SQL Server and PostgreSQL.

Raised when:

The database state is something other than Online, for longer than the time you specify.

Configurable thresholds:

Database unavailable for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High when the database is unavailable for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when the database state changes back to Online.

Possible causes:

  • Database has been manually removed.
  • SQL Server: the database has encountered a problem causing its state to change to Suspect, Emergency or Recovering.

Deadlock

Raised by:

SQL Server.

Raised when:

SQL deadlock is detected.

Configurable thresholds:

None.

Default settings:

Raised as Medium.

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

  • Inefficient application code.
  • Application accesses objects in a different order each time.
  • User input during transactions.
  • Lengthy transactions.
  • Locks not being released as early as possible.

Long-running query

Raised by:

SQL Server, PostgreSQL.

Raised when:

Query has been running for longer than a specified duration.

Configurable thresholds:

Query duration is longer than x seconds/minutes/hours/days.

Do not raise alerts for queries that contain certain strings (matching specified regular expressions).
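The combination of a duration threshold and regular-expression exclusions might look like the following sketch. It assumes an unanchored, case-sensitive search of the query text, which may differ from how SQL Monitor applies the expressions; the function and parameter names are illustrative only.

```python
import re
from typing import Sequence


def long_running_query_alert(
    query_text: str,
    duration_seconds: float,
    threshold_seconds: float,
    exclude_patterns: Sequence[str] = (),
) -> bool:
    """Return True if the query exceeds the threshold and is not excluded."""
    if duration_seconds <= threshold_seconds:
        return False
    # Any matching exclusion pattern suppresses the alert for this query.
    return not any(re.search(pattern, query_text) for pattern in exclude_patterns)
```

For example, a pattern such as `^BACKUP\s` would suppress alerts for backup statements while leaving other slow queries alertable.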

Default settings:

Raised as Low when a query runs longer than 10 minutes.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the query completes.

Possible causes:

  • Complex query.
  • Insufficient physical memory.
  • CPU over-utilized.

Monitoring error (data collection)

Raised by:

SQL Server, PostgreSQL.

Raised when:

One of the following conditions applies continuously for longer than a specified duration:

  • Problems with WMI.
  • File sharing issues.
  • SQL connectivity issues.

Configurable thresholds:

SQL Monitor cannot collect instance data for longer than x seconds/minutes/hours/days.

Default settings:

Raised as Medium.

Type:

Continuous

Automatically updated from Active to Ended when data can be collected from the instance again.

Possible causes:

  • One of the required services (WMI or RPC) has been stopped.
  • Remote file access permissions have changed or hidden administrative shares have been disabled.
  • SQL Server Service has been stopped.

Monitoring stopped (credentials)

Raised by:

SQL Server, PostgreSQL.

Raised when:

SQL Monitor is unable to collect monitoring data from the instance because the credentials supplied to connect to the instance are invalid or lack permissions.

Configurable thresholds:

Instance cannot be contacted for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High.

Type:

Continuous

Automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed.

Possible causes:

  • Your user name or password has been changed.
  • Your permissions have changed and are no longer sufficient.

Instance unreachable

Raised by:

SQL Server, PostgreSQL.

Raised when:

The instance cannot be reached by SQL Monitor because it is not running, or because of some other error (other than permissions), for longer than the time you specify.

Configurable thresholds:

Instance unreachable for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High when the instance is unavailable for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor can contact the instance.

Possible causes:

  • Host machine unreachable.
  • SQL Server: SQL Server service failed.
  • SQL Server: SQL Server service manually stopped.

    Check the list of component services on the relevant Windows or Linux machine. Check for Machine unreachable alerts.

  • PostgreSQL: postmaster process has failed.

SQL Server specific alerts

SQL Monitor raises the following types of alerts for problems on a SQL Server instance or database:

Availability group – database not healthy

Raised when:

Any of the following conditions apply, for longer than the time you specify:

  • The replica database is not synchronizing.
  • The replica database is not joined to the availability group, but the replica itself is joined.
  • The replica database is not connected to the availability group, but the replica itself is connected.

Configurable thresholds:

Availability group database is not healthy for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High when a monitored availability group database is not healthy for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the database is healthy again.

Possible causes:

  • Suspended data movement.
  • Database inaccessible.
  • Delay caused by latency on the network.
  • Delay caused by load on the primary or secondary replica.

Availability group – failover

Raised when:

The availability group fails over from a primary replica to a secondary replica.

Configurable thresholds:

None.

Default settings:

Raised as Medium.

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

Automatic failover occurs when the primary replica fails or goes offline for some reason. Planned manual failover and forced failover require initiation by a database administrator.

All three types of failover will cause this alert to be raised.

Availability group – listener offline

Raised when:

The availability group listener is offline for longer than the time you specify.

Configurable thresholds:

Listener is offline for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High when a monitored availability group's listener is offline for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the listener is online again.

Possible causes:

  • The IP address of the listener is offline.
  • There's a networking or cluster problem.
  • The listener's port is no longer available.

Availability group – not ready for automatic failover

Raised when:

The failover mode of the primary replica is automatic, but no secondary replicas are configured for automatic failover, for longer than the time you specify.

Configurable thresholds:

Availability group is not healthy for longer than x seconds/minutes/hours/days.

Default settings:

Raised as Medium when a monitored availability group is not healthy for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the availability group replica is ready for automatic failover again.

Possible causes:

The availability group is incorrectly configured. For a primary replica to fail over automatically when it fails or goes offline, the primary replica and the target secondary replica must both be set to automatic failover mode.

Availability group – query slowdown on primary due to synchronous replication

Raised when:

Synchronous replication causes query slowdown on the primary replica, resulting in the transaction delay exceeding a given threshold.

Configurable thresholds:

Transaction delay is greater than a specified value for longer than a specified time.

Default settings:

Raised as Medium when the transaction delay is more than 10000 ms for longer than 120 seconds.

Type:

Continuous

Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the transaction delay falls below the given threshold.

Possible causes:

In synchronous-commit mode, the primary replica doesn’t commit any transactions until it receives acknowledgement that all synchronous secondary replicas have finished hardening the commit in their log, resulting in transaction delay.

Several factors can affect transaction delay:

  • Flow control time.
  • Log bytes received/sec.
  • Log bytes flushed/sec (on the secondary replica).

Availability group – replica not healthy

Raised when:

Any of the following conditions apply, for longer than the time you specify:

  • The replica is not connected.
  • The replica is not joined.
  • The replica is not operational as a primary or secondary replica.

Configurable thresholds:

Replica not healthy for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High when a monitored availability replica is not healthy for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the availability group replica is healthy again.

Possible causes:

  • Mismatched encryption type or algorithm.
  • Disconnected transport.
  • Another application is using the connection port.
  • The connection endpoint was deleted, or wasn't started.

Availability group – replication falling behind

Raised when:

The sum of the log send queue and the redo queue for a particular secondary replica has exceeded a given threshold for longer than the time you specify.

Configurable thresholds:

Sum of log send queue and redo queue is greater than a specified value for longer than a specified time.

Default settings:

Raised as Medium when the sum of the log send queue and the redo queue is more than 100 MB for longer than 120 seconds.
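Several alerts on this page combine a value threshold with a duration ("greater than a specified value for longer than a specified time"). The sketch below shows one way to evaluate that pattern over a series of samples. It assumes samples arrive in time order and that a single sample at or below the limit resets the timer; the function name and sample format are hypothetical, not SQL Monitor internals.

```python
from typing import Iterable, Optional, Tuple


def breached_for_longer_than(
    samples: Iterable[Tuple[float, float]],
    limit: float,
    min_duration_seconds: float,
) -> bool:
    """samples are (timestamp_seconds, value) pairs in time order.

    Return True if the value has stayed above the limit, without
    interruption, for longer than min_duration_seconds up to the
    most recent sample.
    """
    breach_start: Optional[float] = None
    last_timestamp: Optional[float] = None
    for timestamp, value in samples:
        if value > limit:
            if breach_start is None:
                breach_start = timestamp  # start of the current breach
        else:
            breach_start = None  # condition cleared, reset the timer
        last_timestamp = timestamp
    if breach_start is None or last_timestamp is None:
        return False
    return last_timestamp - breach_start > min_duration_seconds
```

For this alert, each sample value would be the log send queue plus the redo queue in MB, with a limit of 100 and a duration of 120 seconds to mirror the default settings.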

Type:

Continuous

Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the sum of the log send queue and the redo queue falls below the given threshold.

Possible causes:

  • Slow network connection.
  • The primary or secondary replica is overloaded, meaning transaction log records can't be sent.
  • The secondary replica is experiencing resource problems, which increases the time it takes to redo the log entries received from the primary replica.

Cluster failover

Raised when:

The active node of a cluster changes to a different node.

Configurable thresholds:

None.

Default settings:

Raised as Medium.

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

The previously active node has failed or been manually switched to a different node.

Deadlock trace flag disabled

Raised when:

SQL Monitor is unable to turn on the deadlock trace flag on a SQL Server instance (1204 for SQL Server 2000, 1222 for SQL Server 2005 and later). This means deadlock alerts can't be raised.

Configurable thresholds:

None.

Default settings:

Raised as High.

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor can enable the trace flag.

Possible causes:

Insufficient privileges for the account used to connect to the SQL Server instance.

SQL Monitor requires sysadmin permissions on this account to turn on the deadlock trace flag.

Differential backup overdue

Raised when:

Either of the following conditions applies:

  • No entry for a differential or full backup of this database in the [msdb].[dbo].[backupset] system table.
  • The most recent entry for a differential or full backup of this database in the [msdb].[dbo].[backupset] system table is older than a specified time.

If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a differential backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica.

Configurable thresholds:

Most recent backup is older than x seconds/minutes/hours/days.

Default settings:

Alert is disabled by default.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when a differential backup is detected.

Possible causes:

  • No differential backup job scheduled.
  • Differential backup jobs not running or not completing. Check Job failing alerts on this SQL Server instance.
  • SQL Server Agent Service is not started – check for any SQL Server Agent Service status alerts.

Fragmented indexes

Raised when:

Both of the following conditions apply:

  • Fragmentation of one or more indexes in a database exceeds a percentage threshold.
  • The fragmented indexes contain more than a specified number of pages.

Configurable thresholds:

Percentage fragmentation level.

Indexes contain more than x pages.

Default settings:

Raised as Medium when index fragmentation is above 60% for indexes with more than 1000 pages.
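Because both conditions must hold, small but heavily fragmented indexes do not trigger the alert. A minimal sketch of the filter, using the documented defaults and a hypothetical tuple layout of (index name, fragmentation percent, page count):

```python
from typing import Iterable, List, Tuple


def indexes_triggering_alert(
    indexes: Iterable[Tuple[str, float, int]],
    fragmentation_threshold_pct: float = 60.0,
    page_threshold: int = 1000,
) -> List[str]:
    """Return names of indexes above both the fragmentation and page thresholds."""
    return [
        name
        for name, fragmentation_pct, page_count in indexes
        if fragmentation_pct > fragmentation_threshold_pct
        and page_count > page_threshold
    ]
```

An empty result corresponds to the state in which a raised alert would change to Ended at the next check.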

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when there are no fragmented indexes (based on the thresholds defined) for the database.

Possible causes:

Regularly deleting or updating existing rows or values in a table.

Notes:

Checking for index fragmentation is a very resource-intensive activity.

For this reason, SQL Monitor only checks for fragmented indexes once a week: on Sunday at 01:00.

This means that the alert may remain Active for some time after you fix the issue, and will only be updated after the next scheduled weekly check.

Integrity check overdue

Raised when:

Either of the following conditions applies:

  • No entry for an integrity check is found in the output of DBCC DBINFO WITH TABLERESULTS.
  • The most recent entry for an integrity check is older than a specified time.

Configurable thresholds:

Most recent integrity check is older than x seconds/minutes/hours/days.

Default settings:

Alert is disabled by default.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when an integrity check of the database (DBCC CHECKDB) is detected.

Possible causes:

No integrity check has been carried out on a database, or the most recent integrity check was too long ago.

Job duration unusual

Raised when:

The job execution time is different from the baseline duration (the median of the last ten runs) by a specified percentage.

Configurable thresholds:

Percentage difference from baseline duration (either slower or quicker).

Ignore jobs with run times less than x seconds.

Default settings:

These defaults assume that once the baseline for a job is established, that job should not start running significantly more slowly or more quickly.

  • Raised as Low when duration is 50% different to baseline.
  • Escalated to Medium when duration is 60% different to baseline.
  • Escalated to High when duration is 70% different to baseline.
  • Ignore jobs that run for less than 2 seconds.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the job next completes with a duration that is within the allowed percentage of the baseline duration.

Possible causes:

  • High system load.
  • Sub-optimal SQL execution plan.
  • Job is waiting on an event or blocked.

Notes:

SQL Monitor calculates the baseline duration by using the job history to find the last ten run times.

SQL Monitor will not raise the Job duration unusual alert until the job history contains at least ten runs.
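The baseline calculation and the default escalation levels can be sketched as follows. This is an illustration of the rules described in this section, not SQL Monitor's code: the names are hypothetical, and treating each threshold as "at or above" is an assumption.

```python
from statistics import median
from typing import Optional, Sequence

# Default thresholds from this section, highest first: (percent, level).
DEFAULT_LEVELS = ((70.0, "High"), (60.0, "Medium"), (50.0, "Low"))


def job_duration_level(
    previous_durations: Sequence[float],
    latest_duration: float,
    ignore_below_seconds: float = 2.0,
) -> Optional[str]:
    """Return the alert level for the latest run, or None if no alert applies."""
    if len(previous_durations) < 10:
        return None  # not enough history to establish a baseline
    if latest_duration < ignore_below_seconds:
        return None  # very short runs are ignored
    baseline = median(previous_durations[-10:])  # median of the last ten runs
    if baseline == 0:
        return None  # a percentage difference is undefined
    difference_pct = abs(latest_duration - baseline) * 100 / baseline
    for threshold_pct, level in DEFAULT_LEVELS:
        if difference_pct >= threshold_pct:
            return level
    return None
```

Using the absolute difference means a job that suddenly runs much faster than its baseline is flagged in the same way as one that runs much slower.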

Job failing

Raised when:

A job does not complete successfully and returns an error code.

Optional: Raise alert on job step failure

Configure the job failing alert to be raised if the job completes successfully, but with one or more failing steps.

Configurable thresholds:

None.

Default settings:

Raised as Medium.

Type:

Continuous

Automatically updated from Active to Ended when the job runs successfully again.

Possible causes:

Check the Job outcome message for a raised alert to help determine the problem.

Log backup overdue

Raised when:

Either of the following conditions applies:

  • No entry for a transaction log backup of this database in the [msdb].[dbo].[backupset] system table.
  • The most recent entry for a transaction log backup of this database in the [msdb].[dbo].[backupset] system table is older than a specified time.

If a database belongs to an availability group, the alert is only raised on the copy of the database on the primary replica. The alert is raised when all entries for a transaction log backup for all copies of a particular database are older than the time threshold specified for the copy of the database that is on the primary replica.

Configurable thresholds:

Most recent backup is older than x seconds/minutes/hours/days.

Default settings:

Raised as Medium when the last log backup is older than 7 days.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when a transaction log backup is detected.

Possible causes:

  • No log backup job scheduled.
  • Log backup jobs not running or not completing. Check Job failing alerts on this SQL Server instance.
  • SQL Server Agent Service is not started – check for any SQL Server Agent Service status alerts.

Page Life Expectancy

Raised when:

The Page Life Expectancy (PLE) drops below the user-configured threshold for a specified duration.

Configurable thresholds:

PLE is lower than the specified value for longer than a specified time.

Default settings:

Alert is disabled by default.

Type:

Continuous

Automatically updated from Active to Ended when the condition that caused the alert to be raised no longer applies – in this case, when the PLE rises above the highest defined threshold. 

Possible causes:

A drop in PLE indicates that the buffer pool has been flushed, which isn’t necessarily because of an issue. Big or persistent PLE drops can be correlated with queries that do a lot of physical reads or take big memory grants, or with scheduled SQL Server Agent jobs that perform database maintenance.

Page verification

Raised when:

PAGE_VERIFY is set to NONE (SQL Server 2005 or SQL Server 2008) or TORN_PAGE_DETECTION is set to FALSE (SQL Server 2000) for a database.

Configurable thresholds:

None.

Default settings:

Alert is disabled by default.

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor detects that Page verification has been turned on.

Possible causes:

New databases inherit this setting from the Model database. Check that Page Verify is turned on for the Model database, if required.

SQL Server Agent Service status

Raised when:

SQL Server Agent Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped.
  • Stopped or paused.
  • Started.
  • Started or paused.
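The four options can be read as sets of service states that keep the alert active. A minimal sketch, assuming the service reports one of Stopped, Paused or Started (the state names and the function are illustrative):

```python
# Each configurable option maps to the service states that satisfy it.
STATUS_OPTIONS = {
    "Stopped": {"Stopped"},
    "Stopped or paused": {"Stopped", "Paused"},
    "Started": {"Started"},
    "Started or paused": {"Started", "Paused"},
}


def service_status_alert_active(configured_option: str, current_state: str) -> bool:
    """Return True while the current state matches the configured option."""
    return current_state in STATUS_OPTIONS[configured_option]
```

The same reading applies to the other service status alerts on this page, which offer the same four options.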

Default settings:

Raised as Medium when service is Stopped.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed.
  • Service manually stopped or started.

    Check the list of component services on the relevant Windows or Linux machine.

SQL Server Analysis Service status

Raised when:

SQL Server Analysis Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped.
  • Stopped or paused.
  • Started.
  • Started or paused.

Default settings:

Raised as Medium when service is Stopped.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed.
  • Service manually stopped or started.

    Check the list of component services on the relevant Windows or Linux machine.

SQL Server error log entry

Raised when:

An error message has been written to the SQL Server error log with a severity level at or above a specified value.

Configurable thresholds:

Error severity equal to or higher than x.

Default settings:

Raised as Medium for errors with severity level equal to or greater than 20.

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

Various.

Check the SQL Server error log entry area of the alert to see the error message text.

SQL Server log entry

Raised when:

A message that matches a specified regex pattern has been written to the SQL Server error log.

Configurable thresholds:

Message matches the regex pattern specified for the relevant threshold.

Default settings:

Alert is disabled by default.

Type:

Event

Event alerts are raised for incidents that occur at a specific point in time; they do not change level, or update their status to Ended.

Possible causes:

Various.

Check the SQL Server log entry area of the alert to see the error message text.

SQL Server Full Text Search Service status

Raised when:

SQL Server Full Text Search Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped.
  • Stopped or paused.
  • Started.
  • Started or paused.

Default settings:

Raised as Medium when service is Stopped.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed.
  • Service manually stopped or started.

    Check the list of component services on the relevant Windows or Linux machine.

SQL Server Reporting Service status

Raised when:

SQL Server Reporting Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped.
  • Stopped or paused.
  • Started.
  • Started or paused.

Default settings:

Raised as Medium when service is Stopped.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed.
  • Service manually stopped or started.

    Check the list of component services on the relevant Windows or Linux machine.

Version store usage

Raised when:

The percentage of used space in the version store is above the threshold for longer than a specified duration.

Configurable thresholds:

Percentage of used space in the version store is above the threshold for longer than x seconds.

Default settings:

Alert is disabled by default.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when the version store usage goes below the threshold.

Possible causes:

  • A high number of uncommitted transactions.
  • Transactions left uncommitted for too long.
  • tempdb having a large size.

Host machine alerts

SQL Monitor raises the following types of alerts for problems on a host machine (Windows or Linux server):

Clock skew

Raised when:

The difference between the Base Monitor clock time and the monitored server clock time is greater than 15 seconds.

(The Base Monitor is the server which is running the monitoring service.)
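The comparison is symmetric: it does not matter which clock is ahead. A minimal sketch, assuming both times are expressed in the same time zone (the function name is hypothetical):

```python
from datetime import datetime, timedelta

TOLERANCE = timedelta(seconds=15)  # fixed value from this section


def clock_skew_detected(base_monitor_time: datetime, server_time: datetime) -> bool:
    """Return True if the two clocks differ by more than the tolerance."""
    return abs(base_monitor_time - server_time) > TOLERANCE
```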

Configurable thresholds:

None.

Note: This alert is always raised as Medium.

Default settings:

Time difference greater than 15 seconds. 

Type:

Continuous

Automatically updated from Active to Ended when the times are synchronized (within 15 seconds of each other).

Possible causes:

Server times across your network have not been fully synchronized.


Custom metric collection error

Raised when:

SQL Monitor has problems collecting custom metric data.

Configurable thresholds:

None.

Default settings:

Raised as Medium.

Type:

Continuous

Automatically updated from Active to Ended when data can be collected from the database again.

Possible causes:

The most likely cause is a problem with the custom metric's T-SQL query.


Disk space

Raised when:

One of the following conditions applies, depending on how you configure the alert:

  • logical disk space used is above a percentage threshold.
  • logical disk space available is less than a fixed value.

Configurable thresholds:

You can configure this alert in one of two ways:

  • Used disk space percentage.
  • Disk space available (in MB or GB).

    Low disk space has lasted longer than x seconds.

Default settings:

Disk space available:

  • Raised as Medium when less than 1 GB available for longer than 60 seconds.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when disk space is above the lowest defined threshold for at least the specified duration.

Possible causes:

  • Database and log files may be growing too large without frequent backups.
  • Other applications may be using the disk drive for file storage.

Note:

SQL Monitor only collects disk space data at 15-second intervals:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15-second increments.
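If you generate duration values in a script before entering them, a helper like the one below keeps them aligned with the collection interval. Rounding up rather than down is an assumption made for this sketch, and the function name is hypothetical.

```python
def align_duration(seconds: int, interval: int = 15) -> int:
    """Round a duration up to a multiple of the interval, with a minimum of one interval."""
    if seconds <= interval:
        return interval
    # Ceiling division using integer arithmetic only.
    return -(-seconds // interval) * interval
```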


Machine unreachable

Raised when:

The Windows or Linux server (host machine) does not respond to a WMI request from SQL Monitor for longer than the time you specify.

Configurable thresholds:

Machine is unreachable for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High when the host Windows or Linux machine is unavailable for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when SQL Monitor can contact the host machine.

Possible causes:

  • Host machine turned off or has suffered a problem.
  • WMI request blocked by machine or a firewall.

    Check the log for the machine on the Monitored servers page.


Monitoring error (host machine data collection)

Raised when:

One of the following conditions applies continuously for longer than the time you specify:

  • Problems with WMI.
  • File sharing issues.

Configurable thresholds:

SQL Monitor cannot collect data from the host machine for longer than x seconds/minutes/hours/days.

Default settings:

Raised as Medium when SQL Monitor cannot collect data from the host machine for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended when data can be collected from the host machine again.

Possible causes:

  • One of the required services (WMI or RPC) has been stopped.
  • Remote file access permissions have changed or hidden administrative shares have been disabled.


Monitoring stopped (host machine credentials)

Raised when:

SQL Monitor is unable to collect monitoring data from the host machine for longer than the time you specify, because the credentials supplied to connect to the machine are invalid or lack permissions.

Configurable thresholds:

Host machine cannot be contacted for longer than x seconds/minutes/hours/days.

Default settings:

Raised as High when the host machine cannot be contacted due to an authentication failure for longer than 30 seconds.

Type:

Continuous

Automatically updated from Active to Ended once the correct credentials are entered and authentication is confirmed.

Possible causes:

  • Your user name or password has been changed.
  • Your permissions have changed and are no longer sufficient.


Physical memory

Raised when:

One of the following conditions applies, depending on how you configure the alert:

  • physical memory used is above a percentage threshold.
  • physical memory available is less than a fixed value.

Configurable thresholds:

You can configure this alert in one of two ways:

  • Used physical memory percentage.
  • Physical memory available (in MB or GB).

    Low physical memory has lasted longer than x seconds.

Default settings:

Alert is disabled by default.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when physical memory has recovered past the lowest defined threshold (used memory below the percentage, or available memory above the fixed value) for at least the specified duration.

Possible causes:

  • SQL Server has been configured with insufficient memory.
  • Page file space running low.
  • Other processes consuming physical memory.
  • Not enough RAM on server.

Note:

SQL Monitor collects physical memory data only at 15-second intervals, so:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15-second increments.
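
As a small illustration of the two constraints above, the following sketch rounds an arbitrary duration up to a value the alert could accept. The helper name normalize_duration is hypothetical, and rounding up (rather than down or rejecting the value) is an assumption made for the example; it does not describe how the SQL Monitor interface behaves.

```python
import math

SAMPLE_INTERVAL_S = 15  # collection interval stated in the note above


def normalize_duration(seconds):
    """Round a positive duration up to a multiple of 15 seconds (minimum 15)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    steps = math.ceil(seconds / SAMPLE_INTERVAL_S)
    return max(SAMPLE_INTERVAL_S, steps * SAMPLE_INTERVAL_S)


for requested in (1, 15, 50, 600):
    print(requested, "->", normalize_duration(requested))
```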


Processor under-utilization

Raised when:

Total processor utilization, averaged across all CPUs, is below a percentage threshold for longer than a specified duration.

Configurable thresholds:

Processor utilization is less than a specified percentage.

Under-utilization has lasted longer than x seconds.

Default settings:

Alert is disabled by default.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is above the lowest defined threshold for at least the specified duration.

Possible causes:

Processor utilization is lower than expected under normal operations. This may indicate that SQL Server is not running normally or is not processing data, which frees up CPU.

Note:

SQL Monitor collects processor utilization data only at 15-second intervals, so:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15-second increments.


Processor utilization

Raised when:

Total processor utilization, averaged across all CPUs, is above a percentage threshold for longer than a specified duration.

Configurable thresholds:

Processor utilization is above a specified percentage.

Utilization above this percentage has lasted longer than x seconds.

Default settings:

Raised as Medium when processor utilization is more than 90% for longer than 600 seconds.

Type:

Continuous

Can have multiple thresholds applied and is automatically updated from Active to Ended when processor utilization is below the lowest defined threshold for at least the specified duration.

Possible causes:

  • Other processes running on the server – check the System processes area of the raised alert.
  • CPU-intensive SQL queries – if Profiler trace is turned on for the SQL Server, check the SQL statements in the SQL processes/Profiler trace area of the raised alert.

Note:

SQL Monitor collects processor utilization data only at 15-second intervals, so:

  • the minimum value for the duration is 15 seconds.
  • when you configure the alert, change the duration value by 15-second increments.
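
Because this alert can have a separate threshold for each level, it may help to see how a series of 15-second samples could map to a level. The sketch below is a simplified model, not SQL Monitor's implementation: the function names and the dictionary layout are hypothetical, the most severe satisfied level is assumed to win, and only raising is modelled (not the transition to Ended).

```python
SAMPLE_INTERVAL_S = 15              # collection interval stated in the note above
LEVELS = ("high", "medium", "low")  # checked from most to least severe


def sustained_above(samples, pct, duration_s):
    """True if the newest samples all exceed pct across a window of duration_s."""
    needed = duration_s // SAMPLE_INTERVAL_S + 1  # samples that span duration_s
    recent = samples[-needed:]
    return len(recent) == needed and all(value > pct for value in recent)


def current_level(samples, thresholds):
    """Return the most severe level whose threshold is met, or None.

    thresholds maps a level name to (percentage, duration in seconds).
    """
    for level in LEVELS:
        if level in thresholds:
            pct, duration_s = thresholds[level]
            if sustained_above(samples, pct, duration_s):
                return level
    return None


# Default setting: Medium when utilization is more than 90% for 600 seconds
defaults = {"medium": (90, 600)}
print(current_level([95] * 41, defaults))  # 41 samples span 600 seconds
```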



SQL Server Browser Service status

Raised when:

SQL Server Browser Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped.
  • Stopped or paused.
  • Started.
  • Started or paused.

Default settings:

Raised as Medium when service is Stopped.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed.
  • Service manually stopped or started.

    Check the list of component services on the relevant Windows or Linux machine.
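
The four configurable options for the service status alerts can be read as sets of statuses that keep the alert Active. The following sketch shows that reading; the mapping ALERT_ON and the function alert_active are hypothetical names used for illustration, and the lowercase status strings are assumptions rather than values reported by SQL Monitor.

```python
# Each configurable option maps to the set of statuses that trigger the alert.
ALERT_ON = {
    "Stopped": {"stopped"},
    "Stopped or paused": {"stopped", "paused"},
    "Started": {"started"},
    "Started or paused": {"started", "paused"},
}


def alert_active(configured, observed_status):
    """True while the observed status matches the configured option."""
    return observed_status.lower() in ALERT_ON[configured]


# Default setting: raised when the service is Stopped
print(alert_active("Stopped", "Stopped"))
print(alert_active("Stopped", "Paused"))
```

Under this reading, the alert moves from Active to Ended as soon as alert_active returns False for the configured option.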



SQL Server Integration Service status

Raised when:

SQL Server Integration Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped.
  • Stopped or paused.
  • Started.
  • Started or paused.

Default settings:

Raised as Medium when service is Stopped.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed.
  • Service manually stopped or started.

    Check the list of component services on the relevant Windows or Linux machine.


SQL Server VSS Service status

Raised when:

SQL Server VSS Service status matches the status specified in the alert configuration.

Configurable thresholds:

Service status is one of the following:

  • Stopped.
  • Stopped or paused.
  • Started.
  • Started or paused.

Default settings:

Raised as Medium when service is Stopped.

Type:

Continuous

Automatically updated from Active to Ended when the service status changes to a status other than that specified.

Possible causes:

  • Service failed.
  • Service manually stopped or started.

    Check the list of component services on the relevant Windows or Linux machine.

