List of metrics

SQL Server metrics

User connections

The total number of user connections to a SQL Server instance at a given time.

Equivalent PerfMon counter: SQLServer: General Statistics: User Connections
Explanation

This value provides a general guideline of how busy the SQL Server is. Every connection to a SQL Server instance requires on average 28 KB of memory overhead, even if the user is not currently using the connection. You can adjust the settings for your SQL Server instance so that the maximum number of user connections permitted is determined automatically by identifying the number of users currently logged on.

Generally speaking, there is a correlation between the number of user connections and Batch requests/sec.

Note: The user connections value is not the same as the number of users. A single user may have multiple connections open, or multiple users may share a single connection.
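For example, the following T-SQL sketch (illustrative only; it assumes VIEW SERVER STATE permission) counts current user connections per login using sys.dm_exec_sessions, which makes the distinction between users and connections visible:

    -- Count current user connections per login (one login may hold many connections).
    SELECT login_name,
           COUNT(*) AS connection_count
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1
    GROUP BY login_name
    ORDER BY connection_count DESC;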

Guideline values

It is useful to monitor this value over time to indicate increases or decreases in usage of your SQL Server. The number of worker threads is calculated dynamically based on the type of installation and number of processors available to SQL Server: 

  • For 32-bit servers, the base worker thread count is 256, and 8 additional worker threads are added for each scheduler above the initial 4 on the server: ((NumberOfSchedulers - 4) * 8) + 256.
  • For 64-bit servers, the base worker thread count is 512, and 16 additional worker threads are added for each scheduler above the initial 4 on the server: ((NumberOfSchedulers - 4) * 16) + 512.
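As a rough check, the following T-SQL sketch (illustrative only; it assumes a 64-bit instance with more than four schedulers, and uses cpu_count as an approximation of the scheduler count) compares the worker thread ceiling SQL Server has calculated with the 64-bit formula above:

    -- Compare SQL Server's calculated worker thread ceiling with the 64-bit formula.
    SELECT cpu_count                    AS logical_cpus,                   -- approximation of scheduler count
           max_workers_count            AS configured_max_worker_threads,  -- ceiling SQL Server calculated
           ((cpu_count - 4) * 16) + 512 AS formula_64bit_estimate
    FROM sys.dm_os_sys_info;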

For more information on worker thread counts, see: Configure the max worker threads Server Configuration Option.

Note: The number of connections on a server can easily be three or four times the number of worker threads if connection pooling is being used. Connections are pooled per application domain, per connection string, per server, and not every connection will be executing at exactly the same time.

SQL Server: total memory

The total amount of dynamic memory the server is currently using.

Equivalent PerfMon counter: SQLServer:Memory Manager - Total Server Memory (KB)
Explanation: Amount of physical memory the mssqlserver service is currently using; includes the total of the buffers committed to the SQL Server BPool and "OS in Use" type buffers.
Guideline values

If SQL Server: total memory is relatively high compared to the amount of physical memory in the computer, it may indicate that more physical memory is required.

Check also
  • SQL Server: target memory: Displays the maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use. This number cannot exceed the maximum server memory setting. When SQL Server first starts, this value corresponds to the buffers reserved at SQL Server startup. The default memory management behavior of the Microsoft SQL Server Database Engine is to acquire as much memory as it needs without creating a memory shortage on the system. If SQL Server needs more physical RAM after startup, and memory is available, SQL Server will acquire it and SQL Server: target memory will increase. If the OS requests RAM back from SQL Server, then SQL Server: target memory will usually decrease, which in some cases can cause SQL Server memory pressure, hurting SQL Server performance.
  • Buffer free pages: The number of free pages available in the buffer cache. Generally speaking, if this number drops below 640 pages, this could indicate SQL Server memory pressure.
  • Buffer page life expectancy: Measures the average number of seconds a page will stay in the buffer cache without being accessed. Generally speaking, if this value drops below 300 seconds, this may indicate SQL Server is under memory pressure.
Possible solutions: Remove unnecessary applications and services from your server, and add more physical RAM.
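A hedged T-SQL sketch for checking these values directly from sys.dm_os_performance_counters (counter names as exposed by SQL Server; 'Free pages' only exists on SQL Server 2008 R2 and earlier):

    -- Current values for the memory counters discussed above.
    SELECT RTRIM(object_name)  AS object_name,
           RTRIM(counter_name) AS counter_name,
           cntr_value
    FROM sys.dm_os_performance_counters
    WHERE (object_name LIKE '%Memory Manager%' OR object_name LIKE '%Buffer Manager%')
      AND counter_name IN ('Total Server Memory (KB)',
                           'Target Server Memory (KB)',
                           'Page life expectancy',
                           'Free pages');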

SQL Server: target memory

Maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use.

Equivalent PerfMon counter: SQLServer:Memory Manager - Target Server Memory (KB)
Explanation: Displays the maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use. This value cannot exceed the max server memory setting. When SQL Server first starts, this value corresponds to the buffers reserved at SQL Server startup. The default memory management behavior of the Microsoft SQL Server Database Engine is to acquire as much memory as it needs without creating a memory shortage on the system. If SQL Server needs more physical RAM after startup, and memory is available, SQL Server will acquire it and SQL Server: target memory will increase. If the OS requests RAM back from SQL Server, then SQL Server: target memory will usually decrease, which in some cases can cause SQL Server memory pressure, hurting SQL Server's performance.
Guideline values

If this value is high (relative to the total physical memory available on your machine), consider adding extra physical memory to improve the performance of SQL Server.

Check also
  • SQL Server: total memory: Displays the total amount of dynamic memory (the buffer pool) the server is currently using. SQL Server: total memory will increase as SQL Server needs it, assuming it is available. This value will always be smaller or equal to SQL Server: target memory. If SQL Server: total memory is relatively high compared to the amount of physical memory in the computer, it may indicate that more physical memory is required.
  • Buffer free pages: The number of free pages available in the buffer cache. If this number drops below 640 pages, this could indicate SQL Server is under memory pressure.
  • Buffer page life expectancy: Measures the average number of seconds a page will stay in the buffer cache without being accessed. If this value drops below 300 seconds, this may indicate SQL Server is under memory pressure.
Possible solutions: Remove unnecessary applications and services from your server, and add more physical RAM.

SQL Server: free memory

The total amount of dynamic memory that the server is not using.

Note: This metric was introduced in SQL Server 2012 to replace Buffer free pages (which was used in SQL Server 2008 and earlier). For SQL Server 2008 and earlier, we've now converted free pages into memory (1 page is equal to 8 KB).

Equivalent PerfMon counter: SQLServer: Memory Manager - Free memory
Explanation: The Memory Manager monitors the overall memory usage of SQL Server. This metric can help you identify whether there are bottlenecks resulting from a lack of memory (which means that SQL Server has to retrieve data from disk). It can also help you to identify whether you can improve query performance by increasing the amount of memory.
Guideline values

A free memory value greater than 5 MB should ensure that the buffer cache can handle sufficient new memory allocation requests. A value less than 5 MB may mean that requests are stalled, and this can affect performance. You may consider adding memory or making more memory available to the data cache.

Batch requests/sec

Number of T-SQL batch requests received by SQL Server per second, averaged since last collection time.

Equivalent PerfMon counter: SQLServer:SQL Statistics - Batch Requests/sec
Explanation

This gives a general indicator of how busy SQL Server is. It measures all of the batches you send to the SQL Server. The number of batch requests SQL Server can handle is dependent on the capability of your hardware. It may be useful to track this value over time, to gauge the scalability of your system, and to identify when you have peaks and troughs in user requests.

Batch requests/sec only tracks the number of batches, not the statements in the batch. Some batches may use very little resources, while other batches use huge amounts of resources. Each batch is different, and this metric only provides an absolute count of the batches executed by SQL Server.

Guideline values

In general, over 1000 batch requests per second indicates a very busy SQL Server, and could highlight the potential for a CPU bottleneck. More powerful hardware can, of course, handle higher numbers of requests.

Check also

Compilations/batch and Compilations/sec. These values should be as low as possible, as high values may indicate a lot of ad hoc queries being run. You may need to optimize T-SQL.

Compilations/sec

SQL compilations per second; each data point shows the average value since the previous data point.

Equivalent PerfMon counter: SQLServer:SQL Statistics - SQL Compilations/sec
Explanation

The number of times that Transact-SQL compilations occur, per second (including recompiles). The lower the value, the better. Each compilation uses CPU and other resources, so SQL Server will attempt to reuse cached execution plans wherever possible. Not having to recompile the stored procedure reduces the overhead on the server and speeds up overall performance.

Guideline values

In general, Compilations/sec should be less than 10% of the Batch requests/sec. High values often indicate excessive ad hoc querying and should be as low as possible.

Check also

Batch requests/sec.

Possible solutions: Rewrite ad hoc queries as stored procedures. Reduce usage of temp tables, or combine statements to eliminate them in stored procedures. Check "SET" statements – changes in session settings such as ANSI_PADDING or ANSI_NULLS can cause a query to be recompiled. If SQL Compilations/sec are excessive in SQL Server 2005 or later, consider using "Forced parameterization" on the database. This tells SQL Server to force the parameterization of virtually all SELECT, INSERT, UPDATE and DELETE statements, requiring fewer compilations of statements due to reuse of more query plans.
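For example, forced parameterization can be enabled per database with a statement like the following (the database name is a placeholder):

    -- Force parameterization of ad hoc statements for one database.
    ALTER DATABASE YourDatabase SET PARAMETERIZATION FORCED;

    -- Revert to the default behavior if needed.
    -- ALTER DATABASE YourDatabase SET PARAMETERIZATION SIMPLE;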

Compilations/batch

SQL compilations per batch request received by SQL Server.

Equivalent PerfMon counter: (SQLServer:SQL Statistics - SQL Compilations/sec) / (SQLServer:SQL Statistics - Batch Requests/sec)
Explanation

This provides a general indication of how quickly and efficiently the SQL Server query optimizer is processing user queries.

Guideline values

As a general rule, Compilations/sec should be less than 10% of the Batch requests/sec. High values often indicate excessive ad hoc querying and should be as low as possible.

Check also

Compilations/sec and Batch requests/sec.
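One way to sanity-check this ratio is to compare the cumulative counter values in sys.dm_os_performance_counters (a rough sketch; these values accumulate from the last restart, so the result is a long-term average rather than a current rate):

    -- Cumulative counts since the last restart; compare compilations to batch requests.
    SELECT MAX(CASE WHEN counter_name = 'Batch Requests/sec'      THEN cntr_value END) AS batch_requests,
           MAX(CASE WHEN counter_name = 'SQL Compilations/sec'    THEN cntr_value END) AS compilations,
           MAX(CASE WHEN counter_name = 'SQL Re-Compilations/sec' THEN cntr_value END) AS recompilations
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%SQL Statistics%';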

Recompilations/sec

Number of times per second that Transact-SQL objects had to be recompiled.

Equivalent PerfMon counter: SQLServer:SQL Statistics - Re-Compilations/sec
Explanation

Usually, once an object is compiled, it does not have to be recompiled. When a stored procedure or query is recompiled, a compile lock is placed on the objects referenced by the procedure, and database blocking may occur. Excessive recompilations can therefore cause serious performance problems, and the compile locks they take may block other sessions.

Guideline values

Recompilations/sec should be as small as possible. Any value greater than 10% of Compilations/sec can indicate a problem.

Check also

Compilations/sec, Batch requests/sec, and Compilations/batch to determine how quickly and efficiently SQL Server and the query optimizer are processing user queries. Check also the SP:Recompile and SQL:StmtRecompile event classes in a Profiler trace to identify which stored procedures and SQL statements need recompiling with every run. The EventSubClass data column tells you what caused the recompile.
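If a Profiler trace is not practical, a rough alternative (a sketch, not a SQL Monitor feature) is to look for cached plans with a high plan_generation_num in sys.dm_exec_query_stats:

    -- Statements whose plans have been regenerated most often since being cached.
    SELECT TOP 10
           qs.plan_generation_num,
           qs.execution_count,
           st.text AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE qs.plan_generation_num > 1
    ORDER BY qs.plan_generation_num DESC;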

Buffer cache hit ratio

The percentage of pages that were found in the buffer pool without having to read from disk. (Deprecated, removed from SQL Monitor in 11.0.21)

Equivalent PerfMon counter: SQLServer: Buffer Manager - Buffer cache hit ratio
Explanation

Indicates how frequently a query gets information from cache instead of accessing the hard disk. The value is calculated by dividing the total number of cache hits by the total number of cache lookups. This figure is an average value, calculated since the SQL Server service was last restarted, not a snapshot of what is currently happening on your server, so be careful how you interpret it.

Guideline values

If your server is running online transaction processing (OLTP) applications, a value of 99% or higher is ideal, but anything above 90% is generally considered satisfactory. A value of 90% or lower may indicate increased I/O access and slower performance. Adding more physical RAM may help alleviate this problem.

If your server is running online analytical processing (OLAP) applications, it is not unusual for this value to fall below 90%. This is not normally a problem, unless you see other signs of memory pressure.

Note: Be aware that the value of this metric immediately after a service restart will be uncharacteristically low until enough time has passed for the data cache to fill.

Check also

Buffer page life expectancy: Measures the average number of seconds a page will stay in the buffer cache without being accessed. Generally, if this value drops below 300 seconds, this may indicate SQL Server is under memory pressure.

Buffer page life expectancy

Indicates the average number of seconds a page will stay in the buffer pool without being referenced.

Equivalent PerfMon counter: SQLServer: Buffer Manager - Page life expectancy
Explanation

Data is cached in the buffer pool so that it can be retrieved much faster than having to retrieve it from disk. However, once data pages have been stored in the buffer pool, they may or may not be reused. If a page is not referenced after a given amount of time, that page will be removed from the buffer pool so other data can use the space. A high buffer page life expectancy indicates that buffer pages are being reused often, which in turn indicates that the SQL Server is not under any memory pressure.

Guideline values

This value depends on the amount of RAM installed on the server. It is best practice to record a baseline measurement against which you can compare the monitored values over time. Do this by using the formula (DataCacheSizeInGB / 4 GB) * 300, as recommended in Finding what queries in the plan cache use a specific index. For details on how to find the data cache size, see Performance issues from wasted buffer pool memory.
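As an illustration of that baseline formula, the following T-SQL sketch estimates the data cache size by counting pages in sys.dm_os_buffer_descriptors (this DMV can be slow on servers with very large buffer pools, so treat it as an occasional check rather than something to run continuously):

    -- Estimate data cache size (8 KB per page) and a page life expectancy baseline.
    SELECT COUNT(*) * 8 / 1024.0 / 1024.0             AS data_cache_gb,
           (COUNT(*) * 8 / 1024.0 / 1024.0) / 4 * 300 AS suggested_ple_baseline_seconds
    FROM sys.dm_os_buffer_descriptors;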

If data pages consistently remain in the buffer for less than your baseline value, this may indicate that there is insufficient memory on the system or not enough memory is configured for SQL Server use. You may consider improving performance by adding RAM. The problem may also be caused by missing indexes, poor index design or badly written application code, such as too many stored procedures being recompiled, or queries reading large tables in the buffer pool when only a small number of rows need changing. Addressing application-specific problems may improve performance.

Full scans/sec

Measures the number of base table or full index scans occurring per second.

Equivalent PerfMon counter: SQLServer:Access Methods: Full Scans/sec
Explanation

This value monitors how many full scans are performed on tables or indexes. While index scans are a normal part of how SQL Server works, a lot of full scans may indicate missing indexes, accessing many small tables, or queries that return large quantities of rows.

Guideline values

It is best practice to record a baseline measurement against which you can compare your monitored values. If the monitored value is significantly higher than the baseline, and CPU is also high, it may suggest a performance bottleneck. The ratio of Index searches/sec to Full scans/sec should generally be no more than 1000:1.

Check also

Machine: processor time to establish whether CPU is also high. Also look at Batch requests/sec against your baseline measurement for details of how well the database is coping with queries. You can then check the Lock waits/sec and Lock timeouts/sec to determine whether locking issues are affecting performance.

Page splits/sec

The number of page splits, and new page allocations, in a SQL Server table per second.

Equivalent PerfMon counter: SQLServer:Access Methods: Page Splits/sec
Explanation

As new data is inserted or added to an existing data page, and the page does not have enough room for all the new data to fit, SQL Server splits the page, leaving half of the data on the old page and putting the other half on a new page. This is a normal SQL Server event, but excessive page splitting can result in index fragmentation and a variety of related performance problems.

Guideline values

It is best practice to record a baseline against which you can compare your monitored values. In general terms, this value should not be greater than 20% of the number of Batch requests/sec. A low value is preferable, but what constitutes low depends on a number of variable factors including the number of users, the level of user activity and your I/O subsystem performance capabilities.

Check also

Disk avg. read time and Disk avg. write time. There is often a correlation between excessive I/O activity and excessive page splitting.

Possible solutions: Resolving page splitting problems is often difficult, as the original database design may be contributing to the problem. Besides reconsidering the database design, also consider reevaluating your indexing strategy, and perhaps increase the fill factor of the indexes affected by excessive page splitting. If excessive page splitting cannot be effectively eliminated, consider increasing the frequency with which you rebuild or reorganize your indexes in order to remove the fragmentation created by page splitting.
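For example, a fill factor can be applied when rebuilding an index (the index and table names here are placeholders):

    -- Rebuild a hypothetical index, leaving 10% free space on each page to absorb inserts.
    ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 90);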

Page splits/batch request

The number of page splits per second in a SQL Server table divided by the number of T-SQL command batch requests received per second.

Equivalent PerfMon counter: (SQLServer:Access Methods: Page Splits/sec) / (SQLServer:SQL Statistics - Batch Requests/sec)
Explanation

While page splits are a normal SQL Server function, excessive page splitting can negatively affect SQL Server performance.

Guideline values

On average, the number of page splits per batch request should be below 20% of the number of batch requests per second, but ideally as close to 0 as possible.

Check also

The fill factor of the indexes on your tables. The lower the fill factor, the more free space is left on each page, which helps reduce potential page splits. For more information about fill factor, see the MSDN documentation.

Latch wait time

The average wait time, in milliseconds, that latch requests have to wait before being executed.

Equivalent PerfMon counter: SQLServer:Latches - Average Latch Wait Time (ms)
Explanation

Think of a latch as a lightweight lock used by SQL Server to manage many internal operations. They prevent one SQL Server internal operation from conflicting with another internal SQL Server operation. This average is calculated by dividing latch wait milliseconds by the number of latch waits.

Monitoring latch wait time helps to generically identify user activity and resource utilization that may be the cause of performance bottlenecks. The current average should be compared against a baseline average. Note that latches that do not have to wait for requests to be granted are not included in this value.

Guideline values

In general, there is a correlation between this metric and latch waits/sec. Use PerfMon to check the latch waits/sec value. If both values increase above their baseline averages, then SQL Server is experiencing resource contention that needs further investigation.

Check also

The following dynamic management views (DMVs) can help identify which activity is causing latch wait problems:

  • sys.dm_os_latch_stats (latch wait information by class)
  • sys.dm_os_wait_stats (details about the waits encountered by threads in execution)
  • sys.dm_db_index_operational_stats (details of I/O, latching, locking and access method activity for table partitions or database indexes).

Note: These DMVs are not supported in SQL 2000. If you are using SQL 2000, use the DBCC SQLPERF('WAITSTATS') statement.

The relative wait numbers and times for every latch class must be reviewed to understand SQL Server instance performance issues.
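For example, a quick sketch against the first of those DMVs, ordering latch classes by accumulated wait time (values are cumulative since the last restart or since the statistics were cleared):

    -- Latch classes with the most accumulated wait time since the last restart.
    SELECT TOP 10
           latch_class,
           waiting_requests_count,
           wait_time_ms,
           max_wait_time_ms
    FROM sys.dm_os_latch_stats
    ORDER BY wait_time_ms DESC;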

Lock timeouts/sec

The number of locks per second that timed out.

Equivalent PerfMon counter: SQLServer:Locks - Lock Timeouts/sec
Explanation

Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance.

The lock timeout limit in SQL Server depends on the value set using the SET LOCK_TIMEOUT command. If no value is set, SQL Server will wait indefinitely for the lock to clear.
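For example, a session can choose to wait at most five seconds for a lock rather than waiting indefinitely:

    SET LOCK_TIMEOUT 5000;   -- wait up to 5000 ms for a lock, then fail with error 1222
    -- SET LOCK_TIMEOUT -1;  -- the default: wait indefinitely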

Guideline values

Any value greater than 0 suggests that users may be experiencing problems with queries that are not completing.

Check also

Lock waits/sec. This can identify the number of locks that did not time out but had to wait for resources.

Lock waits/sec

The number of locks per second that could not be satisfied immediately and had to wait for resources.

Equivalent PerfMon counter: SQLServer:Locks - Lock Waits/sec
Explanation

Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance.

Guideline values

Any value greater than 0 suggests that some level of blocking is occurring. It is best practice to compare the current value against a baseline to determine what is normal for your SQL Server. An increasing lock wait time indicates potential resource contention.

Check also

Avg. disk queue length. Excessive disk queueing may be caused by badly configured or inadequately sized disk subsystems. You may want to review indexing and investigate the numbers of concurrent connections. 

Avg. lock wait time

The average wait time (in milliseconds) for each lock that could not be satisfied immediately and had to wait for resources.

Equivalent PerfMon counter: SQLServer:Locks - Average Wait Time (ms)
Explanation

Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance.

Guideline values

Average wait times of 500 ms or more may indicate excessive blocking and should be investigated. In general, Avg. lock wait time correlates with Lock waits/sec, and if you see both of them increasing, you need to investigate what is causing the blocking issues.

Check also

Lock waits/sec to identify peaks over time, and Lock timeouts/sec to see how many locks reached the threshold set using the SET LOCK_TIMEOUT command.
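To see which lock types account for the most waiting, a sketch against sys.dm_os_wait_stats (values are cumulative since the last restart) can be used alongside these metrics:

    -- Lock wait types with the most accumulated wait time since the last restart.
    SELECT TOP 10
           wait_type,
           waiting_tasks_count,
           wait_time_ms,
           max_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE 'LCK_M_%'
    ORDER BY wait_time_ms DESC;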

SQL Server database metrics

Transactions/sec

Number of transactions started for the database per second. Each data point shows the average number of transactions per second since the previous data point.

Equivalent PerfMon counter: SQLServer:Databases - Transactions/sec
Explanation

A transaction is one or more database operations combined into a single operation: either user-defined transactions surrounded by BEGIN TRAN and COMMIT TRAN, or individual data modification statements such as INSERT or UPDATE. Transaction rate is affected by general system performance and resources such as I/O, number of users, cache size, and complexity of requests. This counter records only explicit transactions or transactions that change data.

Note: Transactions per second only measures activity inside a transaction, not all activity. The Batch requests/sec metric gives a more complete picture of all SQL Server activity.

Guideline values

A high number of transactions per second is not necessarily an indication of a problem, only that there is a lot of activity on your SQL Server. You should establish a baseline value and track when transactions are unusually high or low.

Active transactions

Number of active transactions for the database. Each data point shows the number of active transactions at the point the data was collected.

Equivalent PerfMon counter: SQLServer:Databases - Active Transactions
Explanation

Active transactions are UPDATE transactions on the database that have not yet been committed or rolled back.

Note: This metric measures activity inside a transaction, not all activity. The Batch requests/sec metric is a more complete picture of all SQL Server activity.

Guideline values

Generally, the number of active transactions should be considerably lower than the number of Transactions/sec (as active transactions show only those transactions that have not yet completed). If the number of transactions currently active is higher than the Transactions/sec metric, then this may indicate you have long-running transactions. Long-running transactions can increase lock times and cause blocking.

Total size

The total size (in GB) of all data files (.mdf and .ndf) and log files (.ldf) for this database.

Equivalent PerfMon counter: SQLServer:Databases - Data File(s) Size (KB) + SQLServer:Databases - Log File(s) Size (KB)
Explanation

This value is the sum of the total data files and total log files sizes. It is useful for determining the amount of space required by the database and identifying trends.

Guideline values

For best practice, MDF, NDF, and LDF files should be proactively managed in order to prevent autogrowth from kicking in and growing your database files.

Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data.

Check also: Log size for details of total .ldf files, and Data size for details of total .mdf and .ndf files.
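The same figures can be approximated directly with a T-SQL sketch against sys.master_files (file sizes are stored in 8 KB pages):

    -- Data and log file sizes per database, in MB.
    SELECT DB_NAME(database_id) AS database_name,
           SUM(CASE WHEN type_desc = 'ROWS' THEN size END) * 8 / 1024 AS data_mb,
           SUM(CASE WHEN type_desc = 'LOG'  THEN size END) * 8 / 1024 AS log_mb
    FROM sys.master_files
    GROUP BY database_id
    ORDER BY data_mb DESC;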

Data size

Total size of all data files (.mdf and .ndf files) for the specified database.

Equivalent PerfMon counter: SQLServer:Databases - Data File(s) Size (KB)
Explanation

This value is the total size of the database, excluding the transaction log. Information about individual primary data files (.mdf) and secondary data files (.ndf) such as their current size, location, and autogrowth settings, is shown under the Files section of the Database overview page in SQL Server Management Studio.

Guideline values

Look out for unusually large increases in the size of your data files. Ideally your database should be sized in a way that minimizes autogrowth, as each size increase is expensive in terms of I/O and will also physically fragment your data and log files.

Growing a file on demand is an expensive process and will impact performance. Autogrowth should only be used as a safety valve to allow a database to grow should you accidentally run out of space. It should not be used to manage your MDF file growth.

In general, data files should be pre-sized when they are first created to meet future expected growth, help avoid file fragmentation and ensure better database performance.

Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data.

Check also
  • The Full recovery model is used for all production databases.
  • When using the Full recovery model, be sure you take regular full and transaction log backups.
  • The files are sized to the amount of space you need for normal activity. Monitor the data files and manually add space as data grows. You should have 6–12 months of space for data growth in your data files.
  • If you don't have space for the database to grow at its current rate, move the database to a larger disk drive or upgrade the disk itself.

Caution: Shrinking files to reduce space causes fragmentation and is not recommended.

Log size

Total size of all log files (.ldf) for this database.

Equivalent PerfMon counter: SQLServer:Databases - Log File(s) Size (KB)
Explanation

Each transaction performed in the database is recorded in the transaction log. This allows recovery in the event of data loss since the last backup. Information about the log file (.ldf) such as its location, autogrowth setting and current size, is shown under the Files section of the Database overview page in SQL Server Management Studio.

Guideline values

Transaction logs are serial records of database modifications. They're used during recovery operations to commit completed transactions and to roll back uncompleted transactions. The size of the log file is determined by the Recovery Model set for the database. By default when creating a new database it will use the Full recovery mode (inherited from the Model database), where transactions in the log file are only ever removed when a transaction log backup is performed. When you back up a transaction log, the backup stores the changes that have occurred since the last transaction log backup and then truncates the log, which clears out transactions that have been committed or aborted. If the transaction log is not backed up regularly, it will grow indefinitely in size and could fill your entire disk.

Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data.

Check also
  • The transaction log file should be pre-allocated to the size you think it needs to be when your system is in production. This size will depend on the number of transactions you expect, and how often you back up the transaction log.
  • Set correct autogrowth properties: 10% autogrowth for data and log files (the default setting) may be sufficient for low utilization databases, but a 500 MB autogrowth rate may be more suitable for a busy database, allowing for growth over time without the heavy I/O impact caused by regular autogrowth operations (see the sketch after this list). Even in the Simple recovery model, if you are writing to the transaction log faster than the checkpoint process can keep up, the log file will trigger autogrowth.
  • If you are performing a one-off bulk insert operation, you may want to switch the recovery model to bulk logged for the duration of the insert.
  • If you do not need point-in-time recovery for this database, then it may be appropriate to switch to Simple recovery model.
  • If a database is configured with the Full or Bulk Logged recovery model, you should back up the transaction log regularly, so it can be truncated to free up inactive log space. Truncating a log file removes inactive virtual log files, but does not reduce the file size.
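A minimal sketch of the autogrowth and recovery model changes mentioned above (the database and logical file names are placeholders; check the logical names in sys.master_files first):

    -- Switch a hypothetical log file from percentage growth to a fixed 500 MB increment.
    ALTER DATABASE YourDatabase
    MODIFY FILE (NAME = YourDatabase_log, FILEGROWTH = 500MB);

    -- If point-in-time recovery is not required for this database:
    ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;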

Log space used

Percentage of total transaction log file size currently containing log records.

Equivalent PerfMon counter: SQLServer:Databases - Log File(s) Used Size (KB)
Explanation

The transaction log file stores the details of all the modifications that you perform on your SQL Server database. The amount of space it takes up depends on the recovery model used, database activity and how often transaction logs are backed up, among other factors.

Guideline values

It is normal to see the log space figure vary considerably in size over time as transactions are recorded, and removed when log backups are performed.

Log bytes flushed/sec

Shown on the Availability group overview as Log growth.

Average size of the log flush per second. Each data point shows the average log bytes flushed per second for this database since the previous data point.

Equivalent PerfMon counter: SQLServer:Databases - Log Bytes Flushed/sec
Explanation

This metric helps to assess utilization and identify trends of the transaction log. A log flush occurs when data is written from the log cache to the physical transaction log file on disk, every time a transaction is committed.

Within availability groups, both the primary and the secondary replicas have a log bytes flushed rate. On the primary replica, this shows how quickly log records are being added to the log send queue. On the secondary, it shows how quickly log bytes are being added to the redo queue.

The Availability group overview page only shows the rate of log bytes flushed on the primary replica.

Guideline values

The size of a log flush varies depending on the nature of the transaction. For example, a single INSERT affecting 1,000,000 rows will result in a single log flush, but a high number of bytes flushed.

Within availability groups, if the rate of log bytes flushed on the primary replica is higher than the rate at which log bytes are sent, the size of the log send queue will increase. To estimate the potential data loss on failover (in asynchronous-commit mode), divide the log send queue by the log bytes flushed.

If the rate of log bytes flushed on the secondary replica is higher than the redo rate, the size of the redo queue will increase.

Check also
  • Log flushes/sec: The number of log pages flushed to the transaction log per second.
  • Log send queue: The amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Redo queue: The amount of log records that needs to be redone (written to the secondary database) for the synchronous-commit primary and secondary replicas to be synchronized.

Log flushes/sec

Number of log flushes per second for this database, averaged since last collection time.

Equivalent PerfMon counter: SQLServer:Databases - Log Flushes/sec
Explanation

A log flush occurs when a transaction is committed and data is written from the log cache to the physical transaction log file. The log cache is a temporary location in memory where SQL Server stores data to be written to the log file, and is used to roll back a transaction before it is committed if required. Once a transaction is completed (and can no longer be rolled back), the log cache is immediately flushed to the physical log file on disk.

Guideline values

Log flushes per second should generally correlate with the number of transactions per second. If log flushes per second seems to be significantly higher than the expected number of transactions, check your use of explicit transactions in queries.

Explicitly defining the start and end of transactions (rather than implicit transactions, where SQL Server needs to flush the log for each data statement) should reduce the number of log flushes, and reduce impact on I/O.

Check also

Log bytes flushed/sec

Log flush waits/sec

Number of transaction commits per second waiting to be flushed from the log cache to the physical transaction log file on disk.

Equivalent PerfMon counter: SQLServer:Databases - Log Flush Waits/sec
Explanation

A high number of log flush waits indicates that it is taking longer than normal to flush the transaction log cache to the file on disk, which will slow SQL Server's performance.

Guideline values

Log flush waits should be as close to 0 as possible.

Check also

Disk avg. write time. If this is greater than 5 ms, then an I/O bottleneck is likely to be occurring.

Possible solutions: Moving the location of your transaction log (LDF file) to a separate physical drive from your data files (MDF files) will increase performance, as writing to the log and data files will not be competing for access to the same disk. Also consider using faster disk drives, using RAID 10 for transaction log file storage, adding additional spindles to your existing RAID array, short-stroking, or adding additional disk controllers.

Availability group metrics

Log bytes received/sec

Shown on the Availability group overview as Received rate.

The rate at which the secondary replica receives log records from the primary replica.

Equivalent PerfMon counter: SQLServer:Database Replica - Log Bytes Received/sec
Explanation: This value measures how quickly log records are received by the secondary replica.
Guideline values

To estimate how long a secondary replica will take to catch up with the primary replica, divide the log send queue by the rate of log bytes received.

Check also
  • Log send queue: Amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Log bytes flushed/sec (on the secondary): Average size of the log flush per second.

Log send queue

Shown on the Availability group overview as Send queue.

Amount of log records in the primary database log file waiting to be sent to the secondary replica.

Equivalent PerfMon counter: SQLServer:Database Replica - Log Send Queue
Explanation: This value is the amount of log data that has been flushed to disk on the primary replica but hasn't yet been sent to the secondary replica.
Guideline values

The rate of log bytes flushed (on the primary replica) adds to the log send queue. If the rate of log bytes flushed on the primary is higher than the rate at which log records are sent, the log send queue will increase.

To estimate the potential data loss on failover (in asynchronous-commit mode), divide the log send queue by the log bytes flushed.

From the log send queue, the log records are sent to the secondary replica. To estimate how long it will take the secondary replica to catch up with the primary replica, divide the log send queue by the log bytes received/sec.

If you use the secondary replica for reporting and want to know how much data from the primary replica is not yet represented in the secondary replica, add together the log send queue and the redo queue.

Check also

Log bytes flushed/sec: Average size of the log flush per second.

Redo queue

The amount of log records that needs to be redone (written to the secondary database) for the synchronous-commit primary and secondary replicas to be synchronized.

Equivalent PerfMon counter: SQLServer:Database Replica - Recovery Queue
Explanation: After the log data has been hardened into the secondary replica's log file, it joins the redo queue.
Guideline values

The rate of log bytes received adds to the size of the redo queue, while the redo rate takes away from it. If the rate of log bytes received is higher than the redo rate, the size of the redo queue will increase.

In the event of a failover, the redo queue needs to be cleared before the secondary replica can transition to the role of primary replica. To estimate the time, in seconds, that it will take the secondary replica to redo the log (and therefore how long failover will take), divide the redo queue by the redo rate.

If you use the secondary replica for reporting and want to know how much data from the primary replica is not yet represented in the secondary replica, add together the log send queue and the redo queue.

Check also
  • Log send queue: Amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Log bytes received/sec: The rate at which the secondary replica receives log records from the primary replica.
  • Redo rate: The rate at which log records are redone in the secondary database to complete synchronization.

Redo rate

The rate at which log records are redone (written to the secondary database).

Equivalent PerfMon counter: SQLServer:Database Replica - Redone Bytes/sec
Explanation: Log records from the redo queue are written to the secondary database, completing the synchronization process. If the redo rate is lower than the rate of log bytes flushed on the secondary replica, the size of the redo queue will increase.
Guideline values: To estimate the time, in seconds, that it will take the secondary replica to redo the log, divide the redo queue by the redo rate.
Check also
  • Log bytes flushed/sec (on secondary replica): Average size of the log flush per second.
  • Redo queue: The amount of log records that needs to be redone (written to the secondary database) for the synchronous-commit primary and secondary replicas to be synchronized.

Transaction delay ms/sec

This value measures the extent to which transactions are slowed down by synchronous replication.

Equivalent PerfMon counter: SQLServer:Database Replica - Transaction Delay
Explanation: In synchronous-commit mode, the primary replica doesn't commit any transactions until it receives acknowledgement that all synchronous secondary replicas have finished hardening the log, resulting in transaction delay. This is the total delay across all transactions caused by the primary replica waiting for commit acknowledgement from the secondary replicas. As such, it's useful for working out whether activity on the secondary replicas is slowing down the primary replica.
Guideline values: The flow control time contributes to the transaction delay on synchronous-commit replicas, because it affects the rate at which log records are sent. The rate of log bytes flushed on the secondary replica also affects the transaction delay.
Check also
  • Log bytes flushed/sec (on secondary replica): Average size of the log flush per second.
  • Flow control time/sec: The total time that log records wait per second due to flow control.

Flow control time ms/sec

The total time that log records waited per second due to flow control.

Equivalent PerfMon counter: SQLServer:Availability Replica - Flow Control Time (ms/sec)
Explanation: Flow control throttles the flow of network traffic caused by replication, by limiting the speed at which the primary replica sends log records to the secondary replica. This helps to even out the usage of network and memory resources. This value shows the total time, in milliseconds per second, that the log records had to wait before being sent to the secondary replica, because of flow control.
Guideline values

Flow control can restrict the rate at which log records are sent from the primary replica. This will affect the log send queue: if the rate at which log records are sent is lower than the rate of log bytes flushed (on the primary replica), the log send queue will increase.

The flow control time also contributes to the transaction delay on synchronous replicas, because it affects the rate at which log data is transferred between the primary and the secondary replicas.

Check also
  • Log bytes flushed/sec (on secondary replica): Average size of the log flush per second.
  • Log send queue: Amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Transaction delay: The extent to which transactions are slowed down by synchronous replication.

PostgreSQL instance and database metrics

Unlike SQL Server, all PostgreSQL metrics are available at both the instance and the database level. This is because PostgreSQL doesn't have the concept of metrics at an instance level; all metrics are sampled at the database level and are then aggregated to form the instance metric. 

Transactions committed / sec

Shows the number of successful (committed) transactions per second.

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

Sustained higher than normal volumes of transactions indicate an increase in demand.

Transactions rolled back / sec

Shows the number of aborted (rolled back) transactions per second. 

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

While some transaction rollbacks are to be expected, a sudden increase or excessive number is a strong indicator of problems. Causes can include: performance issues causing queries to time out, errors in statements causing rollbacks, applications deliberately rolling back transactions as part of their design, and unreliable network connectivity.


Query overflow to temp data bytes / sec

Shows the volume of data being written to temporary storage to support query execution.

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

High volumes of data written to disk may be a sign that PostgreSQL cannot allocate sufficient memory for query operations.
Particularly complex queries requiring multiple sorts over high volumes of data, host memory saturation, or an inadequate work_mem configuration could be responsible.
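A rough PostgreSQL sketch for investigating this: pg_stat_database exposes cumulative temporary file counters, and work_mem can be checked at the same time (thresholds and interpretation depend on your workload):

    -- Cumulative temporary file usage per database since statistics were last reset.
    SELECT datname,
           temp_files,
           pg_size_pretty(temp_bytes) AS temp_data_written
    FROM pg_stat_database
    WHERE datname IS NOT NULL
    ORDER BY temp_bytes DESC;

    -- The per-operation memory budget for sorts and hashes.
    SHOW work_mem;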

Number of connections

The total number of user connections to a PostgreSQL instance at a given time.

Source: From values in pg_stat_activity.
Explanation

This is not the same as active users, as connection pooling will affect the number of connections opened.

Block write time

The amount of time in the previous second that queries have spent waiting on I/O to write blocks of data.

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

Collecting this metric requires the track_io_timing configuration to be set to true - if this metric is consistently zero, check to ensure this setting is enabled.
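For example (a sketch; ALTER SYSTEM requires superuser or equivalent privileges, so adjust to your configuration management process), the setting can be checked and enabled like this, which applies equally to the Block read time metric below:

    -- Check whether I/O timing collection is enabled.
    SHOW track_io_timing;

    -- Enable it and reload the configuration without a restart.
    ALTER SYSTEM SET track_io_timing = on;
    SELECT pg_reload_conf();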

Block read time

The amount of time during the previous second that queries have spent waiting on I/O to read blocks of data.

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

Collecting this metric requires the track_io_timing configuration to be set to true - if this metric is consistently zero, check to ensure this setting is enabled.

Deadlocks / sec

Shows the number of occurrences where two or more transactions can't proceed because each is waiting on a resource on which the other has already acquired a lock.

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

Unusually high numbers of deadlocks can indicate a problem with application design.

Cache hit ratio

The ratio of data reads which were served from PostgreSQL's memory cache (a hit) vs having to request data from the host Operating System which often leads to reading data from disk (a miss).

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

Because memory access is vastly faster than disk, a high proportion of reads coming from memory, and hence a high cache hit ratio, is desirable. Most typical OLTP workloads should expect to see a cache hit ratio well above 90%, and ideally close to 99%. For analytical workloads, lower values are normal.

One caveat is that PostgreSQL uses a combination of its own memory cache (shared_buffers) and the OS memory cache to store data in memory, caching data retrieved from disk in both locations and leading to some data being cached twice. When PostgreSQL tries to access data, before resorting to disk it looks first in both its own cache and the OS memory cache. However, the cache hit ratio measure only treats data retrieved from PostgreSQL's own cache as a hit; data retrieved from the OS cache is treated as a miss, even though it is served from the same fast physical memory.

In some situations, then, a low cache hit ratio may largely reflect that PostgreSQL has been configured with a small allocation, controlled by the shared_buffers setting. A typical default is 128MB, but the PostgreSQL documentation suggests that a value between 25% and 40% of available memory is more appropriate. Looking at the Disk read bytes/sec metric, available for Linux, Windows, and RDS, can help check this.

Once this possibility has been eliminated, a low cache hit ratio often indicates the server is starved of memory, making it unable to store the most frequently accessed data in the cache - in this situation, increasing memory can often lead to substantial performance gains.
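An illustrative query for the underlying ratio, computed from the cumulative counters in pg_stat_database (statistics accumulate until reset, so this is a long-term average rather than the current rate):

    -- Cache hit ratio per database, from PostgreSQL's own buffer cache statistics.
    SELECT datname,
           round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_ratio_pct
    FROM pg_stat_database
    WHERE datname IS NOT NULL
    ORDER BY cache_hit_ratio_pct;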

Transactions committed / rolled back

Shows the ratio of the number of successful (committed) transactions at a given point in time to the number of aborted (rolled back) transactions.

Source: Aggregated from values in pg_stat_database & pg_database system tables.
Explanation

Typically, healthy systems have a high ratio of successful to rolled back transactions. A low ratio can indicate poor system or application health.
In contrast to SQL Server, where data manipulations have to be explicitly marked as a transaction, PostgreSQL considers all data manipulations a transaction, with an implicit Begin and End applied to every statement not wrapped in a wider transaction block.

Machine metrics

Machine: processor time

Total processor utilization (averaged across all processor cores) on the physical host Windows machine.

Equivalent PerfMon counter: Processor: % Processor Time (_Total)
Explanation: This is a general indicator of how busy all of your processors are. Each data point represents the average non-idle time for all processor cores since the previous data point; average utilization for each processor is totaled and then divided by the number of processors on the machine. Machine: processor time is collected and averaged every 15 seconds.
Guideline values

Ideally, Machine: processor time should average less than 50% for best SQL Server performance. If Machine: processor time exceeds an average of 80% for a sustained period (five minutes or more), then a CPU bottleneck exists during this time, and you should investigate its cause. Keep in mind that short spikes in Machine: processor time above 80% are normal and expected; only sustained high CPU utilization is a problem. If a spike turns into a plateau rather than a sharp peak, this may indicate that a slow-running query is causing the CPU problem.

Note: When running in a virtual environment, the Machine: processor time is not relevant because it is unable to accurately measure the CPU activity within a virtual machine. Instead, use the Hyper-V Logical Processor performance counters.

Check also
  • SQL Server: processor time: In most dedicated SQL Server machines, there is a high correlation between Machine: processor time and SQL Server: processor time. If the Machine: processor time is much higher than the SQL Server: processor time, then some process other than SQL Server is causing the high CPU utilization, and this process should be investigated.
  • Avg. CPU queue length: This measures the number of threads in the processor queue. There is a single queue for processor time, even on computers with multiple cores. If a computer has multiple processor cores, SQL Monitor divides this value by the number of processor cores servicing the workload. A sustained processor queue of less than 10 threads per processor is normally acceptable, depending on the workload. A number exceeding 10 threads per processor indicates a potential CPU bottleneck.
Possible solutions

Assuming the Machine: processor time is due to the SQL Server process, reducing CPU utilization can be accomplished many different ways, including rewriting poorly written queries, reducing query plan recompilation, optimizing query parallelism, and avoiding using client-side cursors. If correcting the above does not resolve CPU bottlenecks, consider upgrading to faster processors or increasing the number of processors.

If the Machine: processor time is high, and the SQL Server: processor time is not highly correlated, then identify the non-SQL Server process causing the excessive CPU activity and correct it.

More information

Running SQL Server 2008 in a Hyper-V Environment

Performance Tuning Guidelines for Windows Server 2008

Troubleshooting Performance Problems in SQL Server 2008

Avg. CPU queue length

Measures the number of threads in the processor queue waiting to be executed. The queue contains threads that are ready to run, but cannot, because all of the processor cores are busy.

Equivalent PerfMon counter: System: Processor Queue Length (divided by the number of processor cores)
Explanation: There is a single processor queue, even on computers with multiple cores. Therefore, if a computer has multiple processor cores, SQL Monitor automatically divides the processor queue length by the number of processor cores servicing the workload to determine the average processor queue length.
Guideline values

The lower the number, the better, but an average processor queue length of less than 10 is normally acceptable, depending on the workload. A number exceeding 10 threads per processor indicates a CPU bottleneck. Occasional spikes in the average processor queue length over 10 are normal, but only sustained spikes in the average processor queue length are a problem.

Check also

Machine: processor time for the host machine. If the average processor queue length is regularly greater than 10, and Machine: processor time regularly exceeds 80%, then your machine has a CPU bottleneck.

Possible solutions

If a CPU bottleneck is confirmed, reducing CPU utilization can be accomplished in many different ways, including rewriting poorly written queries, reducing query plan recompilation, optimizing query parallelism, and avoiding using client-side cursors. If correcting the above does not resolve CPU bottlenecks, consider upgrading to faster processors or increasing the number of processors. If the Machine: processor time is high, and the SQL Server: processor time is not highly correlated, then identify the non-SQL Server process causing the excessive CPU activity and correct it.

More information

Troubleshooting Performance Problems in SQL Server 2008

Machine: memory used

Displays the amount of physical memory (in bytes) used on the server.

Equivalent PerfMon counter: (WMI: Win32_ComputerSystem.TotalPhysicalMemory) - (Memory: Available Bytes)
Explanation: This is the total number of bytes of memory currently used by all processes. If applications on the server are running slow, and data access is also slow, you may want to check the values for this metric.
Guideline values

High values may suggest a computer memory shortage or indicate that the computer or an application is not releasing memory. The optimal value depends on how much physical memory the machine has. For example, on a 6 GB machine, you may allow 2 GB for your operating system and assign the remaining RAM to your SQL Server.

Check also

Memory pages/sec. This indicates whether memory shortages may be caused by excessive paging.

Memory pages/sec

Measures the rate at which pages are read from or written to disk to resolve page faults.

Equivalent PerfMon counter: Memory: Pages/sec
Explanation: This is the sum of the memory pages input/sec and memory pages output/sec. It highlights the number of pages that were retrieved from disk because of hard page faults, or were written to disk to free space in the working set because of page faults.
Guideline values

Extreme variations in Memory pages/sec are very common, especially when operations such as backups, restores, imports, or exports are performed. If Memory pages/sec exceeds 1000, and if the number of bytes available is less than 100 MB on a consistent basis, this is a strong indication of memory pressure.

Check also

Machine: memory used and PerfMon counter Process: working set to calculate the amount of physical RAM available to SQL Server that is not currently being used by other processes. If the number of bytes available is less than 100 MB, and Memory pages/sec are high, the server is under significant memory pressure.

Possible solutions: Remove unnecessary applications and services from your server, and add more physical RAM.

Network utilization

The percentage of total bandwidth usage to help identify whether your network adapter is at saturation point.

Equivalent PerfMon counter: 8 * ((Network Interface: Bytes Received/sec) + (Network Interface: Bytes Sent/sec)) / (Network Interface: Current Bandwidth) * 100
Explanation: Indicates the total amount of bandwidth being used on the connection to which the network card is attached. It is useful to determine when the network is idle, normal or busy. Note: This does not measure the amount of bandwidth being sent to and from the server. The value is calculated by adding together the rates at which bytes are sent and received per second over the network adapter and then converting them to bits. This value is then compared with the current bandwidth on the network adapter to calculate Network utilization.
Guideline values

This value is most useful when there is a single SQL Server instance on a host machine with no other network traffic. The ideal percentage value depends on the type of network connection you are using, so a baseline should be set against which efficiency can be measured. On average, more than 50% network utilization is high and more than 80% network utilization is very high. For optimum performance, you should have a network connection to a dedicated switch port running at full duplex.

Check also

During busy periods, further investigation into the network-related processes running at the time may reveal which ones are placing significant load on network bandwidth. To do this, open Windows Task Manager and select Resource Monitor from the Performance tab.

Possible solutions: Turn off power saving mode on network cards, manually set the speed and duplex of your network card instead of having it set automatically for you, use a faster network card, or add additional network cards.

Disk used

The amount of space used for each disk (in GB).

Equivalent PerfMon counter: ((denominator of LogicalDisk: % Free Space) - (LogicalDisk: Free Megabytes))
Explanation: This information allows you to monitor how much storage space is being used and how much is available on each disk. It is important to evaluate the adequacy of your system's disk space to avoid programs failing because the system is unable to allocate space, and to prevent constraints on the growth of page filing to support virtual memory.
Guideline values

The amount of disk capacity you need to ensure your system disks are adequately supported can be calculated as follows:

  1. Allow 1 GB for an operating system.
  2. Add the size of all applications.
  3. Add at least twice the size of system memory for the paging file.
  4. Add the disk space estimate for each user multiplied by the number of users.
  5. Multiply by at least 130% to take into account room for expansion.
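As a worked example with hypothetical figures: 1 GB (operating system) + 5 GB (applications) + 32 GB (paging file, twice 16 GB of RAM) + 100 GB (200 users at 0.5 GB each) = 138 GB; multiplying by 130% gives roughly 180 GB of required disk capacity.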

As a general rule, if the disk space used is 80% of capacity or greater, you may consider running a Disk Cleanup tool, compressing or defragmenting the disk, transferring certain files to other disks or using remote storage.

Disk avg. read time

The average time in milliseconds of a read operation from the logical disk.

Equivalent PerfMon counter: Logical Disk: Avg. Disk sec/Read
Explanation: Measures logical disk latency (how fast a disk responds) when data is read from it. The lower the latency, the faster the I/O performance, and vice versa. This metric, and the Disk avg. write time metric, are two of the most important indicators of I/O bottlenecks.
Guideline values

For drives with MDF and NDF files and OLTP loads, average read latency should ideally be below 20 ms. For drives with OLAP loads, latency of up to 30 ms is considered acceptable. For drives with LDF files, latency should ideally be 5 ms or less. In general, anything above 50 ms is slow and suggests a potentially serious I/O bottleneck. Read latency will vary considerably over time, so these guideline figures ignore occasional spikes.

Check also

Disk avg. write time. The values should correspond with the guidelines described above.

Possible solutions: Tune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, and avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance.

Disk avg. write time

The average time, in milliseconds, of a write of data from the logical disk.

Equivalent PerfMon counter: Logical Disk: Avg. Disk sec/Write
Explanation: Measures logical disk latency (how fast a disk responds) when data is written to it. The lower the latency, the faster the I/O performance, and vice versa. This metric, and the Disk avg. read time metric, are two of the most important indicators of I/O bottlenecks.
Guideline values

For drives with MDF and NDF files and OLTP loads, average write latency should ideally be below 20 ms. For drives with OLAP loads, latency of up to 30 ms is considered acceptable. For drives with LDF files, latency should ideally be 5 ms or less. In general, anything above 50 ms is slow and suggests a potentially serious I/O bottleneck. Write latency will vary considerably over time, so these guideline figures ignore occasional spikes.

Check also

Disk avg. read time. The values should correspond with the guidelines described above.

Possible solutions: Tune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, and avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance.

Disk transfers/sec

Measures how many read and write requests are being completed per second on the logical disk.

Equivalent PerfMon counter: Logical Disk: Disk Transfers/sec
Explanation: This metric helps to determine the number of IOPS (I/Os per second) occurring on the logical disk, and whether your SQL Server applications are creating more I/O than can be handled by the disk subsystem.
Guideline values

Disk transfers/sec should not exceed the IOPS capacity of your disk subsystem; otherwise, there will be an I/O bottleneck.

Check also

Disk avg. read time and Disk avg. write time. If these metrics are not within their recommended thresholds, then most likely your disk subsystem does not have the IOPS capacity it needs to keep up with the current application workload.

Possible solutions: The first thing you should do is determine the IOPS capacity of your disk system. Once you know what it is, you will be in a position to know if you need to do anything to help boost the IOPS capacity of your disk subsystem, such as tune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, or avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance.
More information: Analyzing Storage Performance

Avg. disk queue length

Measures the pressure on the physical hard disk array.

Equivalent PerfMon counter: Logical Disk: Avg. Disk Queue Length
Explanation: This is the average number of physical read and write requests queued for a physical disk.
Guideline values

Due to changes in technology, such as virtualization, disk and controller technology, and SANs, this counter is no longer a good indicator of I/O bottlenecks. Better measures of I/O bottlenecks are Disk avg. read time and Disk avg. write time.

Check also

Disk avg. read time and Disk avg. write time.

