List of metrics
Published 08 March 2018
Machine metrics
Machine: processor time
Total processor utilization (averaged across all processor cores) on the physical host Windows machine.
Equivalent PerfMon counter | Processor: % Processor Time (_Total) |
Explanation | This is a general indicator of how busy all of your processors are. Each data point represents the average non-idle time for all processor cores since the previous data point; average utilization for each processor is totaled and then divided by the number of processors on the machine. Machine: processor time is collected and averaged every 15 seconds. |
Guideline values | Ideally, Machine: processor time should average less than 50% for best SQL Server performance. If Machine: processor time exceeds an average of 80% for a sustained period (five minutes or more), then a CPU bottleneck exists during this time, and you should investigate its cause. Keep in mind that short spikes in Machine: processor time above 80% are normal and expected; only sustained high CPU utilization is a problem. If a spike forms a plateau rather than a sharp peak, a slow-running query may be causing the CPU problem. Note: When running in a virtual environment, Machine: processor time is not relevant because it cannot accurately measure the CPU activity within a virtual machine. Instead, use the Hyper-V Logical Processor performance counters. |
Possible solutions | Assuming the Machine: processor time is due to the SQL Server process, reducing CPU utilization can be accomplished in many different ways, including rewriting poorly written queries, reducing query plan recompilation, optimizing query parallelism, and avoiding using client-side cursors. If correcting the above does not resolve CPU bottlenecks, consider upgrading to faster processors or increasing the number of processors. If the Machine: processor time is high, and the SQL Server: processor time is not highly correlated, then identify the non-SQL Server process causing the excessive CPU activity and correct it. |
More information | Running SQL Server 2008 in a Hyper-V Environment |
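Where a quick check is needed without PerfMon, recent CPU utilization as SQL Server itself records it can be read from the scheduler monitor ring buffer. A minimal sketch (the 30-sample window and output column names are arbitrary choices):

```sql
-- Recent CPU utilization samples from the scheduler monitor ring buffer:
-- SQL Server process time, idle time, and (by subtraction) other processes.
SELECT TOP (30)
    DATEADD(ms, CAST(rb.[timestamp] - osi.ms_ticks AS int), GETDATE()) AS sample_time,
    x.rec.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
    x.rec.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS idle_pct,
    100
      - x.rec.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int')
      - x.rec.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS other_cpu_pct
FROM sys.dm_os_ring_buffers AS rb
CROSS JOIN sys.dm_os_sys_info AS osi
CROSS APPLY (SELECT CAST(rb.record AS xml) AS rec) AS x
WHERE rb.ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
ORDER BY rb.[timestamp] DESC;
```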
Avg. CPU queue length
Measures the number of threads in the processor queue waiting to be executed. The queue contains threads that are ready to run but cannot because all of the processor cores are busy.
Equivalent PerfMon counter | System: Processor Queue Length (divided by the number of processor cores) |
Explanation | There is a single processor queue, even on computers with multiple cores. Therefore, if a computer has multiple processor cores, SQL Monitor automatically divides the processor queue length by the number of processor cores servicing the workload to determine the average processor queue length. |
Guideline values | The lower the number, the better, but an average processor queue length of less than 10 is normally acceptable, dependent on the workload. A number exceeding 10 threads per processor indicates a CPU bottleneck. Occasional spikes in the average processor queue length over 10 are normal, but only sustained spikes in the average processor queue length are a problem. |
Check also | Machine: processor time for the host machine. If the average processor queue length is regularly greater than 10, and Machine: processor time regularly exceeds 80%, then your machine has a CPU bottleneck. |
Possible solutions | If a CPU bottleneck is confirmed, reducing CPU utilization can be accomplished in many different ways, including rewriting poorly written queries, reducing query plan recompilation, optimizing query parallelism, and avoiding using client-side cursors. If correcting the above does not resolve CPU bottlenecks, consider upgrading to faster processors or increasing the number of processors. If the Machine: processor time is high, and the SQL Server: processor time is not highly correlated, then identify the non-SQL Server process causing the excessive CPU activity and correct it. |
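A related SQL Server-side signal is the number of runnable tasks per scheduler, that is, tasks ready to run but waiting for CPU. A minimal sketch:

```sql
-- Tasks waiting for CPU on each user scheduler; consistently non-zero
-- runnable_tasks_count values point to CPU pressure.
SELECT scheduler_id,
       current_tasks_count,
       runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;  -- user schedulers only
```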
Machine: memory used
Displays the amount of physical memory (in bytes) used on the server.
Equivalent PerfMon counter | (WMI: Win32_ComputerSystem.TotalPhysicalMemory) - (Memory: Available Bytes) |
Explanation | This is the total number of bytes of memory currently used by all processes. If applications on the server are running slow, and data access is also slow, you may want to check the values for this metric. |
Guideline values | High values may suggest a computer memory shortage or indicate that the computer or an application is not releasing memory. The optimal value depends on how much physical memory the machine has. For example, on a 6 GB machine, you may allow 2 GB for your operating system and assign the remaining RAM to your SQL Server. |
Check also | Memory pages/sec. This indicates whether memory shortages may be caused by excessive paging. |
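To see total and available physical memory from within SQL Server (2008 and later), you can query sys.dm_os_sys_memory; a minimal sketch:

```sql
-- Physical memory on the host and the OS memory state as SQL Server sees it.
SELECT total_physical_memory_kb / 1024     AS total_mb,
       available_physical_memory_kb / 1024 AS available_mb,
       system_memory_state_desc
FROM sys.dm_os_sys_memory;
```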
Memory pages/sec
Measures the rate at which pages are read from or written to disk to resolve page faults.
Equivalent PerfMon counter | Memory: Pages/sec |
Explanation | This is the sum of the memory pages input/sec and memory pages output/sec. It highlights the number of pages that were retrieved from disk because of hard page faults, or were written to disk to free space in the working set because of page faults. |
Guideline values | Extreme variations in Memory pages/sec are very common, especially when operations such as backups, restores, imports, or exports are performed. If Memory pages/sec exceeds 1000, and if the number of bytes available is less than 100 MB on a consistent basis, this is a strong indication of memory pressure. |
Check also | Machine: memory used and PerfMon counter Process: working set to calculate the amount of physical RAM available to SQL Server that is not currently being used by other processes. If the number of bytes available is less than 100 MB, and Memory pages/sec is high, the server is under significant memory pressure. |
Possible solutions | Remove unnecessary applications and services from your server, or add more physical RAM. |
Network utilization
The percentage of total bandwidth usage to help identify whether your network adapter is at saturation point.
Equivalent PerfMon counter | 8 * ((Network Interface: Bytes Received/sec) + (Network Interface: Bytes Sent/sec)) / (Network Interface: Current Bandwidth) *100 |
Explanation | Indicates the total amount of bandwidth being used on the connection to which the network card is attached. It is useful to determine when the network is idle, normal or busy. Note: This does not measure the amount of bandwidth being sent to and from the server. The value is calculated by adding together the rates at which bytes are sent and received per second over the network adapter and then converting them to bits. This value is then compared with the current bandwidth on the network adapter to calculate Network utilization. |
Guideline values | This value is most useful when there is a single SQL Server instance on a host machine with no other network traffic. The ideal percentage value depends on the type of network connection you are using, so a baseline should be set against which efficiency can be measured. On average, more than 50% network utilization is high and more than 80% network utilization is very high. For optimum performance, you should have a network connection to a dedicated switch port running at full duplex. |
Check also | During busy periods, further investigation into the network-related processes running at the time may reveal which ones are placing significant load on network bandwidth. To do this, open Windows Task Manager and select Resource Monitor from the Performance tab. |
Possible solutions | Turn off power savings mode on network cards, manually set the speed and duplex of your network card instead of having it set automatically for you, use a faster network card, add additional network cards. |
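As a worked example of the formula above: on a 1 Gbps adapter (Current Bandwidth = 1,000,000,000 bits/sec) receiving 5,000,000 bytes/sec and sending 2,500,000 bytes/sec, utilization is 8 × (5,000,000 + 2,500,000) / 1,000,000,000 × 100 = 6%.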
Disk used
The amount of space used for each disk (in GB).
Equivalent PerfMon counter | ((denominator of LogicalDisk: % Free Space) - (LogicalDisk: Free Megabytes)) |
Explanation | This information allows you to monitor how much storage space is being used and how much is available on each disk. It is important to evaluate the adequacy of your system's disk space to avoid programs failing because the system is unable to allocate space, and to prevent constraints on the growth of page filing to support virtual memory. |
Guideline values | As a general rule, if the disk space used is 80% of capacity or greater, you may consider running a Disk Cleanup tool, compressing or defragmenting the disk, transferring certain files to other disks, or using remote storage. |
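On SQL Server 2008 R2 SP1 and later, the same information is available from inside SQL Server for every volume that hosts a database file; a minimal sketch:

```sql
-- Total and free space for each volume that hosts a database file.
SELECT DISTINCT vs.volume_mount_point,
       vs.total_bytes / 1073741824.0     AS total_gb,
       vs.available_bytes / 1073741824.0 AS free_gb
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;
```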
Disk avg. read time
The average time in milliseconds of a read operation from the logical disk.
Equivalent PerfMon counter | Logical Disk: Avg. Disk sec/Read |
Explanation | Measures logical disk latency (how fast a disk responds) when data is read from it. The lower the latency, the faster the I/O performance, and vice versa. This metric, and the Disk avg. write time metric, are two of the most important indicators of I/O bottlenecks. |
Guideline values | For drives with MDF and NDF files and OLTP loads, average read latency should ideally be below 20 ms. For drives with OLAP loads, latency up to 30 ms is considered acceptable. For drives with LDF files, latency should ideally be 5 ms or less. In general, anything above 50 ms is slow and suggests a potentially serious I/O bottleneck. Read latency will vary considerably over time, so these guideline figures ignore occasional spikes. |
Check also | Disk avg. write time. The values should correspond with the guidelines described above. |
Possible solutions | Tune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance. |
Disk avg. write time
The average time, in milliseconds, of a write of data from the logical disk.
Equivalent PerfMon counter | Logical Disk: Avg. Disk sec/Write |
Explanation | Measures logical disk latency (how fast a disk responds) when data is written to it. The lower the latency, the faster the I/O performance, and vice versa. This metric, and the Disk avg. read time metric, are two of the most important indicators of I/O bottlenecks. |
Guideline values | For drives with MDF and NDF files and OLTP loads, average write latency should ideally be below 20 ms. For drives with OLAP loads, latency up to 30 ms is considered acceptable. For drives with LDF files, latency should ideally be 5 ms or less. In general, anything above 50 ms is slow and suggests a potentially serious I/O bottleneck. Write latency will vary considerably over time, so these guideline figures ignore occasional spikes. |
Check also | Disk avg. read time. The values should correspond with the guidelines described above. |
Possible solutions | Tune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance. |
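Both read and write latency can also be measured per database file from inside SQL Server using sys.dm_io_virtual_file_stats (the figures are averages since the instance started); a minimal sketch:

```sql
-- Average read and write latency (ms) per database file since startup.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
```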
Disk transfers/sec
Measures how many read and write requests are being completed per second on the logical disk.
Equivalent PerfMon counter | Logical Disk: Disk Transfers/sec |
Explanation | This metric helps to determine the number of IOPS (I/Os per second) occurring on the logical disk, and whether your SQL Server applications are creating more I/O than can be handled by the disk subsystem. |
Guideline values | The Disk Transfers/sec should not exceed the IOPS capacity of your disk subsystem, otherwise there will be an I/O bottleneck. |
Check also | Disk avg. read time and Disk avg. write time. If these metrics are not within their recommended thresholds, then most likely your disk subsystem does not have the IOPS capacity it needs to keep up with the current application workload. |
Possible solutions | The first thing you should do is determine the IOPS capacity of your disk system. Once you know what it is, then you will be in a position to know if you need to do anything to help boost the IOPS capacity of your disk subsystem, such as tune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance. |
More information | Analyzing Storage Performance |
Avg. disk queue length
Measures the pressure on the physical hard disk array.
Equivalent PerfMon counter | Logical Disk: Avg. Disk Queue Length |
Explanation | This is the average number of physical read and write requests queued for a physical disk. |
Guideline values | Due to changes in technology, such as virtualization, disk and controller technology, SANs and more, this counter is no longer a good indicator of I/O bottlenecks. A better measure of I/O bottlenecks is Disk avg. read time and Disk avg. write time. |
Check also | Disk avg. read time and Disk avg. write time. |
SQL Server metrics
User connections
The total number of user connections to a SQL Server instance at a given time.
Equivalent PerfMon counter | SQLServer: General Statistics: User Connections |
Explanation | This value provides a general guideline of how busy the SQL Server is. Every connection to a SQL Server instance requires on average 28 KB of memory overhead, even if the user is not currently using the connection. You can adjust the settings for your SQL Server instance so that the maximum number of user connections permitted is determined automatically by identifying the number of users currently logged on. Generally speaking, there is a correlation between the number of user connections and Batch requests/sec. Note: The user connections value is not the same as the number of users. A single user may have multiple connections open, or multiple users may share a single connection. |
Guideline values | It is useful to monitor this value over time to indicate increases or decreases in usage of your SQL Server. The number of worker threads is calculated dynamically based on the type of installation and the number of processors available to SQL Server: with max worker threads left at its default of 0, SQL Server allocates 256 threads (32-bit) or 512 threads (64-bit) for up to 4 processors, plus 8 (32-bit) or 16 (64-bit) threads for each processor above 4. For more information on worker thread counts, see Configure the max worker threads Server Configuration Option. Note: The number of connections on a server can easily be three or four times the number of worker threads if connection pooling is being used. Connections are pooled per application domain, per connection string, per server, and not every connection will be executing at exactly the same time. |
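The same figure can be read directly from the DMVs; a minimal sketch:

```sql
-- Current user connections, excluding system sessions.
SELECT COUNT(*) AS user_connections
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;
```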
SQL Server: processor time
The processor utilization (%) for the SQL Server service running on the host Windows machine.
Equivalent PerfMon counter | Process(sqlservr): % Processor Time |
Explanation | This value tells you what percentage of processor time is being used to run the sqlservr process (the database engine). It excludes other SQL Server-related processes, such as the SQL Server Agent, SSIS, Reporting Services, Analysis Services, and Full Text Search. Because SQL Server: processor time is a subset of Machine: processor time, the SQL Server: processor time value will be lower. Note: On multiprocessor computers, it is common for SQL Server to use more than the equivalent of 100% processor time on one processor. This is because its threads may be using multiple processors. |
Guideline values | Ideally, SQL Server: processor time should average less than 50% for best SQL Server performance. If it exceeds an average of 80% for a sustained time (five minutes or more), then a CPU bottleneck exists during this time, and you should investigate its cause. Short spikes in SQL Server: processor time above 80% are normal and expected; only sustained high CPU utilization is a problem. If a spike forms a plateau rather than a sharp peak, a slow-running query may be causing the CPU problem. Generally, sqlservr should be the busiest process on a host system, so SQL Server: processor time will correlate closely with Machine: processor time. If SQL Server: processor time is substantially smaller than Machine: processor time, some other process running on the host system may be consuming CPU resources that the Database Engine could otherwise use to enhance its performance. If this is the case, use Task Manager to identify which processes are responsible and diagnose what is happening on your host system. |
Check also | Machine: processor time where the average utilization for each processor is totaled and then divided by the number of processors on the machine. If this average is 80% or more, or if you consistently record averages greater than 50%, your processors may need upgrading. Also, Avg. CPU queue length provides a stronger indication of whether your processor is running slowly. |
SQL Server: total memory
The total amount of dynamic memory the server is currently using.
Equivalent PerfMon counter | SQLServer:Memory Manager - Total Server Memory (KB) |
Explanation | Amount of physical memory the mssqlserver service is currently using; includes the total of the buffers committed to the SQL Server BPool and "OS in Use" type buffers. |
Guideline values | If SQL Server: total memory is relatively high compared to the amount of physical memory in the computer, it may indicate that more physical memory is required. |
Possible solutions | Remove unnecessary applications and services from your server, and add more physical RAM. |
SQL Server: target memory
Maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use.
Equivalent PerfMon counter | SQLServer:Memory Manager - Target Server Memory (KB) |
Explanation | Displays the maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use. This value cannot exceed the max server memory setting. When SQL Server first starts, this value corresponds to the buffers reserved at SQL Server startup. The default memory management behavior of the Microsoft SQL Server Database Engine is to acquire as much memory as it needs without creating a memory shortage on the system. If SQL Server needs more physical RAM after startup, and free memory is available, then SQL Server will take it and SQL Server: target memory will increase. If the OS requests RAM back from SQL Server, then SQL Server: target memory will usually decrease, which in some cases can cause SQL Server memory pressure, hurting SQL Server's performance. |
Guideline values | If this value is high (relative to the total physical memory available on your machine), then you should consider adding extra physical memory, to improve the performance of SQL Server. |
Possible solutions | Remove unnecessary applications and services from your server, and add more physical RAM. |
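Both SQL Server: total memory and SQL Server: target memory can be read from the performance-counter DMV; a minimal sketch:

```sql
-- Total vs. target server memory; a sustained gap (total < target)
-- suggests SQL Server wants more memory than it can obtain.
SELECT counter_name,
       cntr_value / 1024 AS value_mb
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Total Server Memory (KB)', N'Target Server Memory (KB)');
```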
SQL Server: free memory
The total amount of dynamic memory that the server is not using.
Note: This metric was introduced in SQL Server 2012 to replace Buffer free pages (which was used in SQL Server 2008 and earlier). For SQL Server 2008 and earlier, we've now converted free pages into memory (1 page is equal to 8 KB).
Equivalent PerfMon counter | SQLServer: Memory Manager - Free memory |
Explanation | The Memory Manager monitors the overall memory usage of SQL Server. This metric can help you identify whether there are bottlenecks resulting from a lack of memory (which means that SQL Server has to retrieve data from disk). It can also help you to identify whether you can improve query performance by increasing the amount of memory. |
Guideline values | A free memory value greater than 5 MB should ensure that the buffer cache can handle sufficient new memory allocation requests. A value less than 5 MB may mean that requests are stalled and this can affect performance. You may consider adding memory or making more memory available to the data cache. |
Check also | Buffer cache hit ratio: If this value is below 90% and your free memory value is less than 5 MB, you may be experiencing memory pressure and additional RAM may be required. |
Batch requests/sec
Number of T-SQL batch requests received by SQL Server per second, averaged since last collection time.
Equivalent PerfMon counter | SQLServer:SQL Statistics - Batch Requests/sec |
Explanation | This gives a general indicator of how busy SQL Server is. It measures all of the batches you send to the SQL Server. The number of batch requests SQL Server can handle is dependent on the capability of your hardware. It may be useful to track this value over time, to gauge the scalability of your system, and to identify when you have peaks and troughs in user requests. Batch requests/sec only tracks the number of batches, not the statements in the batch. Some batches may use very little resources, while other batches use huge amounts of resources. Each batch is different, and this metric only provides an absolute count of the batches executed by SQL Server. |
Guideline values | In general, over 1000 batch requests per second indicates a very busy SQL Server, and could highlight the potential for a CPU bottleneck. More powerful hardware can of course handle higher numbers of requests. |
Check also | Compilations/batch and Compilations/sec. These values should be as low as possible, as high values may indicate a lot of ad-hoc queries being run. You may need to optimize T-SQL. |
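Because the underlying counter accumulates since startup, a per-second rate has to be derived from two samples; a minimal sketch (the 10-second window is an arbitrary choice):

```sql
-- Sample the cumulative Batch Requests/sec counter twice and difference it.
DECLARE @v1 bigint, @v2 bigint;
SELECT @v1 = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Batch Requests/sec';
WAITFOR DELAY '00:00:10';
SELECT @v2 = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Batch Requests/sec';
SELECT (@v2 - @v1) / 10.0 AS batch_requests_per_sec;
```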
Compilations/sec
SQL compilations per second; each data point shows the average value since the previous data point.
Equivalent PerfMon counter | SQLServer:SQL Statistics - SQL Compilations/sec |
Explanation | The number of times that Transact-SQL compilations occur, per second (including recompiles). The lower the value the better. Each compilation uses CPU and other resources, so SQL Server will attempt to reuse cached execution plans wherever possible. Not having to recompile the stored procedure reduces the overhead on the server and speeds up overall performance. |
Guideline values | In general, Compilations/sec should be less than 10% of the Batch requests/sec. High values often indicate excessive ad-hoc querying and should be as low as possible. |
Check also | Batch requests/sec. |
Possible solutions | Rewrite ad-hoc queries as stored procedures. Reduce usage of temp tables, or combine statements to eliminate them in stored procedures. Check "SET" statements - changes in session settings such as ANSI_PADDING or ANSI_NULLS can cause a query to be recompiled. If SQL Compilations/sec are excessive in SQL Server 2005 or later, consider using "Forced parameterization" on the database. This tells SQL Server to force the parameterization of virtually all SELECT, INSERT, UPDATE and DELETE statements, requiring fewer compilations of statements due to re-use of more query plans. |
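For example, to enable forced parameterization (SQL Server 2005 and later; the database name here is hypothetical):

```sql
-- Parameterize virtually all ad-hoc SELECT/INSERT/UPDATE/DELETE statements
-- so that cached plans can be reused instead of recompiled.
ALTER DATABASE SalesDB SET PARAMETERIZATION FORCED;
```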
Compilations/batch
SQL compilations per batch requests received by SQL Server.
Equivalent PerfMon counter | (SQLServer:SQL Statistics - SQL Compilations/sec) / (SQLServer:SQL Statistics - Batch Requests/sec) |
Explanation | This provides a general indication of how quickly and efficiently the SQL Server query optimizer is processing user queries. |
Guideline values | As a general rule, Compilations/sec should be less than 10% of the Batch requests/sec. High values often indicate excessive ad-hoc querying and should be as low as possible. |
Check also | Compilations/sec and Batch requests/sec. |
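Since both counters accumulate from instance startup, their lifetime ratio can be computed in a single query; a minimal sketch:

```sql
-- Lifetime compilations as a percentage of batch requests;
-- the guideline value above is under 10%.
SELECT
    MAX(CASE WHEN counter_name = N'SQL Compilations/sec' THEN cntr_value END) * 100.0
  / NULLIF(MAX(CASE WHEN counter_name = N'Batch Requests/sec' THEN cntr_value END), 0)
      AS compilations_pct_of_batches
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'SQL Compilations/sec', N'Batch Requests/sec');
```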
Recompilations/sec
The number of times per second that Transact-SQL objects had to be recompiled when they were executed.
Equivalent PerfMon counter | SQLServer:SQL Statistics - Re-Compilations/sec |
Explanation | Usually, once an object is compiled, it does not have to be recompiled. When a stored procedure or query is recompiled, a compile lock is placed on the objects referenced by the procedure, and database blocking may occur. Excessive recompilation can cause serious performance problems, partly because the compile locks taken during recompilation are not compatible with other lock types. |
Guideline values | Recompilations/sec should be as small as possible. Any value greater than 10% of Compilations/sec can indicate a problem. |
Check also | Compilations/sec, Batch requests/sec, and Compilations/batch requests to determine how quickly and efficiently SQL Server and the query optimizer are processing user queries. Check also SP:Recompile and SQL:StmtRecompile event classes in a Profiler trace to identify which stored procedures and SQL statements need recompiling with every run. The EventSubClass data column tells you what caused the recompile. |
Buffer cache hit ratio
The percentage of pages that were found in the buffer pool without having to read from disk.
Equivalent PerfMon counter | SQLServer: Buffer Manager - Buffer cache hit ratio |
Explanation | Indicates how frequently a query gets information from cache instead of accessing the hard disk. The value is calculated by dividing the total number of cache hits by the total number of cache lookups. This figure is an average value calculated since the SQL Server service was last restarted, not a snapshot of what is currently happening on your server now. Because of this, be careful how you interpret this value. |
Guideline values | If your server is running online transaction processing (OLTP) applications, a value of 99% or higher is ideal, but anything above 90% is generally considered satisfactory. A value of 90% or lower may indicate increased I/O access and slower performance. Adding more physical RAM may help alleviate this problem. If your server is running online analytical processing (OLAP) applications, it is not uncharacteristic for this value to fall below 90%, which is not normally a problem, unless you see other signs of memory pressure. Note: Be aware that the value of this metric immediately after a service restart will be uncharacteristically low until enough time has passed for the data cache to fill. |
Check also | Buffer page life expectancy: Measures the average number of seconds a page will stay in the buffer cache without being accessed. Generally, if this value drops below 300 seconds, it may indicate SQL Server is under memory pressure. |
Buffer page life expectancy
Indicates the average number of seconds a page will stay in the buffer pool without being referenced.
Equivalent PerfMon counter | SQLServer: Buffer Manager - Page life expectancy |
Explanation | Data is cached in the buffer pool so that it can be retrieved much faster than having to retrieve it from disk. However, once data pages have been stored in the buffer pool, they may or may not be reused. If a page is not referenced after a given amount of time, that page will be removed from the buffer pool so other data can use the space. A high buffer page life expectancy indicates that buffer pages are being reused often, which in turn indicates that the SQL Server is not under any memory pressure. |
Guideline values | This value depends on the amount of RAM installed on the servers. It is best practice to record a baseline measurement against which you can compare the monitored values over time. Do this by running the formula (DataCacheSizeInGB/4GB*300) as recommended in Finding what queries in the plan cache use a specific index. For details on how to find the data cache size, see Performance issues from wasted buffer pool memory. If data pages consistently remain in the buffer for less than your baseline value, this may indicate that there is insufficient memory on the system or not enough memory is configured for SQL Server use. You may consider improving performance by adding RAM. The problem may also be caused by missing indexes, poor index design or badly written application code, such as too many stored procedures being recompiled, or queries reading large tables in the buffer pool when only a small number of rows need changing. Addressing application-specific problems may improve performance. |
Check also | Buffer cache hit ratio. If this value is consistently low or drops suddenly, memory bottleneck may be the most likely cause. |
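As a worked example of the baseline formula: with a 64 GB data cache, the baseline would be 64 / 4 × 300 = 4800 seconds. The current value can be read from the performance-counter DMV; a minimal sketch:

```sql
-- Current page life expectancy, in seconds.
SELECT cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Buffer Manager%'
  AND counter_name = N'Page life expectancy';
```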
Full scans/sec
Measures the number of base table or full index scans occurring per second.
Equivalent PerfMon counter | SQLServer:Access Methods: Full Scans/sec |
Explanation | This value monitors how many full scans are performed on tables or indexes. While index scans are a normal part of how SQL Server works, a lot of full scans may indicate missing indexes, accessing many small tables, or queries that return large quantities of rows. |
Guideline values | It is best practice to record a baseline measurement against which you can compare your monitored values. If the monitored value is significantly higher when compared with the baseline, and CPU is also high, it may suggest a performance bottleneck. The ratio of Index searches/sec to Full scans/sec should generally be no more than 1000:1. |
Check also | Machine: processor time to establish whether CPU is also high. Also look at Batch requests/sec against your baseline measurement for details of how well the database is coping with queries. You can then check the Lock waits/sec and Lock timeouts/sec to determine whether locking issues are affecting performance. |
Page splits/sec
The number of page splits, and new page allocations, in a SQL Server table per second.
Equivalent PerfMon counter | SQLServer:Access Methods: Page Splits/sec |
Explanation | As new data is inserted or added to an existing data page, and the page does not have enough room for all the new data to fit, SQL Server splits the page, leaving half of the data on the old page and putting the other half on a new page. This is a normal SQL Server event, but excessive page splitting can result in index fragmentation and a variety of related performance problems. |
Guideline values | It is best practice to record a baseline against which you can compare your monitored values. In general terms, this value should not be greater than 20% of the number of Batch requests/sec. A low value is preferable, but what constitutes low depends on a number of variable factors including the number of users, the level of user activity and your I/O subsystem performance capabilities. |
Check also | Disk avg. read time and Disk avg. write time. There is often a correlation between excessive I/O activity and excessive page splitting. |
Possible solutions | Resolving page splitting problems is often difficult as the original database design may be contributing to the problem. Besides reconsidering the database design, also consider reevaluating your indexing strategy, and perhaps increase the fill factor of the indexes being affected by excessive page splitting, as in the sketch below. If excessive page splitting cannot be effectively eliminated, consider increasing the frequency with which you rebuild or reorganize your indexes in order to remove the fragmentation created by page splitting. |
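For example, to rebuild an index with a lower fill factor (the index and table names here are hypothetical):

```sql
-- Leave 20% free space on each leaf page to absorb inserts and
-- reduce page splits between index maintenance runs.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80);
```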
Page splits/batch request
The number of page splits per second in a SQL Server table divided by the number of T-SQL command batch requests received per second.
Equivalent PerfMon counter | (SQLServer:Access Methods: Page Splits/sec) / (SQLServer:SQL Statistics - Batch Requests/sec) |
Explanation | While page splits are a normal SQL Server function, excessive page splitting can negatively affect SQL Server performance. |
Guideline values | On average, the number of page splits per batch request should be below 20%, and ideally as close to 0 as possible. |
Check also | The fill factor of your indexes. The lower the fill factor, the more free space is left on each page, which helps reduce potential page splits. For more information about fill factor, see the MSDN documentation. |
Latch wait time
The average wait time, in milliseconds, that latch requests have to wait before being executed.
Equivalent PerfMon counter | SQLServer:Latches - Average Latch Wait Time (ms) |
Explanation | Think of a latch as a lightweight lock used by SQL Server to manage many internal operations. They prevent one SQL Server internal operation from conflicting with another internal SQL Server operation. This average is calculated by dividing latch wait milliseconds by the number of latch waits. Monitoring latch wait time helps to generically identify user activity and resource utilization that may be the cause of performance bottlenecks. The current average should be compared against a baseline average. Note that latches that do not have to wait for requests to be granted are not included in this value. |
Guideline values | In general, there is a correlation between this metric and latch waits/sec. Use PerfMon to check the latch waits/sec value. If both values increase above their baseline averages, the SQL Server is experiencing resource contention that needs further investigation. |
Check also | The sys.dm_os_latch_stats and sys.dm_os_wait_stats dynamic management views (DMVs) can help identify which activity is causing latch wait problems. Note: These DMVs are not supported in SQL Server 2000. If you are using SQL Server 2000, use the DBCC SQLPERF('WAITSTATS') statement. The relative wait numbers and times for every latch class must be reviewed to understand SQL Server instance performance issues. |
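A minimal sketch against sys.dm_os_latch_stats (the figures accumulate from instance startup):

```sql
-- Wait counts and times per latch class, worst first.
SELECT latch_class,
       waiting_requests_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_requests_count, 0) AS avg_wait_ms
FROM sys.dm_os_latch_stats
ORDER BY wait_time_ms DESC;
```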
Lock timeouts/sec
The number of lock requests per second that timed out.
Equivalent PerfMon counter | SQLServer:Locks - Lock Timeouts/sec |
Explanation | Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance. The lock timeout limit in SQL Server depends on the value set using the SET LOCK_TIMEOUT command. If no value is set, SQL Server will wait indefinitely for the lock to clear. |
Guideline values | Any value greater than 0 suggests that users may be experiencing problems with queries that are not completing. |
Check also | Lock waits/sec. This can identify the number of locks that did not time out but had to wait for resources. |
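For example, to make a session give up on a lock after five seconds instead of waiting indefinitely:

```sql
-- 5000 ms lock timeout for this session; -1 (the default) waits forever.
SET LOCK_TIMEOUT 5000;
```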
Lock waits/sec
The number of locks per second that could not be satisfied immediately and had to wait for resources.
Equivalent PerfMon counter | SQLServer:Locks - Lock Waits/sec |
Explanation | Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance. |
Guideline values | Any value greater than 0 suggests that some level of blocking is occurring. It is best practice to compare the current value against a baseline to determine what is normal for your SQL Server. An increasing lock wait time indicates potential resource contention. |
Check also | Buffer cache hit ratio. If this value is above 90%, look at the Avg. disk queue length. Excessive disk queueing may be caused by badly configured or inadequately sized disk subsystems. You may want to review indexing and investigate the numbers of concurrent connections. If the Buffer cache hit ratio is below 90%, additional RAM may be required. |
Avg. lock wait time
The average wait time (in milliseconds) for each lock that could not be satisfied immediately and had to wait for resources.
Equivalent PerfMon counter | SQLServer:Locks - Average Wait Time (ms) |
Explanation | Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance. |
Guideline values | Average wait times of 500 ms or more may indicate excessive blocking and should be investigated. In general, Avg. lock wait time correlates with Lock waits/sec, and if you see both of them increasing, you need to investigate what is causing the blocking issues. |
Check also | Lock waits/sec to identify peaks over time, and Lock timeouts/sec to see how many locks reached the threshold set using the SET LOCK_TIMEOUT command. |
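To see which sessions are currently blocked and by whom, a minimal sketch:

```sql
-- Requests currently waiting on a lock held by another session.
SELECT session_id,
       blocking_session_id,
       wait_type,
       wait_time AS wait_time_ms,
       wait_resource
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;
```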
Database metrics
Transactions/sec
Number of transactions started for the database per second. Each data point shows the average number of transactions per second since the previous data point.
Equivalent PerfMon counter | SQLServer:Databases - Transactions/sec |
Explanation | A transaction is one or more database operations combined into a single operation; either user-defined transactions surrounded by BEGIN TRAN and COMMIT TRAN or individual data modification statements such as INSERT or UPDATE. Transaction rate is affected by general system performance and resources such as I/O, number of users, cache size, and complexity of requests. This counter records only explicit transactions or transactions that change data. Note: Transactions per second only measures activity inside a transaction, not all activity. The Batch requests/sec metric is a more complete picture of all SQL Server activity. |
Guideline values | A high number of transactions per second is not necessarily an indication of a problem, only that there is a lot of activity on your SQL Server. You should establish a baseline value and track when transactions are unusually high or low. |
Active transactions
Number of active transactions for the database. Each data point shows the number of active transactions at the point the data was collected.
Equivalent PerfMon counter | SQLServer:Databases - Active Transactions |
Explanation | Active transactions are UPDATE transactions on the database that have not yet been committed or rolled back. Note: This metric measures activity inside a transaction, not all activity. The Batch requests/sec metric is a more complete picture of all SQL Server activity. |
Guideline values | Generally, the number of active transactions should be considerably lower than the number of Transactions/sec (as active transactions show only those transactions that have not yet completed). If the number of transactions currently active is higher than the Transactions/sec metric, this may indicate that you have long-running transactions. Long-running transactions can increase lock times and cause blocking. |
Total size
The total size (in GB) of all data files (.mdf and .ndf) and log files (.ldf) for this database.
Equivalent PerfMon counter | SQLServer:Databases - Data File(s) Size (KB) + SQLServer:Databases - Log File(s) Size (KB) |
Explanation | This value is the sum of the total data files and total log files sizes. It is useful for determining the amount of space required by the database and identifying trends. |
Guideline values | For best practice, MDF, NDF, and LDF files should be proactively managed in order to prevent autogrowth from kicking in and growing your database files. Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data. |
Check also | Log size for details of total .ldf files, and Data size for details of total .mdf and .ndf files. |
Data size
Total size of all data files (.mdf and .ndf files) for the specified database.
Equivalent PerfMon counter | SQLServer:Databases - Data File(s) Size (KB) |
Explanation | This value is the total size of the database, excluding the transaction log. Information about individual primary data files (.mdf) and secondary data files (.ndf), such as their current size, location, and autogrowth settings, is shown under the Files section of the Database overview page in SQL Server Management Studio. |
Guideline values | Look out for unusually large increases in the size of your data files. Ideally your database should be sized in a way that minimizes autogrowth, as each size increase is expensive in terms of I/O and will also physically fragment your data and log files. Growing a file on demand is an expensive process and will impact performance. Autogrowth should only be used as a safety valve to allow a database to grow should you accidentally run out of space. It should not be used to manage your MDF file growth. In general, data files should be pre-sized when they are first created to meet future expected growth, help avoid file fragmentation and ensure better database performance. Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data. |
Check also | Caution: Shrinking files to reduce space causes fragmentation and is not recommended. |
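For example, to pre-size a data file so that autogrowth rarely fires (the database and logical file names here are hypothetical):

```sql
-- Grow the file once, up front, with a sensible fixed growth increment.
ALTER DATABASE SalesDB
MODIFY FILE (NAME = N'SalesDB_Data', SIZE = 50GB, FILEGROWTH = 1GB);
```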
Log size
Total size of all log files (.ldf) for this database.
Equivalent PerfMon counter | SQLServer:Databases - Log File(s) Size (KB) |
Explanation | Each transaction performed in the database is recorded in the transaction log. This allows recovery in the event of data loss since the last backup. Information about the log file (.ldf) such as its location, autogrowth setting and current size, is shown under the Files section of the Database overview page in SQL Server Management Studio. |
Guideline values | Transaction logs are serial records of database modifications. They're used during recovery operations to commit completed transactions and to roll back uncompleted transactions. The size of the log file is determined by the recovery model set for the database. By default, a new database uses the Full recovery model (inherited from the model database), where transactions in the log file are only ever removed when a transaction log backup is performed. When you back up a transaction log, the backup stores the changes that have occurred since the last transaction log backup and then truncates the log, which clears out transactions that have been committed or aborted. If the transaction log is not backed up regularly, it will grow indefinitely in size and could fill your entire disk. Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data. |
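For example, a routine log backup (the database name and backup path here are hypothetical):

```sql
-- Back up and truncate the transaction log so it does not grow unchecked.
BACKUP LOG SalesDB TO DISK = N'X:\Backups\SalesDB_log.trn';
```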
Log space used
Percentage of total transaction log file size currently containing log records.
Equivalent PerfMon counter | SQLServer:Databases - Log File(s) Used Size (KB) |
Explanation | The transaction log file stores the details of all the modifications that you perform on your SQL Server database. The amount of space it takes up depends on the recovery model used, database activity, how often transaction logs are backed up, among other factors. |
Guideline values | It is normal for the log space figure to vary considerably over time as transactions are recorded, and then removed when log backups are performed. |
Log bytes flushed/sec
Shown on the Availability group overview as Log growth.
Average size of the log flush per second. Each data point shows the average log bytes flushed per second for this database since the previous data point.
Equivalent PerfMon counter | SQLServer:Databases - Log Bytes Flushed/sec |
Explanation | This metric helps to assess utilization and identify trends of the transaction log. A log flush occurs when data is written from the log cache to the physical transaction log file on disk, every time a transaction is committed. Within availability groups, both the primary and the secondary replicas have a log bytes flushed rate. On the primary replica, this shows how quickly log records are being added to the log send queue. On the secondary, it shows how quickly log bytes are being added to the redo queue. The Availability group overview page only shows the rate of log bytes flushed on the primary replica. |
Guideline values | The size of a log flush varies depending on the nature of the transaction. For example, a single INSERT affecting 1,000,000 rows will result in a single log flush, but a high number of bytes. Within availability groups, if the rate of log bytes flushed on the primary replica is higher than the rate at which log bytes are sent, the size of the log send queue will increase. To estimate the potential data loss on failover (in asynchronous-commit mode), divide the log send queue by the log bytes flushed. If the rate of log bytes flushed on the secondary replica is higher than the redo rate, the size of the redo queue will increase. |
Log flushes/sec
Number of log flushes per second for this database, averaged since last collection time.
Equivalent PerfMon counter | SQLServer:Databases - Log Flushes/sec |
Explanation | A log flush occurs when a transaction is committed and data is written from the log cache to the physical transaction log file. The log cache is a temporary location in memory where SQL Server stores data to be written to the log file, and is used to roll back a transaction before it is committed if required. Once a transaction is completed (and can no longer be rolled back), the log cache is immediately flushed to the physical log file on disk. |
Guideline values | Log flushes per second should generally correlate with the number of transactions per second. If log flushes per second seems to be significantly higher than the expected number of transactions, check your use of explicit transactions in queries. Explicitly defining the start and end of transactions (rather than implicit transactions, where SQL Server needs to flush the log for each data statement) should reduce the number of log flushes, and reduce impact on I/O. |
Check also | Log bytes flushed/sec |
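For example, wrapping related modifications in one explicit transaction produces a single commit-time log flush instead of one flush per autocommitted statement (the table here is hypothetical):

```sql
BEGIN TRAN;
INSERT INTO dbo.OrderAudit (order_id, note) VALUES (1, N'shipped');
INSERT INTO dbo.OrderAudit (order_id, note) VALUES (2, N'shipped');
COMMIT TRAN;  -- one log flush at commit, rather than one per INSERT
```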
Log flush waits/sec
Number of transaction commits per second waiting to be flushed from the log cache to the physical transaction log file on disk.
Equivalent PerfMon counter | SQLServer:Databases - Log Flush Waits/sec |
Explanation | A high number of log flush waits indicates that it is taking longer than normal to flush the transaction log cache to the file on disk, which will slow SQL Server's performance. |
Guideline values | Log flush waits should be as close to 0 as possible. |
Check also | Disk avg. write time. If this is greater than 5 ms, then an I/O bottleneck is likely to be occurring. |
Possible solutions | Moving the location of your transaction log (LDF file) to a separate physical drive from your data files (MDF files) will increase performance, as writing to the log and data files will not be competing for access to the same disk. Also consider using faster disk drives, using RAID 10 for transaction log file storage, adding additional spindles to your existing RAID array, short-stroking, or adding additional disk controllers. |
Availability group metrics
Log bytes received/sec
Shown on the Availability group overview as Received rate.
The rate at which the secondary replica receives log records from the primary replica.
Equivalent PerfMon counter | SQLServer:Database Replica - Log Bytes Received/sec |
Explanation | This value measures how quickly log records are received by the secondary replica. |
Guideline values | To estimate how long a secondary replica will take to catch up with the primary replica, divide the log send queue by the rate of log bytes received. |
Log send queue
Shown on the Availability group overview as Send queue.
Amount of log records in the primary database log file waiting to be sent to the secondary replica.
Equivalent PerfMon counter | SQLServer:Database Replica - Log Send Queue |
Explanation | This value is the amount of log data that has been flushed to disk on the primary replica but hasn't yet been sent to the secondary replica. |
Guideline values | The rate of log bytes flushed (on the primary replica) adds to the log send queue. If the rate of log bytes flushed on the primary is higher than the rate at which log records are sent, the log send queue will increase. To estimate the potential data loss on failover (in asynchronous-commit mode), divide the log send queue by the log bytes flushed. From the log send queue, the log records are sent to the secondary replica. To estimate how long it will take the secondary replica to catch up with the primary replica, divide the log send queue by the log bytes received/sec. If you use the secondary replica for reporting and want to know how much data from the primary replica is not yet represented in the secondary replica, add together the log send queue and the redo queue. |
Check also | Log bytes flushed/sec: Average size of the log flush per second. |
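As a worked example: with a log send queue of 10 MB and a primary flush rate of 2 MB/sec (asynchronous-commit mode), the potential data loss on failover is roughly 10 / 2 = 5 seconds of transactions.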
Redo queue
The amount of log records that need to be redone (written to the secondary database) for the synchronous-commit primary and secondary replicas to be synchronized.
Equivalent PerfMon counter | SQLServer:Database Replica - Recovery Queue |
Explanation | After the log data has been hardened into the secondary replica's log file, it joins the redo queue. |
Guideline values | The rate of log bytes received adds to the size of the redo queue, while the redo rate takes away from it. If the rate of log bytes received is higher than the redo rate, the size of the redo queue will increase. In event of a failover, the redo queue needs to be cleared before the secondary replica can transition to the role of primary replica. To estimate the time, in seconds, that it will take the secondary replica to redo the log (and therefore how long failover will take), divide the redo queue by the redo rate. If you use the secondary replica for reporting and want to know how much data from the primary replica is not yet represented in the secondary replica, add together the log send queue and the redo queue. |
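As a worked example: a redo queue of 120 MB draining at a redo rate of 4 MB/sec implies roughly 120 / 4 = 30 seconds of redo work before the secondary replica could complete a failover.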
Redo rate
The rate at which log records are redone (written to the secondary database).
Equivalent PerfMon counter | SQLServer:Database Replica - Redone Bytes/sec |
Explanation | Log records from the redo queue are written to the secondary database, completing the synchronization process. If the redo rate is lower than the rate of log bytes flushed on the secondary replica, the size of the redo queue will increase. |
Guideline values | To estimate the time, in seconds, that it will take the secondary replica to redo the log, divide the redo queue by the redo rate. |
Transaction delay ms/sec
This value measures the extent to which transactions are slowed down by synchronous replication.
Equivalent PerfMon counter | SQLServer:Database Replica - Transaction Delay |
Explanation | In synchronous-commit mode, the primary replica doesn't commit any transactions until it receives acknowledgement that all synchronous secondary replicas have finished hardening the log, resulting in transaction delay. This is the total delay across all transactions caused by the primary replica waiting for commit acknowledgement from the secondary replicas. As such, it's useful for working out whether activity on the secondary replicas is slowing down the primary replica. |
Guideline values | The flow control time contributes to the transaction delay on synchronous-commit replicas, because it affects the rate at which log records are sent. The rate of log bytes flushed on the secondary replica also affects the transaction delay. |
Flow control time ms/sec
The total time that log records waited per second due to flow control.
Equivalent PerfMon counter | SQLServer:Availability Replica - Flow Control Time (ms/sec) |
Explanation | Flow control throttles the flow of network traffic caused by replication, by limiting the speed at which the primary replica sends log records to the secondary replica. This helps to even out the usage of network and memory resources. This value shows the total time, in milliseconds per second, that the log records had to wait before being sent to the secondary replica, because of flow control. |
Guideline values | Flow control can restrict the rate at which log records are sent from the primary replica. This will affect the log send queue: if the rate at which log records are sent is lower than the rate of log bytes flushed (on the primary replica), the log send queue will increase. The flow control time also contributes to the transaction delay on synchronous replicas, because it affects the rate at which log data is transferred between the primary and the secondary replicas. |