SQL Monitor 9

List of metrics

Machine metrics

Machine: processor time

Total processor utilization (averaged across all processor cores) on the physical host Windows machine.

Equivalent PerfMon counterProcessor: % Processor Time (_Total)
ExplanationThis is a general indicator of how busy all of your processors are. Each data point represents the average non-idle time for all processor cores since the previous data point; average utilization for each processor is totaled and then divided by the number of processors on the machine. Machine: processor time is collected and averaged every 15 seconds.
Guideline values

Ideally, Machine: processor time should average less than 50% for best SQL Server performance. If Machine: processor time exceeds an average 80% for a sustained time (five minutes or more), then a CPU bottleneck exists during this time, and you should investigate its cause. Keep in mind that short spikes in Machine: processor time above 80% are normal and expected, and only sustained high CPU utilization is a problem. If a spike indicates a plateau, not a sharp peak, this may indicate a slow running query is causing the CPU problem.

Note: When running in a virtual environment, the Machine: processor time is not relevant because it is unable to accurately measure the CPU activity within a virtual machine. Instead, use the Hyper-V Logical Processor performance counters.

Check also
  • SQL Server processor time: In most dedicated SQL Server machines, there is a high correlation between Machine: processor time and SQL Server: processor time. If the Machine: processor time is much higher than the SQL Server: processor time, then some process, other than SQL Server is causing the high CPU utilization, and this process should be investigated.
  • Avg. CPU queue length: This measures the number of threads in the processor queue. There is a single queue for processor time even on computers with multiple cores. If a computer has multiple processor cores, SQL Monitor divides this value by the number of processor cores servicing the workload. A sustained processor queue of less than 10 threads per processor is normally acceptable, dependent of the workload. A number exceeding 10 threads per processor indicates a potential CPU bottleneck.
Possible solutions

Assuming the Machine: processor time is due to the SQL Server process, reducing CPU utilization can be accomplished many different ways, including rewriting poorly written queries, reducing query plan recompilation, optimizing query parallelism, and avoiding using client-side cursors. If correcting the above does not resolve CPU bottlenecks, consider upgrading to faster processors or increasing the number of processors.

If the Machine: processor time is high, and the SQL Server: processor time is not highly correlated, then identify the non-SQL Server process causing the excessive CPU activity and correct it.

More information

Running SQL Server 2008 in a Hyper-V Environment

Performance Tuning Guidelines for Windows Server 2008

Troubleshooting Performance Problems in SQL Server 2008

Avg. CPU queue length

Measures the number of threads in the processor queue waiting to be executed. The queue contains threads that are ready to run but cannot because all of the processor cores are busy.

Equivalent PerfMon counterSystem: Processor Queue Length (divided by the number of processor cores)
ExplanationThere is a single process queue, even on computers with multiple cores. Therefore, if a computer has multiple processor cores, SQL Monitor automatically divides the processor queue length by the number of processor cores servicing the workload to determine the average processor queue length.
Guideline values

The lower the number, the better, but an average processor queue length of less than 10 is normally acceptable, dependent on the workload. A number exceeding 10 threads per processor indicates a CPU bottleneck. Occasional spikes in the average processor queue length over 10 are normal, but only sustained spikes in the average processor queue length are a problem.

Check also

Machine: processor time for the host machine. If the average process queue length is regularly greater than 10, andMachine: processor time regularly exceeds 80%, then your machine has a CPU bottleneck.

Possible solutions

If a CPU bottleneck is confirmed, reducing CPU utilization can be accomplished in many different ways, including rewriting poorly written queries, reducing query plan recompilation, optimizing query parallelism, and avoiding using client-side cursors. If correcting the above does not resolve CPU bottlenecks, consider upgrading to faster processors or increasing the number of processors. If the Machine: processor time is high, and the SQL Server: processor time is not highly correlated, then identify the non-SQL Server process causing the excessive CPU activity and correct it.

More information

Troubleshooting Performance Problems in SQL Server 2008

Machine: memory used

Displays the amount of physical memory (in bytes) used on the server.

Equivalent PerfMon counter(WMI: Win32_ComputerSystem.TotalPhysicalMemory) - (Memory: Available Bytes)
ExplanationThis is the total number of bytes of memory currently used by all processes. If applications on the server are running slow, and data access is also slow, you may want to check the values for this metric.
Guideline values

High values may suggest a computer memory shortage or indicate that the computer or an application is not releasing memory. The optimal value depends on how much physical memory the machine has. For example, on a 6 GB machine, you may allow 2 GB for your operating system and assign the remaining RAM to your SQL Server.

Check also

Memory pages/sec. This indicates whether memory shortages may be caused by excessive paging.

Memory pages/sec

Measures the rate at which pages are read from or written to disk to resolve page faults.

Equivalent PerfMon counterMemory: Pages/sec
ExplanationThis is the sum of the memory pages input/sec and memory pages output/sec. It highlights the number of pages that were retrieved from disk because of hard page faults, or were written to disk to free space in the working set because of page faults.
Guideline values

Extreme variations in Memory pages/sec are very common, especially when operations such as backups, restores, imports, or exports are performed. If Memory pages/sec exceeds 1000, and if the number of bytes available is less than 100 MB on a consistent basis, this is a strong indication of memory pressure.

Check also

Machine: memory used and PerfMon counter Process: working set to calculate the amount of physical RAM available to SQL Server that is not currently being used by other processes. If the number of bytes available is less than 100 MB, andMemory pages/sec are high, the server is under significant memory pressure.

Possible solutionsRemove unnecessary applications and services from your server, add more physical RAM.

Network utilization

The percentage of total bandwidth usage to help identify whether your network adapter is at saturation point.

Equivalent PerfMon counter8 * ((Network Interface: Bytes Received/sec) + (Network Interface: Bytes Sent/sec)) / (Network Interface: Current Bandwidth) *100
ExplanationIndicates the total amount of bandwidth being used on the connection to which the network card is attached. It is useful to determine when the network is idle, normal or busy. Note: This does not measure the amount of bandwidth being sent to and from the server. The value is calculated by adding together the rates at which bytes are sent and received per second over the network adapter and then converting them to bits. This value is then compared with the current bandwidth on the network adapter to calculate Network utilization.
Guideline values

This value is most useful when there is a single SQL Server instance on a host machine with no other network traffic. The ideal percentage value depends on the type of network connection you are using, so a baseline should be set against which efficiency can be measured. On average, more than 50% network utilization is high and more than 80% network utilization is very high. For optimum performance, you should have a network connection to a dedicated switch port running at full duplex.

Check also

During busy periods, further investigation into the network-related processes running at the time may reveal which ones are placing significant load on network bandwidth. To do this, open Windows Task Manager and select Resource Monitor from the Performance tab.

Possible solutionsTurn off power savings mode on network cards, manually set the speed and duplex of your network card instead of having it set automatically for you, use a faster network card, add additional network cards.

Disk used

The amount of space used for each disk (in GB).

Equivalent PerfMon counter((denominator of LogicalDisk: % Free Space) - (LogicalDisk: Free Megabytes))
ExplanationThis information allows you to monitor how much storage space is being used and how much is available on each disk. It is important to evaluate the adequacy of your system's disk space to avoid programs failing because the system is unable to allocate space, and to prevent constraints on the growth of page filing to support virtual memory.
Guideline values

The amount of disk capacity you need to ensure your system disks are adequately supported can be calculated as follows:

  1. Allow 1 GB for an operating system.
  2. Add the size of all applications.
  3. Add at least twice the size of system memory for the paging file.
  4. Add the disk space estimate for each user multiplied by the number of users.
  5. Multiply by at least 130% to take into account room for expansion.

As a general rule, if the disk space used is 80% of capacity or greater, you may consider running a Disk Cleanup tool, compressing or defragmenting the disk, transferring certain files to other disks or using remote storage.

Disk avg. read time

The average time in milliseconds of a read operation from the logical disk.

Equivalent PerfMon counterLogical Disk: Avg. Disk sec/Read
ExplanationMeasures logical disk latency (how fast a disk responds) when data is read from it. The lower the latency, the faster the I/O performance, and vice versa. This metric, and the Disk avg. write time metric, are two of the most important indicators of I/O bottlenecks.
Guideline values

For drives with MDF and NDF files and OLTP loads, average read latency should ideally be below 20 ms. For drives with OLAP loads, then latency up to 30 ms is considered acceptable. For drives with LDF files, latency should ideally be 5 ms or less. In general, anything above 50 ms is slow and suggests a potentially serious I/O bottleneck. Write latency will vary considerably over time, so these guideline figures ignore occasional spikes.

Check also

Disk avg. write time. The values should correspond with the guidelines described above.

Possible solutionsTune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance.

Disk avg. write time

The average time, in milliseconds, of a write of data from the logical disk.

Equivalent PerfMon counterLogical Disk: Avg. Disk sec/Write
ExplanationMeasures logical disk latency (how fast a disk responds) when data is written to it. The lower the latency, the faster the I/O performance, and vice versa. This metric, and the Disk avg. read/sec metric, are two of the most important indicators of I/O bottlenecks.
Guideline values

For drives with MDF and NDF files and OLTP loads, average write latency should ideally be below 20 ms. For drives with OLAP loads, then latency up to 30 ms is considered acceptable. For drives with LDF files, latency should ideally be 5 ms or less. In general, anything above 50 ms is slow and suggests a potentially serious I/O bottleneck. Write latency will vary considerably over time, so these guideline figures ignore occasional spikes.

Check also

Disk avg. read time. The values should correspond with the guidelines described above.

Possible solutionsTune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance.

Disk transfers/sec

Measures how many read and write requests are being completed per second on the logical disk.

Equivalent PerfMon counterLogical Disk: Disk Transfers/sec
ExplanationThis metric helps to determine the number of IOPS (IOs per second) occurring on the logical disk, and whether your SQL Server applications are creating more I/O that can be handled by the disk subsystem.
Guideline values

The Disk Transfers/sec should not exceed the IOPS capacity of your disk subsystem, otherwise there will be an I/O bottleneck.

Check also

Disk avg. read time and Disk avg. write time. If these metrics are not within their recommended thresholds, then most likely your disk subsystem does not have the IOPS capacity it needs to keep up with the current application workload.

Possible solutionsThe first thing you should do is determine the IOPS capacity of your disk system. Once you know what it is, then you will be in a position to know if you need to do anything to help boost the IOPS capacity of your disk subsystem, such as tune queries to reduce the resources used to execute them, use faster disk drives, use RAID 10, use more spindles inside RAID arrays, use faster controllers, use multiple controllers, use short-stroking, avoid using SANs where your data files are shared on arrays with other applications. MDF and NDF files should always be isolated from LDF files. Ideally, each LDF file should be on its own array for maximum performance.
More informationAnalyzing Storage Performance

Avg. disk queue length

Measures the pressure on the physical hard disk array.

Equivalent PerfMon counterLogical Disk: Avg. Disk Queue Length
ExplanationThis is the average number of physical read and write requests queued for a physical disk.
Guideline values

Due to changes in technology, such as virtualization, disk and controller technology, SANs and more, this counter is no longer a good indicator of I/O bottlenecks. A better measure of I/O bottlenecks is Avg. disk read time and Avg. disk write time.

Check also

Disk avg. read time and Disk avg. write time.

SQL Server metrics

User connections

The total number of user connections to a SQL Server instance at a given time.

Equivalent PerfMon counterSQLServer: General Statistics: User Connections
Explanation

This value provides a general guideline of how busy the SQL Server is. Every connection to a SQL Server instance requires on average 28 KB of memory overhead, even if the user is not currently using the connection. You can adjust the settings for your SQL Server instance so that the maximum number of user connections permitted is determined automatically by identifying the number of users currently logged on.

Generally speaking, there is a correlation between the number of user connections and Batch requests/sec.

Note: The user connections value is not the same as the number of users. A single user may have multiple connections open, or multiple users may share a single connection.

Guideline values

It is useful to monitor this value over time to indicate increases or decreases in usage of your SQL Server. The number of worker threads is calculated dynamically based on the type of installation and number of processors available to SQL Server:

 

  • For 32-bit servers, the base worker thread count is 256 and 8 additional worker threads are added for each scheduler above the initial 4 on the server ((NumberOfSchedulers -4) *8) +256.
  • For 64-bit servers, the base worker thread count is 512 and 16 additional worker threads are added for each scheduler above the initial 4 on the server ((NumberOfSchedulers -4) *16) +512.

For more information on worker thread counts, see Configure the max worker threads Server Configuration Option.

 Note: The number of connections on a server can easily be three or four times the number of worker threads if connection pooling is being used. Connections are pooled per application domain, per connection string, per server, and not every connection will be executing at exactly the same time.

SQL Server: processor time

The processor utilization (%) for the SQL Server service running on the host Windows machine.

Equivalent PerfMon counterProcess(sqlservr): % Processor Time
Explanation

This value tells you what percentage of processor time is being used to run the sqlservr process (the database engine). It excludes other SQL Server-related processes, such as the SQL Server Agent, SSIS, Reporting Services, Analysis Services, and Full Text Search. Because SQL Server: processor time is a subset of Machine: processor time, the SQL Server: processor time value will be lower.

Note: On multiprocessor computers, it is common for SQL Server to use more than the equivalent of 100% processor time on one processor. This is because its threads may be using multiple processors.

Guideline values

Ideally, SQL Server: processor time should average less than 50% for best SQL Server performance. If it exceeds an average 80% for a sustained time (five minutes or more), then a CPU bottleneck exists during this time, and you should investigate its cause. Short spikes in SQL Server: processor time above 80% are normal and expected, and only sustained high CPU utilization is a problem. If a spike indicates a plateau, not a sharp peak, this may indicate a slow running query is causing the CPU problem.

Generally, SQL Server: processor time should be the busiest process on a host system, and will correlate closely with Machine: processor time. If SQL Server: processor time is substantially smaller than Machine: processor time, this may indicate that some other process running on the host system is using CPU, potentially using CPU resources that could be used by the Database Engine in order to enhance its performance. If this is the case, identify which processes are using Task Manager in order to diagnose what is happening on your host system.

Check also

Machine: processer time where the average utilization for each processor is totaled and then divided by the number of processors on the machine. If this average is 80% or more, or if you consistently record averages greater than 50%, your processors may need upgrading. Also, Avg. CPU queue length provides a stronger indication of whether your processor is running slowly.

SQL Server: total memory

The total amount of dynamic memory the server is currently using.

Equivalent PerfMon counterSQLServer:Memory Manager - Total Server Memory (KB)
ExplanationAmount of physical memory the mssqlserver service is currently using; includes the total of the buffers committed to the SQL Server BPool and "OS in Use" type buffers.
Guideline values

If SQL Server: total memory is relatively high compared to the amount of physical memory in the computer, it may indicate that more physical memory is required.

Check also
  • SQL Server: target memory: Displays the maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use. This number cannot exceed the maximum server memory setting. When SQL Server first starts, this value corresponds to the buffers reserved at SQL Server startup. The default memory management behavior of the Microsoft SQL Server Database Engine is to acquire as much memory as it needs without creating a memory shortage on the system. If SQL Server needs more physical RAM after startup, and if there are Mbytes available, then SQL Server will grab it and SQL Server: target memory will increase. If the OS requests RAM back from SQL Server, then SQL Server: target memory will usually decrease, which in some cases can cause SQL Server memory pressure, hurting SQL Server performance.
  • Buffer cache hit ratio: The percentage of pages that were found in the buffer cache without having to read from disk. Ideally, this value should be close to 99% for OLTP system. A smaller number may indicate that the SQL Server is under memory pressure. OLAP systems often have a Buffer Cache Hit Ratio of less than 90%, which is typical.
  • Buffer free pages: The number of free spaces available in the buffer cache. Generally speaking, if this number drops below 640 pages, this could indicate SQL Server memory pressure.
  • Buffer page life expectancy: Measures the average number of seconds a page will stay in the buffer cache without being accessed. Generally speaking, if his value drops below 300 seconds, this may indicate SQL Server is under memory pressure.
Possible solutionsRemove unnecessary applications and services from your server, and add more physical RAM.

SQL Server: target memory

Maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use.

Equivalent PerfMon counterSQLServer:Memory Manager - Target Server Memory (KB)
ExplanationDisplays the maximum amount of dynamic memory (the buffer pool) that SQL Server is prepared to use. This value cannot exceed the max server memory setting. When SQL Server first starts, this value corresponds to the buffers reserved at SQL Server startup. The default memory management behavior of the Microsoft SQL Server Database Engine is to acquire as much memory as it needs without creating a memory shortage on the system. If SQL Server needs more physical RAM after startup, and if Mbytes are available, then SQL Server will grab it and SQL Server: target memory will increase. If the OS requests RAM back from SQL Server, then SQL Server: target memory will usually decrease, which in some cases can cause SQL Server memory pressure, hurting SQL Server's performance.
Guideline values

If this value is high (relative to the total physical memory available on your machine), then you should consider adding extra physical memory, to improve the performance of SQL Server.

Check also
  • SQL Server: total memory: Displays the total amount of dynamic memory (the buffer pool) the server is currently using. SQL Server: total memory will increase as SQL Server needs it, assuming it is available. This value will always be smaller or equal to SQL Server: target memory. If SQL Server: total memory is relatively high compared to the amount of physical memory in the computer, it may indicate that more physical memory is required.
  • Buffer cache hit ratio: The percentage of pages that were found in the buffer cache without having to read from disk. Ideally, this value should be close to 99% for OLTP system. A smaller number may indicate that the SQL Server is under memory pressure. OLAP systems often have a Buffer cache hit ratio of less than 90%, which is typical.
  • Buffer free pages: The number of free spaces available in the buffer cache. If this number drops below 640 pages, this could indicate SQL Server is under memory pressure.
  • Buffer page life expectancy: Measures the average number of seconds a page will stay in the buffer cache without being accessed. If his value drops below 300 seconds, this may indicate SQL Server is under memory pressure.
Possible solutionsRemove unnecessary applications and services from your server, and add more physical RAM.

SQL Server: free memory

The total amount of dynamic memory that the server is not using.

Note: This metric was introduced in SQL Server 2012 to replace Buffer free pages (which was used in SQL Server 2008 and earlier). For SQL Server 2008 and earlier, we've now converted free pages into memory (1 page is equal to 8 KB).

Equivalent PerfMon counterSQLServer: Memory Manager - Free memory
ExplanationThe Memory Manager monitors the overall memory usage of SQL Server. This metric can help you identify whether there are bottlenecks resulting from a lack of memory (which means that SQL Server has to retrieve data from disk). It can also help you to identify whether you can improve query performance by increasing the amount of memory.
Guideline values

free memory value greater than 5 MB should ensure that the buffer cache can handle sufficient new memory allocation requests. A value less than 5 MB may mean that requests are stalled and this can affect performance. You may consider adding memory or making more memory available to the data cache.

Check also

Buffer cache hit ratio: If this value is below 90% and your free memory value is less than 5 MB, you may be experiencing memory pressure and additional RAM may be required.

Batch requests/sec

Number of T-SQL batch requests received by SQL Server per second, averaged since last collection time.

Equivalent PerfMon counterSQLServer:SQL Statistics - Batch Requests/sec
Explanation

This gives a general indicator of how busy SQL Server is. It measures all of the batches you send to the SQL Server. The number of batch requests SQL Server can handle is dependent on the capability of your hardware. It may be useful to track this value over time, to gauge the scalability of your system, and to identify when you have peaks and troughs in user requests.

Batch requests/sec only tracks the number of batches, not the statements in the batch. Some batches may use very little resources, while other batches use huge amounts of resources. Each batch is different, and this metric only provides an absolute count of the batches executed by SQL Server.

Guideline values

In general, over 1000 batch requests per second indicates a very busy SQL Server, and could highlight the potential for a CPU bottleneck. More powerful hardware can of course handle higher numbers of requests.

Check also

Compilations/batch and Compilations/sec. These values should be as low as possible, as high values may indicate a lot of ad-hoc queries being run. You may need to optimize T-SQL.

Compilations/sec

SQL compilations per second; each data point shows the average value since the previous data point.

Equivalent PerfMon counterSQLServer:SQL Statistics - SQL Compilations/sec
Explanation

The number of times that Transact-SQL compilations occur, per second (including recompiles). The lower the value the better. Each compilation uses CPU and other resources, so SQL Server will attempt to reuse cached execution plans wherever possible. Not having to recompile the stored procedure reduces the overhead on the server and speeds up overall performance.

Guideline values

In general, Compilations/sec should be less than 10% of the Batch requests/sec. High values often indicate excessive ad-hoc querying and should be as low as possible.

Check also

Batch requests/sec.

Possible solutionsRewrite ad-hoc queries as stored procedures. Reduce usage of temp tables, or combine statements to eliminate them in stored procedures. Check "SET" statements - changes in session settings such as ANSI_PADDING or ANSI_NULLS can cause a query to be recompiled. If SQL Compilations/sec are excessive in SQL Server 2005 or later, consider using "Forced parameterization" on the database. This tells SQL Server to force the parameterization of virtually all SELECT, INSERT, UPDATE and DELETE statements, requiring fewer compilations of statements due to re-use of more query plans.

Compilations/batch

SQL compilations per batch requests received by SQL Server.

Equivalent PerfMon counter(SQLServer:SQL Statistics - SQL Compilations/sec) / (SQLServer:SQL Statistics - Batch Requests/sec)
Explanation

This should provide a general indication of how quickly and efficiently SQL Server query optimizer is processing user queries.

Guideline values

As a general rule, Compilations/sec should be less than 10% of the Batch requests/sec. High values often indicate excessive adhoc querying and should be as low as possible.

Check also

Compilations/sec and Batch requests/sec.

Recompilations/sec

Number of times, per second, that Transact-SQL objects attempted to be executed had to be recompiled.

Equivalent PerfMon counterSQLServer:SQL Statistics - Re-Compilations/sec
Explanation

Usually, once an object is compiled, it does not have to be recompiled. When a stored procedure or query is recompiled, a compile lock is placed on the objects referenced by the procedure, and database blocking may occur. Excessive recompilations can cause serious performance problems or may compile locks that are incompatible with any known locking type.

Guideline values

Recompilations/sec should be as small as possible. Any value greater than 10% of Compilations/sec can indicate a problem.

Check also

Compilations/secBatch requests/sec, and Compilations/batch requests to determine how quickly and efficiently SQL Server and the query optimizer are processing user queries. Check also SP:Recompile and SQL:StmtRecompile event classes in a Profiler trace to identify which stored procedures and SQL statements need recompiling with every run. The EventSubClass data column tells you what caused the recompile.

Buffer cache hit ratio

The percentage of pages that were found in the buffer pool without having to read from disk.

Equivalent PerfMon counterSQLServer: Buffer Manager - Buffer cache hit ratio
Explanation

Indicates how frequently a query gets information from cache instead of accessing the hard disk. The value is calculated by dividing the total number of cache hits by the total number of cache lookups. This figure is an average value calculated since the SQL Server service was last restarted, not a snapshot of what is currently happening on your server now. Because of this, be careful how you interpret this value.

Guideline values

If your server is running online transaction processing (OLTP) applications, a value of 99% or higher is ideal, but anything above 90% is generally considered satisfactory. A value of 90% or lower may indicate increased I/O access and slower performance. Adding more physical RAM may help alleviate this problem.

If your server is running online analytical processing (OLAP) applications, it is not uncharacteristic for this value to fall below 90%, which is not normally a problem, unless you see other signs of memory pressure.

Note: Be aware that the value of this metric immediately after a service restart will be uncharacteristically low until enough time has passed for the data cache to fill.

Check also

Buffer page life expectancy: Measures the average number of seconds a page will stay in the buffer cache without being accessed. Generally, if his value drops below 300 seconds, this may indicate SQL Server is under memory pressure.

Buffer page life expectancy

Indicates the average number of seconds a page will stay in the buffer pool without being referenced.

Equivalent PerfMon counterSQLServer: Buffer Manager - Page life expectancy
Explanation

Data is cached in the buffer pool so that it can be retrieved much faster than having to retrieve it from disk. However, once data pages have been stored in the buffer pool, they may or may not be reused. If a page is not referenced after a given amount of time, that page will be removed from the buffer pool so other data can use the space. A high buffer page life expectancy indicates that buffer pages are being reused often, which in turn indicates that the SQL Server is not under any memory pressure.

Guideline values

This value depends on the amount of RAM installed on the servers. It is best practice to record a baseline measurement against which you can compare the monitored values over time. Do this by running the formula (DataCacheSizeInGB/4GB*300) as recommended in Finding what queries in the plan cache use a specific index. For details on how to find the data cache size, see Performance issues from wasted buffer pool memory.

If data pages consistently remain in the buffer for less than your baseline value, this may indicate that there is insufficient memory on the system or not enough memory is configured for SQL Server use. You may consider improving performance by adding RAM. The problem may also be caused by missing indexes, poor index design or badly written application code, such as too many stored procedures being recompiled, or queries reading large tables in the buffer pool when only a small number of rows need changing. Addressing application-specific problems may improve performance.

Check also

Buffer cache hit ratio. If this value is consistently low or drops suddenly, memory bottleneck may be the most likely cause.

Full scans/sec

Measures the number of base table or full index scans occuring per second.

Equivalent PerfMon counterSQLServer:Access Methods: Full Scans/sec
Explanation

This value monitors how many full scans are performed on tables or indexes. While index scans are a normal part of how SQL Server works, a lot of full scans may indicate missing indexes, accessing many small tables, or queries that return large quantities of rows.

Guideline values

It is best practice to record a baseline measurement against which you can compare your monitored values. If the monitored value is significantly higher when compared with the baseline, and CPU is also high, it may suggest performance bottleneck. The ratio of Index searches/sec to Full scans/sec should generally be no more than 1000:1.

Check also

Machine: processor time to establish whether CPU is also high. Also look at Batch requests/sec against your baseline measurement for details of how well the database is coping with queries. You can then check the Lock waits/sec and Lock timeouts/sec to determine whether locking issues are affecting performance.

Page splits/sec

The number of page splits, and new page allocations, in a SQL Server table per second.

Equivalent PerfMon counterSQLServer:Access Methods: Page Splits/sec
Explanation

As new data is inserted or added to an existing data page, and the page does not have enough room for all the new data to fit, SQL Server splits the page, leaving half of the data on the old page and putting the other half on a new page. This is a normal SQL Server event, but excessive page splitting can result in index fragmentation and a variety of related performance problems.

Guideline values

It is best practice to record a baseline against which you can compare your monitored values. In general terms, this value should not be greater than 20% of the number of Batch requests/sec. A low value is preferable, but what constitutes low depends on a number of variable factors including the number of users, the level of user activity and your I/O subsystem performance capabilities.

Check also

Disk avg. read time and Disk avg. write time. There is often a correlation between excessive I/O activity and excessive page splitting.

Possible solutionsResolving page splitting problems is often difficult as the original database design may be contributing to the problem. Besides reconsidering the database design, also consider reevaluating your indexing strategy, and perhaps increase the fillfactor of the indexes being affected by excessive page splitting. If excessive page splitting cannot be effectively eliminated, consider increasing the frequency in which you rebuild or reorganize your indexes in order to remove the fragmentation created by page splitting.

Page splits/batch request

The number of page splits per second in a SQL Server table divided by the number of T-SQL command batch requests received per second.

Equivalent PerfMon counter(SQLServer:Access Methods: Page Splits/sec) / (SQLServer:SQL Statistics - Batch Requests/sec)
Explanation

While pages splits are a normal SQL Server function, excessive page splitting can negatively affect SQL Server performance.

Guideline values

On average, the number of page splits per batch request should be below 20% of the number of batch requests per second, but ideally as close to 0 as possible.

Check also

The fill factor of the tables in your indexes. The lower the fill factor, the more space, which helps reduce potential page splits. For more information about fill factor, see the MSDN documentation.

Latch wait time

The average wait time, in milliseconds, that latch requests have to wait before being executed.

Equivalent PerfMon counterSQLServer:Latches - Average Latch Wait Time (ms)
Explanation

Think of a latch as a lightweight lock used by SQL Server to manage many internal operations. They prevent one SQL Server internal operation from conflicting with another internal SQL Server operation. This average is calculated by dividing latch wait milliseconds by the number of latch waits.

Monitoring latch wait time helps to generically identify user activity and resource utilization that may be the cause of performance bottlenecks. The current average should be compared against a baseline average. Note that latches that do not have to wait for requests to be granted are not included in this value.

Guideline values

In general, there is a correlation between this metric and latch waits/sec. Use PerfMon to check the latch waits/sec value. If both values increase above their baseline averages, the SQL Server is experiencing resource contention that needs further investigation.

Check also

the following features included in the dynamic management view (DMV) to help identify which activity is causing latch wait problems:

 

  • sys.dm_os_latch_stats (latch wait information by class)
  • sys.dm_os_wait_stats (details about the waits encountered by threads in execution)
  • sys.dm_db_index_operational_stats (details of I/O, latching, locking and access method activity for table partitions or database indexes)

Note: These DMVs are not supported in SQL 2000. If you are using SQL 2000, use the DBCC SQLPERF('WAITSTATS') statement.

The relative wait numbers and times for every latch class must be reviewed to understand SQL Server instance performance issues.

Lock timeouts/sec

The number of locks per second that timed out.

Equivalent PerfMon counterSQLServer:Locks - Lock Timeouts/sec
Explanation

Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance.

The lock timeout limit in SQL Server depends on the value set using the SET LOCK_TIMEOUT command. If no value is set, SQL Server will wait indefinitely for the lock to clear.

Guideline values

Any value greater than 0 suggests that users may be experiencing problems with queries that are not completing.

Check also

Lock waits/sec. This can identify the number of locks that did not timeout but had to wait for resource.

Lock waits/sec

The number of locks per second that could not be satisfied immediately and had to wait for resources.

Equivalent PerfMon counterSQLServer:Locks - Lock Waits/sec
Explanation

Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance.

Guideline values

Any value greater than 0 suggests that some level of blocking is occurring. It is best practice to compare the current value against a baseline to determine what is normal for your SQL Server. An increasing lock wait time indicates potential resource contention.

Check also

Buffer cache hit ratio. If this value is above 90%, look at the Avg. disk queue length. Excessive disk queueing may be caused by badly configured or inadequately sized disk subsystems. You may want to review indexing and investigate the numbers of concurrent connections. If the Buffer cache hit ratio is below 90%, additional RAM may be required.

Avg. lock wait time

The average wait time (in milliseconds) for each lock that could not be satisfied immediately and had to wait for resources.

Equivalent PerfMon counterSQLServer:Locks - Average Wait Time (ms)
Explanation

Locks are held on SQL Server resources to stop them from being used simultaneously by another transaction. For example, if a transaction holds an exclusive lock on a row, no other transaction can make changes to that row until the lock is removed. Fewer locks allow more concurrent transactions to take place and this may improve performance.

Guideline values

Average wait times of 500 ms or more may indicate excessive blocking and should be investigated. In general, Avg. lock wait time correlates with Lock waits/sec, and if you see both of them increasing, you need to investigate what is causing the blocking issues.

Check also

Lock waits/sec to identify peaks over time, and Lock timeouts/sec to see how many locks reached the threshold set using the SET LOCK_TIMEOUT command.

Database metrics

Transactions/sec

Number of transactions started for the database per second. Each data point shows the average number of transactions per second since the previous data point.

Equivalent PerfMon counterSQLServer:Databases - Transactions/sec
Explanation

A transaction is one or more database operations combined into a single operation; either user-defined transactions surrounded by BEGIN TRAN and COMMIT TRAN or individual data modification statements such as INSERT or UPDATE. Transaction rate is affected by general system performance and resources such as I/O, number of users, cache size, and complexity of requests.This counter records only explicit transactions or transactions that change data.

Note: Transactions per second only measures activity inside a transaction, not all activity, The Batch requests/sec metric is a more complete picture of all SQL Server activity.

Guideline values

A high number of transactions per second is not necessarily an indication of a problem, only that there is a lot of activity on your SQL Server. You should establish a baseline value and track when transactions are unusually high or low.

Active transactions

Number of active transactions for the database. Each data point shows the number of active transactions at the point the data was collected.

Equivalent PerfMon counterSQLServer:Databases - Active Transactions
Explanation

Active transactions are UPDATE transactions on the database that that have not yet been committed or rolled back.

Note: This metric measures activity inside a transaction, not all activity. The Batch requests/sec metric is a more complete picture of all SQL Server activity.

Guideline values

Generally, the number of active transactions should be considerably lower than the number of Transactions/sec(as active transactions show only those transactions that have not yet completed). If the number of transactions currently active is higher than the Transactions/sec metric, then this may indicate you have long-running transactions. Long running transactions can increase lock times and cause blocking.

Total size

The total size (in GB) of all data files (.mdf and .ndf) and log files (.ldf) for this database.

Equivalent PerfMon counterSQLServer:Databases - Data File(s) Size (KB) + SQLServer:Databases - Log File(s) Size (KB)
Explanation

This value is the sum of the total data files and total log files sizes. It is useful for determining the amount of space required by the database and identifying trends.

Guideline values

For best practice, MDF, NDF, and LDF files should be proactively managed in order to prevent autogrowth from kicking in and growing your database files.

Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data.

Check alsoLog size for details of total .ldf files, and Data size for details of total .mdf and .ndf files.

Data size

Total size of all data files (.mdf and .ndf files) for the specified database.

Equivalent PerfMon counterSQLServer:Databases - Data File(s) Size (KB)
Explanation

This value is the total size of the database, excluding the transaction log. Information about individual primary data files (.mdf) and secondary data files (.ndf) such as their current size, location, and autogrowth settings, is shown under the Filessection of the Database overview page in SQL Server Management Studio.

Guideline values

Look out for unusual large increases in the size of your data files. Ideally your database should be sized in a way that minimizes autogrowth, as each size increase is expensive in terms of I/O and will also physically fragment your data and log files.

Growing a file on demand is an expensive process and will impact performance. Autogrowth should only be used as a safety valve to allow a database to grow should you accidently run out of space. It should not be used to manage your MDF file growth.

In general, data files should be pre-sized when they are first created to meet future expected growth, help avoid file fragmentation and ensure better database performance.

Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data.

Check also
  • Full Recovery Mode is used for all production databases.
  • When using Full Recovery Mode, be sure you take regular full and transaction log backups.
  • The files are set to be the amount of space you need for normal activity. Monitor the data files and manually add space as data grows. You should have 6-12 months of space for data growth in your data files.
  • If you don't have space for the database to grow at its current rate, move the database to a larger disk drive or upgrade the disk itself.

Caution: Shrinking files to reduce space causes fragmentation and is not recommended.

Log size

Total size of all log files (.ldf) for this database.

Equivalent PerfMon counterSQLServer:Databases - Log File(s) Size (KB)
Explanation

Each transaction performed in the database is recorded in the transaction log. This allows recovery in the event of data loss since the last backup. Information about the log file (.ldf) such as its location, autogrowth setting and current size, is shown under the Files section of the Database overview page in SQL Server Management Studio.

Guideline values

Transaction logs are serial records of database modifications. They're used during recovery operations to commit completed transactions and to roll back uncompleted transactions. The size of the log file is determined by the Recovery Model set for the database. By default when creating a new database it will use the Full recovery mode (inherited from the Model database), where transactions in the log file are only ever removed when a transaction log backup is performed. When you back up a transaction log, the backup stores the changes that have occurred since the last transaction log backup and then truncates the log, which clears out transactions that have been committed or aborted. If the transaction log is not backed up regularly, it will grow indefinitely in size and could fill your entire disk.

Note: If the values for this metric drop to zero intermittently, the most likely cause is a problem affecting sys.master_files from which SQL Monitor collects the data.

Check also
  • The transaction log file should be pre-allocated to the size you think it needs to be when your system is in production. This size will depend on the number of transactions you expect, and how often you backup the transaction log.
  • Set correct autogrowth properties: 10% autogrowth for data and log files (the default setting) may be sufficient for low utilization databases, but 500 MB autogrowth rate may be more suitable for a busy database, allowing for growth over time without the heavy I/O impact caused by regular autogrowth operations. Even in Simple recovery model, if you are writing to the transaction log fast enough, the checkpoint process cannot keep up, and the log file will trigger autogrowth.
  • If you are performing a one-off bulk insert operation, you may want to switch the recovery model to bulk logged for the duration of the insert.
  • If you do not need point-in-time recovery for this database, then it may be appropriate to switch to Simple recovery model.
  • If a database is configured with the Full or Bulk Logged recovery model, you should back up the transaction log regularly so it can be truncated to free up inactive log space. Truncating a log file removes inactive virtual log files, but does not reduce the file size.

Log space used

Percentage of total transaction log file size currently containing log records.

Equivalent PerfMon counterSQLServer:Databases - Log File(s) Used Size (KB)
Explanation

The transaction log file stores the details of all the modifications that you perform on your SQL Server database. The amount of space it takes up depends on the recovery model used, database activity, how often transaction logs are backed up, among other factors.

Guideline values

It is normal to see the log space figure to vary considerably in size over time as transactions are recorded, and removed when log backups are performed.

Log bytes flushed/sec

Shown on the Availability group overview as Log growth.

Average size of the log flush per second. Each data point shows the average log bytes flushed per second for this database since the previous data point.

Equivalent PerfMon counterSQLServer:Databases - Log Bytes Flushed/sec
Explanation

This metric helps to assess utilization and identify trends of the transaction log. A log flush occurs when data is written from the log cache to the physical transaction log file on disk, every time a transaction is committed.

Within availability groups, both the primary and the secondary replicas have a log bytes flushed rate. On the primary replica, this shows how quickly log records are being added to the log send queue. On the secondary, it shows how quickly log bytes are being added to the redo queue.

The Availability group overview page only shows the rate of log bytes flushed on the primary replica.

Guideline values

The size of a log flush varies depending on the nature of the transaction. For example, a single INSERT into 1,000,000 rows will result in a single log flush, but a high number of bytes.

Within availability groups, if the rate of log bytes flushed on the primary replica is higher than the rate at which log bytes are sent, the size of the log send queue will increase. To estimate the potential data loss on failover (in asynchronous-commit mode), divide the log send queue by the log bytes flushed.

If the rate of log bytes flushed on the secondary replica is higher than the redo rate, the size of the redo queue will increase.

Check also
  • Log flushes/sec: The number of log pages flushed to the transaction log per second.
  • Log send queue: The amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Redo queue: The amount of log records that needs to be redone (written to the secondary database) for the synchronous-commit primary and secondary replicas to be synchronized.

Log flushes/sec

Number of log flushes per second for this database, averaged since last collection time.

Equivalent PerfMon counterSQLServer:Databases - Log Flushes/sec
Explanation

A log flush occurs when a transaction is committed and data is written from the log cache to the physical transaction log file. The log cache is a temporary location in memory where SQL Server stores data to be written to the log file, and is used to roll back a transaction before it is committed if required. Once a transaction is completed (and can no longer be rolled back), the log cache is immediately flushed to the physical log file on disk.

Guideline values

Log flushes per second should generally correlate with the number of transactions per second. If log flushes per second seems to be significantly higher than the expected number of transactions, check your use of explicit transactions in queries.

Explicitly defining the start and end of transactions (rather than implicit transactions, where SQL Server needs to flush the log for each data statement) should reduce the number of log flushes, and reduce impact on I/O.

Check also

Log bytes flushed/sec

Log flush waits/sec

Number of transaction commits per second waiting to be flushed from the log cache to the physical transaction log file on disk.

Equivalent PerfMon counterSQLServer:Databases - Log Flush Waits/secg Bytes Flushed/sec
Explanation

A high number of log flush waits indicates that it is taking longer than normal to flush the transaction log cache to the file on disk, which will slow SQL Server's performance.

Guideline values

Log flush waits should be as close to 0 as possible.

Check also

Disk avg. write time. If this is greater than 5 ms, then an I/O bottleneck is likely to be occurring.

Possible solutionsMoving the location of your transaction log (LDF file) to a separate physical drive from your data files (MDF files) will increase performance, as writing to the log and data files will not be competing for access to the same disk. Also consider using faster disk drives, using RAID 10 for transaction log file storage, adding additional spindles to your existing RAID array, short-stroking, or adding additional disk controllers.

Availability group metrics

Log bytes received/sec

Shown on the Availability group overview as Received rate.

The rate at which the the secondary replica receives log records from the primary replica.

Equivalent PerfMon counterSQLServer:Database Replica - Log Bytes Received/sec
ExplanationThis value measures how quickly log records are received by the secondary replica.
Guideline values

To estimate how long a secondary replica will take to catch up with the primary replica, divide the log send queue by the rate of log bytes received.

Check also
  • Log send queue: Amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Log bytes flushed/sec (on the secondary): Average size of the log flush per second.

Log send queue

Shown on the Availability group overview as Send queue.

Amount of log records in the primary database log file waiting to be sent to the secondary replica.

Equivalent PerfMon counterSQLServer:Database Replica - Log Send Queue
ExplanationThis value is the amount of log data that has been flushed to disk on the primary replica but hasn't yet been sent to the secondary replica.
Guideline values

The rate of log bytes flushed (on the primary replica) adds to the log send queue. If the rate of log bytes flushed on the primary is higher than the rate at which log records are sent, the log send queue will increase.

To estimate the potential data loss on failover (in asynchronous-commit mode), divide the log send queue by the log bytes flushed.

From the log send queue, the log records are sent to the secondary replica. To estimate how long it will take the secondary replica to catch up with the primary replica, divide the log send queue by the log bytes received/sec.

If you use the secondary replica for reporting and want to know how much data from the primary replica is not yet represented in the secondary replica, add together the log send queue and the redo queue.

Check also

Log bytes flushed/sec: Average size of the log flush per second.

Redo queue

The amount of log records that needs to be redone (written to the secondary database) for the synchronous-commit primary and secondary replicas to be synchronized.

Equivalent PerfMon counterSQLServer:Database Replica - Recovery Queue
ExplanationAfter the log data has been hardened into the secondary replica's log file, it joins the redo queue.
Guideline values

The rate of log bytes received adds to the size of the redo queue, while the redo rate takes away from it. If the rate of log bytes received is higher than the redo rate, the size of the redo queue will increase.

In event of a failover, the redo queue needs to be cleared before the secondary replica can transition to the role of primary replica. To estimate the time, in seconds, that it will take the secondary replica to redo the log (and therefore how long failover will take), divide the redo queue by the redo rate.

If you use the secondary replica for reporting and want to know how much data from the primary replica is not yet represented in the secondary replica, add together the log send queue and the redo queue.

Check also
  • Log send queue: Amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Log bytes received/sec: The rate at which the secondary replica receives log records from the primary replica.
  • Redo queue: The rate at which log records are redone in the secondary database to complete synchronization.

Redo rate

The rate at which log records are redone (written to the secondary database).

Equivalent PerfMon counterSQLServer:Database Replica - Redone Bytes/sec
ExplanationLog records from the redo queue are written to the secondary database, completing the synchronization process. If the redo rate is lower than the rate of log bytes flushed on the secondary replica, the size of the redo queue will increase.
Guideline valuesTo estimate the time, in seconds, that it will take the secondary replica to redo the log, divide the redo queue by the redo rate.
Check also
  • Log bytes flushed/sec (on secondary replica): Average size of the log flush per second.
  • Redo queue: The amount of log records that needs to be redone (written to the secondary database) for the synchronous-commit primary and secondary replicas to be synchronized.

Transaction delay ms/sec

This value measures the extent to which transactions are slowed down by synchronous replication.

Equivalent PerfMon counterSQLServer:Database Replica - Transaction Delay
ExplanationIn synchronous-commit mode, the primary replica doesn't commit any transactions until it receives acknowledgement that all synchronous secondary replicas have finished hardening the log, resulting in transaction delay. This is the total delay across all transactions caused by the primary replica waiting for commit acknowledgement from the secondary replicas. As such, it's useful for working out whether activity on the secondary replicas is slowing down the primary replica.
Guideline valuesThe flow control time contributes to the transaction delay on synchronous-commit replicas, because it affects the rate at which log records are sent. The rate of log bytes flushed on the secondary replica also affects the transaction delay.
Check also
  • Log bytes flushed/sec (on secondary replica): Average size of the log flush per second.
  • Flow control time/sec: The total time that log records wait per second due to flow control.

Flow control time ms/sec

The total time that log records waited per second due to flow control.

Equivalent PerfMon counterSQLServer:Availability Replica - Flow Control Time (ms/sec)
ExplanationFlow control throttles the flow of network traffic caused by replication, by limiting the speed at which the primary replica sends log records to the secondary replica. This helps to even out the usage of network and memory resources. This value shows the total time, in milliseconds per second, that the log records had to wait before being sent to the secondary replica, because of flow control.
Guideline values

Flow control can restrict the rate at which log records are sent from the primary replica. This will affect the log send queue: if the rate at which log records are sent is lower than the rate of log bytes flushed (on the primary replica), the log send queue will increase.

The flow control time also contributes to the transaction delay on synchronous replicas, because it affects the rate at which log data is transferred between the primary and the secondary replicas.

Check also
  • Log bytes flushed/sec (on secondary replica): Average size of the log flush per second.
  • Log send queue: Amount of log records in the primary database log file waiting to be sent to the secondary replica.
  • Transaction delay: The extent to which transactions are slowed down by synchronous replication.

Do you have any feedback on this documentation?

Let us know at sqlmonitorfeedback@red-gate.com


Didn't find what you were looking for?