SQL storage performance.

The single most important metric is latency. Latency is a measure of system health and of the availability of a system resource. Latency is governed by queuing theory, and an important result in queuing theory is Little's Law. This law tells us that as a steady-state system approaches its capacity, response times grow towards infinity. For our systems this means that as the storage subsystem approaches its limit, response times become so large that the system is unusable. For storage, response times are I/O times, which means we can test our storage by measuring I/O times.
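The relationship between utilisation and response time can be sketched with the classic single-queue formula R = S / (1 − U), where S is the service time of one I/O and U is utilisation. The 5 ms service time below is a hypothetical value, and the function name is ours:

```python
# Sketch: how response time explodes as a storage device approaches
# full utilisation, using the M/M/1 queueing formula R = S / (1 - U).
# S is the service time of one I/O (5 ms here, a made-up figure) and
# U is utilisation between 0 and 1.

def response_time_ms(service_ms: float, utilisation: float) -> float:
    """Average response time of an M/M/1 queue at a given utilisation."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return service_ms / (1.0 - utilisation)

for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"utilisation {u:.0%}: {response_time_ms(5.0, u):6.1f} ms")
```

At 50% utilisation the response time merely doubles, but at 99% it is a hundred times the service time, which is why a storage system near capacity feels unusable.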

Physical I/O response time is the amount of time (in seconds) that an average read or write request to a physical disk takes to complete. You can get this data easily by using Performance Monitor’s (perfmon) Logical or Physical Disk counters for Avg. Disk sec/Read and Avg. Disk sec/Write. These latency measures are the single biggest indicator of I/O health.

Most experts would agree that with today’s technology, the response times when using 15,000rpm disks should be in the range that Figure 1 shows.

SQL Server best practice generally calls for no more than 20 ms of latency, but if you implement synchronous data replication with AlwaysOn Availability Groups then you cannot tolerate more than 10 ms. Some applications may be even more sensitive.
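Those thresholds can be turned into a simple check against values sampled from perfmon (the counters report seconds, so they must be converted to milliseconds; the function name and sample value are ours):

```python
# Classify an Avg. Disk sec/Read or Avg. Disk sec/Write sample (which
# perfmon reports in seconds) against the rules of thumb above:
# <= 20 ms is generally acceptable, but synchronous AlwaysOn
# Availability Groups need <= 10 ms.

def latency_ok(avg_disk_sec: float, synchronous_ag: bool = False) -> bool:
    limit_ms = 10.0 if synchronous_ag else 20.0
    return avg_disk_sec * 1000.0 <= limit_ms

print(latency_ok(0.015))                        # 15 ms: fine in general
print(latency_ok(0.015, synchronous_ag=True))   # 15 ms: too slow for sync AGs
```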

Another approach is to keep an eye on the PAGEIOLATCH_XX and WRITELOG waits. These statistics will also tell you a lot about how well you are keeping up with data and log I/O requests. High average waits on either of these should prompt a closer look at physical I/O response times.

You must remember to view these times as part of the whole and not in isolation. There are many kinds of PAGEIOLATCH waits (signified by the XX suffix); the most common type is PAGEIOLATCH_SH.

PAGEIOLATCH_SH – (SHare) waiting for a data file page to be brought from disk into the buffer pool so its contents can be read. When this wait type is the most common on a server, the usual reaction is to assume the I/O subsystem must have a problem, and so that's where investigations are focused.

In fact, the first thing to do is to compare the PAGEIOLATCH_SH wait count and duration against your baseline. If the volume of waits is more or less the same but the duration of each read wait has become much longer, then I would suspect an I/O subsystem problem, such as:

  • A misconfiguration or malfunction at the I/O subsystem level
  • Network latency
  • Another I/O workload causing contention with our workload
  • Configuration of synchronous I/O-subsystem replication/mirroring

In my experience, the pattern is often that the number of PAGEIOLATCH_SH waits has increased substantially from the baseline amount and the wait duration has also increased (i.e. the time for a read I/O has increased), because the large number of reads overloads the I/O subsystem. This isn’t an I/O subsystem problem – this is SQL Server driving more I/Os than it should be. The focus now needs to switch to SQL Server to identify the cause of the extra I/Os.
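The baseline comparison described above can be sketched as a small decision helper. The inputs would come from your own wait-stats baseline; the 1.5× "substantially higher" threshold and the function name are arbitrary choices of ours, not SQL Server values:

```python
# Compare PAGEIOLATCH_SH wait stats against a baseline to suggest
# where to look first. wait counts are for the same-length sample
# window; avg_wait_ms is total wait time divided by wait count.
# The 1.5x threshold for "substantially higher" is an arbitrary choice.

def diagnose(base_count, base_avg_ms, cur_count, cur_avg_ms, factor=1.5):
    more_waits = cur_count > base_count * factor
    slower_waits = cur_avg_ms > base_avg_ms * factor
    if slower_waits and not more_waits:
        return "suspect the I/O subsystem (same volume, slower reads)"
    if more_waits:
        return "suspect SQL Server driving extra I/Os (find the extra reads)"
    return "within baseline"

print(diagnose(10_000, 4.0, 10_500, 12.0))  # same volume, much slower reads
print(diagnose(10_000, 4.0, 40_000, 9.0))   # many more reads than baseline
```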

If you find that you need more IOPS (input/output operations per second), then a simple solution is to add more disks. If one disk performs 150 IOPS, two will perform 300, and so on. Remember that read and write IOPS are different: a RAID 10 system that delivers 10,000 IOPS for reads will only deliver 5,000 IOPS for writes, because every write must go to both disks in a mirrored pair.
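The scaling above can be put into numbers. The write penalty of 2 used here is the commonly quoted figure for RAID 10 (each write hits two mirrored disks); other RAID levels have different penalties:

```python
# Rough effective IOPS for an array: reads scale with the number of
# disks; writes are divided by the RAID write penalty (2 for RAID 10,
# since every write lands on both disks of a mirrored pair).

def array_iops(disks: int, iops_per_disk: int, write_penalty: int = 2):
    read_iops = disks * iops_per_disk
    write_iops = read_iops // write_penalty
    return read_iops, write_iops

reads, writes = array_iops(disks=2, iops_per_disk=150)
print(reads, writes)   # two 150-IOPS disks: 300 read, 150 write IOPS
```

This is only a back-of-the-envelope model: caches, stripe sizes and workload mix all move the real numbers.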

Remember that sequential disk access is much more efficient than random disk access. Designing an application to generate sequential access can be the most effective action that can be taken.

Generating I/Os that are too big will stress a system and increase latency. A SQL Server backup will generate I/Os of up to 1 MB, the default maximum transfer size. Changing the backup to use 512 KB I/Os will normally reduce latency and can also reduce backup times.

To change the backup to a maximum transfer size of 512KB:

BACKUP DATABASE [DBName] TO DISK = 'E:\dump\BackupFile.bak'

WITH MAXTRANSFERSIZE = 524288, NAME = 'BackupName'

GO
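MAXTRANSFERSIZE is given in bytes, so 512 KB is 512 × 1024 = 524288; valid values are multiples of 64 KB up to 4 MB. A quick sanity check (the helper name is ours):

```python
# MAXTRANSFERSIZE is specified in bytes; valid values are multiples
# of 64 KB, ranging up to 4 MB. Convert a size in KB to the byte
# value the BACKUP statement expects, validating it first.

def max_transfer_size(kb: int) -> int:
    if kb % 64 != 0 or not 64 <= kb <= 4096:
        raise ValueError("must be a multiple of 64 KB, between 64 KB and 4 MB")
    return kb * 1024

print(max_transfer_size(512))   # the value used in the BACKUP statement above
```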

NTFS allocation unit size:

When you are formatting a partition you can choose the allocation unit size; the default is 4 KB. SQL Server files are usually large, and Microsoft recommends a 64 KB allocation unit size for data, log and tempdb files. When using FILESTREAM or SSAS, use the 4 KB allocation unit size.

Testing storage performance.

SQLIOSim is designed to simulate SQL Server I/O patterns; it stresses SQL Server, the Windows server and the storage. SQLIO, despite the similar name, is a separate tool that benchmarks raw storage performance, and it is the one used below.

If you start running SQLIO, it uses a default test file size of 100MB. This will likely be smaller than your storage cache. To create a larger test file we need to edit the param.txt file located wherever you installed SQLIO. You need administrator rights to edit this file, so launch Notepad, Notepad++ or whatever text editor you are using as an administrator. The param.txt file will have two rows:

c:\testfile.dat 2 0x0 100

#d:\testfile.dat 2 0x0 100

The first line identifies c:\ as the target drive and testfile.dat as the test file to create on that drive. The 2 refers to the number of threads to use when testing the I/O patterns. The 0x0 value indicates that all CPUs should be used.

The # at the start of the second line means that it is commented out so we don’t need to worry about that line. We do however want to change the first line.

Change the drive letter to the drive that you wish to test. To increase the test file size to a 20GB test file, replace 100 with 20480.
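The edit above can also be scripted. This sketch writes a param.txt line in the format shown earlier (test file, thread count, CPU mask, size in MB); the drive letter and thread count are just examples:

```python
# Generate a SQLIO param.txt line: <test file> <threads> <cpu mask> <size MB>.
# 20480 MB = 20 GB. The drive letter and thread count are example values.

def write_param(path, drive="E", threads=2, size_mb=20 * 1024):
    line = f"{drive}:\\testfile.dat {threads} 0x0 {size_mb}\n"
    with open(path, "w") as f:
        f.write(line)
    return line

print(write_param("param.txt"), end="")
```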

Save the file and then open a new file and call it testFile.bat.

Type in the following lines of code:

sqlio -kR -frandom -t1 -o4 -b64 -s60 -BH -LS E:\testfile.dat

sqlio -kR -frandom  -t2 -o4 -b64 -s60 -BH -LS E:\testfile.dat

sqlio -kW -frandom -t1 -o4 -b64 -s60 -BH -LS E:\testfile.dat

sqlio -kW -frandom -t2 -o4 -b64 -s60 -BH -LS E:\testfile.dat

sqlio -kR -fsequential -t1 -o4 -b64 -s60 -BH -LS E:\testfile.dat

sqlio -kR -fsequential -t2 -o4 -b64 -s60 -BH -LS E:\testfile.dat

sqlio -kW -fsequential -t1 -o4 -b64 -s60 -BH -LS E:\testfile.dat

sqlio -kW -fsequential -t2 -o4 -b64 -s60 -BH -LS E:\testfile.dat

To explain what this code does, I will walk through each part of the first line.

The command starts by calling the sqlio utility.

-k           This option specifies the I/O type: R for read operations, W for write operations.

-s           This option specifies the test duration in seconds; in this case, 60 seconds per run.

-f           This option determines how data blocks are accessed, random or sequential. The random value indicates that blocks will be chosen randomly, as is often the case with T-SQL queries. The sequential value specifies that blocks should be accessed in order, which is typical for operations such as backups and bulk loads.

-o           This option indicates the number of I/Os that can be outstanding in a single thread.

-b           This option sets the I/O block size in kilobytes (KB). The script above uses 64 KB; SQL Server does much of its random I/O in 8 KB pages, so runs with -b8 are also worth testing.

-BH        This option selects hardware buffering.

-LS         This option collects system latency information.

Now open a command prompt and change to the directory where you installed SQLIO. Then type in:

sqlio -kW -s10 -fsequential -o8 -b64 -Fparam.txt

This will do a 10 second test and will also create the test file using the file location and size parameters you specified in param.txt.

Now type in:

testFile.bat

This runs the testFile.bat batch file that we created earlier.

In the SQLIO results, the IOs/sec and MBs/sec figures are usually the main numbers to look for.
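Across the eight runs in testFile.bat, extracting those two figures by hand gets tedious. A sketch that pulls them out of captured SQLIO output follows; the sample text mimics the shape of SQLIO's result lines, and the numbers in it are invented:

```python
import re

# SQLIO prints result lines such as "IOs/sec:  3412.18" and
# "MBs/sec:   213.26"; pull those figures out of captured output.
# The sample below imitates that shape with made-up numbers.

sample = """\
sqlio v1.5.SG
2 threads reading for 60 secs ...
IOs/sec:  3412.18
MBs/sec:   213.26
"""

def parse_results(text: str) -> dict:
    results = {}
    for key in ("IOs/sec", "MBs/sec"):
        m = re.search(rf"{re.escape(key)}:\s+([\d.]+)", text)
        if m:
            results[key] = float(m.group(1))
    return results

print(parse_results(sample))
```

Run this over each saved run's output (e.g. redirect testFile.bat to a log file) and you can compare random vs sequential and read vs write throughput side by side.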