Tag Archives: Hardware

Load Testing Your Storage Subsystem with Diskspd – Part III

In our final post in our “Load Testing Your Storage Subsystem with Diskspd” series, we’re going to look at output from Diskspd and run some tests and interpret results. In our first post we showed how performance can vary based on access pattern and IO size. In our second post we showed how to design a test to highlight those performance characteristics and in this post we’ll execute those tests and review the results. 

First let’s walk through the output from Diskspd, for now don’t focus on the actual results. There are four major sections:

  • Test Parameters – here is the test’s parameters. Including the exact command line parameters executed. This is great for reproducing tests.
    Command Line: diskspd.exe -d15 -o1 -F1 -b60K -h -s -L -w100 C:\TEST\iotest.dat
    Input parameters:
    	timespan:   1
    	duration: 15s
    	warm up time: 5s
    	cool down time: 0s
    	measuring latency
    	random seed: 0
    	path: 'C:\TEST\iotest.dat'
    		think time: 0ms
    		burst size: 0
    		software and hardware write cache disabled
    		performing write test
    		block size: 61440
    		number of outstanding I/O operations: 1
    		thread stride size: 0
    		IO priority: normal
  • CPU Usage – CPU usage for the test, recall if you are not using all your bandwidth, you may want to add threads. If your CPU burn is high, you may want to back off on the number of threads.
    Results for timespan 1:
    actual test time:	15.00s
    thread count:		1
    proc count:		2
    CPU |  Usage |  User  |  Kernel |  Idle
       0|  30.10%|   1.04%|   29.06%|  69.89%
       1|   0.10%|   0.10%|    0.00%|  99.78%
    avg.|  15.10%|   0.57%|   14.53%|  84.84%
  • Performance – this is the meat of the test. Here we see bandwidth measured in MB/sec and latency measured in microseconds. With SSDs and today’s super fast storage I/O subsystems, you’ll likely need this level of accuracy. This is alone beats SQLIO in my opinion. I’m not much a fan of IOPs since those numbers require that you know the size of the IO for it to have any meaning. Check out Jeremiah Peschka’s article on this here. Remember, focus on minimizing latency and maximizing I/O please refer back Part I and Part II posts in this series for details.
    Total IO
    thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
         0 |      3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816 | C:\TEST\iotest.dat (20GB)
    total:        3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816
    Read IO
    thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
         0 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | C:\TEST\iotest.dat (20GB)
    total:                 0 |            0 |       0.00 |       0.00 |    0.000 |       N/A
    Write IO
    thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
         0 |      3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816 | C:\TEST\iotest.dat (20GB)
    total:        3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816
  • Histogram – this gives a great representation of how your test did over the whole run. In this example, 99% of the time our latency was less than 0.654ms…that’s pretty super.
      %-ile |  Read (ms) | Write (ms) | Total (ms)
        min |        N/A |      0.059 |      0.059
       25th |        N/A |      0.163 |      0.163
       50th |        N/A |      0.193 |      0.193
       75th |        N/A |      0.218 |      0.218
       90th |        N/A |      0.258 |      0.258
       95th |        N/A |      0.312 |      0.312
       99th |        N/A |      0.654 |      0.654
    3-nines |        N/A |     17.926 |     17.926
    4-nines |        N/A |     18.906 |     18.906
    5-nines |        N/A |    583.568 |    583.568
    6-nines |        N/A |    583.568 |    583.568
    7-nines |        N/A |    583.568 |    583.568
    8-nines |        N/A |    583.568 |    583.568 
        max |        N/A |    583.568 |    583.568

Impact of I/O Access Patterns

  • Random

    diskspd.exe -d15 -o32 -t2 -b64K -h -r -L -w0 C:\TEST\iotest.dat

    Read IO
    thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
         0 |     16066543616 |       245156 |    1021.49 |   16343.84 |    1.896 |     0.286 | C:\TEST\iotest.dat (20GB)
         1 |     16231759872 |       247677 |    1031.99 |   16511.91 |    1.877 |     0.207 | C:\TEST\iotest.dat (20GB)
    total:       32298303488 |       492833 |    2053.48 |   32855.75 |    1.886 |     0.250

    In this test you can see the that there is high throughput and very low latency. This disk is a PCIe attached SSD, so it performs well with a random IO access pattern.

  • Sequential

    diskspd.exe -d15 -o32 -t2 -b64K -h -s -L -w0 C:\TEST\iotest.dat

    Read IO
    thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
         0 |     16094724096 |       245586 |    1022.21 |   16355.35 |    1.895 |     0.260 | C:\TEST\iotest.dat (20GB)
         1 |     16263544832 |       248162 |    1032.93 |   16526.91 |    1.875 |     0.185 | C:\TEST\iotest.dat (20GB)
    total:       32358268928 |       493748 |    2055.14 |   32882.26 |    1.885 |     0.225

    In this test you can see that the sequential I/O pattern yields a similar performance profile to the random IO test on the SSD. Recall that an SSD does not have to move a disk head or rotate a platter. The access latency to any location on the drive has the same latency cost.  

Impact of I/O sizes

  • Tranaction log simulation  

    diskspd.exe -d15 -o1 -t1 -b60K -h -s -L -w100 C:\TEST\iotest.dat

    Write IO
    thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
         0 |      3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816 | C:\TEST\iotest.dat (20GB)
    total:        3162378240 |        51471 |     201.04 |    3431.10 |    0.289 |     2.816

    This test measures access latency of single thread with a very small data transfer, as you can see latency is very low at 0.289. This is expected on a low latency device such as a local attached SSD.

  • Backup operation simulation

    diskspd.exe -d15 -o32 -t4 -b512K -h -s -L -w0 C:\TEST\iotest.dat

    Read IO
    thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
         0 |      8552185856 |        16312 |     543.17 |    1086.33 |   29.434 |    26.063 | C:\TEST\iotest.dat (20GB)
         1 |      8846311424 |        16873 |     561.85 |    1123.69 |   28.501 |    25.373 | C:\TEST\iotest.dat (20GB)
         2 |      8771338240 |        16730 |     557.09 |    1114.17 |   28.777 |    25.582 | C:\TEST\iotest.dat (20GB)
         3 |      8876720128 |        16931 |     563.78 |    1127.56 |   28.440 |    25.353 | C:\TEST\iotest.dat (20GB)
    total:       35046555648 |        66846 |    2225.88 |    4451.76 |   28.783 |    25.593

    And finally, our test simulating reading data for a backup. The larger I/Os have a higher latency but also yield a higher transfer rate at 2,225MB/sec.

In this series of post we introduced you into some theory on how drives access data, we presented tests on how to explore the performance profile of your disk subsystem and reviewed Diskspd output for those tests. This should give you the tools and ideas you need to load test your disk subsystem and ensure your SQL Servers will perform well when you put them into production!

Load Testing Your Storage Subsystem with Diskspd

One of the primary activities I do before bringing SQL Server into production is load testing the storage subsystem. On a new system this is critical because I want to ensure that we’re “getting what we’ve paid for” when it comes to the disk subsystem. All too often there’s a configuration issue, component mismatch, a fundamental misunderstanding of the technology or worse an insufficient disk subsystem…these all can lead to poor disk performance. Even if it’s the simplest test, its imperative to measure performance as it’s significantly harder to make changes to a SQL Server once a database is in production. So do your testing. This is especially an important topic if your disks are not direct attached or in a shared storage environment such as a SAN or VMware data store. Storage networks, controllers, shelves…it gets complicated fast!

In this article we’re going to discuss what we’re looking for when load testing your storage subsystem and introduce DiskSpd a tool for performance testing disks using varying IO patterns and sizes.

Key measurements

The two metrics of key interest to us in a disk subsystem are bandwidth and access latency. Bandwidth, often referred to as data transfer rate, is how much data can be moved in a time interval, think gigabytes per second. Access latency or access time is how long a disk transaction takes from request to the delivery of the requested data. Latency is measured in milliseconds for HDD and microseconds for solid state drives. I don’t get hung up on IOPs as much, as they’re really a function of latency. Keep your latency low and your IOPs will likely be high.

Access Patterns

The access pattern in which data is read from a drive can have significant implications on the bandwidth and latency of the request. There are two access patterns that we’re concerned with when load testing a system, sequential and random. Each with its own performance characteristics.

Sequential access is when an application requests a block of data and the next block requested is physically adjacent on the disk. On a hard disk drive (HDD) the drive’s head does not have to move and the disk platter simply rotates under the head to read the next block off the disk. On a solid state disk (SSD) this becomes less of a concern as SSDs access latency is constant across the drive. In SQL Server, sequential IO is analogous to table/index scans, translation log writes and database backups. When load testing a system we use sequential IO patterns to saturate the disk subsystems IO path and determine if there any physical bottlenecks limiting bandwidth between the application driving the IO and the device serving up the data…the disk drive, the storage network (interconnect) or the SAN.  

Hard Disk Drive

Figure 1: Hard Disk Drive – Image from Wikipedia

Random access is when an application requests a block of data and the next block requested may not physically adjacent on the disk. On an HDD the drive’s head may have to move and the disk’s platter rotate to read the next block off the disk. This all contributes to our access latency. Drive vendors refer to this as seek time in drive specifications. A solid state disk is composed of an array of flash memory chips, each of which have fixed time access latency. So on an SSD if an application requests a block, the request for the next block can be serviced in a fixed amount of time, regardless of its location on the disk (there is a small amount of latency when switching between flash packages). This means random IO patterns can be serviced more efficiently as we do not have to wait for a physically moving component to access the next block…and that’s the game changer for SSDs when compared with HDDs.

Further the access latency of an IO on an SSD is measured in microseconds, this is an order of magnitude faster than an HDD which is in milliseconds. In SQL Server, random IO patterns can occur on index seeks, data file writes and operations that read from the transaction log. When load testing a system we use random IO patterns to find the overall access latency to the disk subsystem and determine if there are any components in the system that are not servicing the requests “fast enough” or in other words contributing to access latency.

Variable IO Sizes

An application can request data in variable IO sizes. For example, SQL Server can perform IOs in 8KB, 64KB, 128KB, 256KB and more. The size of the IO impacts both latency and bandwidth. A small IO can have a lower access latency as it is measured from the request of the IO until the delivery of all of the data requested. A small IO potentially can consume less bandwidth as well, as each IO translates into physical disk accesses, each of which will have some access latency. So of key importance when measuring smaller IO sizes is a low access latency, if the disk subsystem can service the IO quickly enough then we can see higher bandwidths on smaller operations. But we’re really shooting for getting the IO completed as quickly as possible.

A larger IO request can have a higher access latency since the measurement is from the beginning of the request until it is finished. Simply put a larger IO will take longer to transfer because the IO is moving more data in one operation. The IO is still has to pay the cost of the initial access latency, but usually the dominant factor in the duration of the IO is the transfer. So of key importance when measuring larger IO sizes is higher bandwidth, but still keep an eye on latency.


Enter Diskspd, a tool from Microsoft which allows us to performance test a disk subsystem. Diskspd allows us to define specific IO patterns, IO sizes, file sizes, read or write access, number of threads, access stride, and many more options. One key feature of Diskspd is it reports access latency time in microseconds, which is becoming more important as SSDs are becoming more common in the enterprise. SQLIO has served us well for years, but Diskspd is really the next generation of testing.

In our next post we’ll discuss how to performance test a disk subsystem using Diskspd and look at some of the key values of bandwidth, latency, and likely IOPs too. 

Standard SQL Server Build

Often I’m asked what is the best practice for a single SQL Server installation. Well, that is a tricky questions and the answer is it always depends. Let’s discuss what a “standard” SQL Server build looks like if you had to start somewhere. Here let’s focus on Standard edition on a physical server. Enterprise edition and virtualization are topics that can stand on their own.
Processors – The higher the clock frequency the better
SQL Server is licensed by the core. Generally speaking, higher clock rates will yield higher data throughput. Meaning your query will get done faster. It’s a balance between clock speed cost and number of cores (SQL license costs). To see the impact of clock speed on your performance try adjusting your Windows power plan and time a workload…you will see the impact. Check out Glenn Berry’s post on this here and most recently here. Again since we’re likely licensing SQL by the core, we really want to feed the processor as quickly as possible, so next we discuss the need for a lot of memory and fast disks to do so.
Memory – At the moment memory is cheap, buy as much as you can afford. But not too much if you’re on Standard edition.

The maximum memory allowed on SQL Server Standard 2014 is 128GB per instance, 64GB if you’re on SQL Server Standard 2012. The more memory your system has the larger your memory allocation can be to things like the buffer pool. With a larger buffer pool, it’s more likely your query will be cached in memory and not have to retrieve your data from disk. Memory operates in nanoseconds, SSDs are microseconds and spinning disks are milliseconds. I know where I would want my data waiting for me. 
Storage – Isolate your major functions
This is where things depend highly on workload, specifically capacity and performance. This article is about the general case, so here’s we’re I’d start. Also, these basics apply to direct attached storage rather than SAN based storage. 
The disk controller should have a battery backed write cache. With a battery backed write cache your controller can tell the OS that it’s IO completed while caching the IO in its local memory. If you’re into it, add a hot spare. If you do, try to make all your drives the same size/type and it can be a global hot spare for any disk in the system this is simply a safety blanket in the event of a disk failure.
If you isolate your major functions as described below you’re ahead of the crowd already, the main reasons for this are differing IO patterns and the potential for recovery in the event of a disk failure:
Operating System – 2 disks RAID 1 – 100GB – you should just have your OS and SQL installation here. SSD if you can afford it, otherwise 15K RPM.

Databases – 4+ disks RAID 5 is OK. RAID10 is better – the number and capacity of these disks depend on your data size requirements, if you have the budget for SSD go for it, get MLC.

Transaction Logs – 2+ disks RAID 1, if you need more capacity 4+ disks RAID10 – if you have the budget for SSD go for it and get SLC

TempDB – 2+ disks RAID 1, if you need more capacity 4+ disks RAID10 – if you have the budget for SSD go for it and get SLC 
This 100% needs to be addressed in the first conversation about building a SQL Server. Where are you going to put them? How long are you going to keep them? How large are they going to be? Replication offsite? We could (and should) spend a whole day discussing backups. 
Here we describe a very generalized SQL Server hardware configuration, if you start here you’ll be in a good place. There are a ton of decision points that can drive a SQL Server hardware configuration in any direction. In future articles we’ll discuss how to configure Windows, Install SQL Server and instance level configuration basics for SQL Server.
If you need some assistance with your SQL Server installation, design and backups planning please feel free to contact me.
Follow me on Twitter: @nocentino