Category Archives: Hardware

Load Testing Your Storage Subsystem with Diskspd – Part III

In our final post in our “Load Testing Your Storage Subsystem with Diskspd” series, we’re going to look at output from Diskspd and run some tests and interpret results. In our first post we showed how performance can vary based on access pattern and IO size. In our second post we showed how to design a test to highlight those performance characteristics and in this post we’ll execute those tests and review the results. 

First let’s walk through the output from Diskspd, for now don’t focus on the actual results. There are four major sections:

  • Test Parameters – here is the test’s parameters. Including the exact command line parameters executed. This is great for reproducing tests.
  • CPU Usage – CPU usage for the test, recall if you are not using all your bandwidth, you may want to add threads. If your CPU burn is high, you may want to back off on the number of threads.
  • Performance – this is the meat of the test. Here we see bandwidth measured in MB/sec and latency measured in microseconds. With SSDs and today’s super fast storage I/O subsystems, you’ll likely need this level of accuracy. This is alone beats SQLIO in my opinion. I’m not much a fan of IOPs since those numbers require that you know the size of the IO for it to have any meaning. Check out Jeremiah Peschka’s article on this here. Remember, focus on minimizing latency and maximizing I/O please refer back Part I and Part II posts in this series for details.
  • Histogram – this gives a great representation of how your test did over the whole run. In this example, 99% of the time our latency was less than 0.654ms…that’s pretty super.

Impact of I/O Access Patterns

  • Random

    diskspd.exe -d15 -o32 -t2 -b64K -h -r -L -w0 C:\TEST\iotest.dat

    In this test you can see the that there is high throughput and very low latency. This disk is a PCIe attached SSD, so it performs well with a random IO access pattern.

  • Sequential

    diskspd.exe -d15 -o32 -t2 -b64K -h -s -L -w0 C:\TEST\iotest.dat

    In this test you can see that the sequential I/O pattern yields a similar performance profile to the random IO test on the SSD. Recall that an SSD does not have to move a disk head or rotate a platter. The access latency to any location on the drive has the same latency cost.  

Impact of I/O sizes

  • Tranaction log simulation  

    diskspd.exe -d15 -o1 -t1 -b60K -h -s -L -w100 C:\TEST\iotest.dat

    This test measures access latency of single thread with a very small data transfer, as you can see latency is very low at 0.289. This is expected on a low latency device such as a local attached SSD.
     

  • Backup operation simulation

    diskspd.exe -d15 -o32 -t4 -b512K -h -s -L -w0 C:\TEST\iotest.dat

    And finally, our test simulating reading data for a backup. The larger I/Os have a higher latency but also yield a higher transfer rate at 2,225MB/sec.

In this series of post we introduced you into some theory on how drives access data, we presented tests on how to explore the performance profile of your disk subsystem and reviewed Diskspd output for those tests. This should give you the tools and ideas you need to load test your disk subsystem and ensure your SQL Servers will perform well when you put them into production!

Load Testing Your Storage Subsystem with Diskspd – Part II

In this post we’re going discuss how to implement load testing of your storage subsystem with DiskSpd. We’re going to craft tests to measure bandwidth and latency for specific access patterns and IO sizes. In the last post “Load Testing Your Storage Subsystem with Diskspd”  we looked closely at access patterns and I/O size and discussed the impact each has on key performance attributes. 

Diskspd command options

Let’s start with some common command options, don’t get caught up on the syntax. Diskspd’s documentation is fantastic. It’s included with the program download here. Here I’m going to tell you why I set these settings this way, so you can adjust them as needed for your environments.

  • Duration ( -d<seconds> ) – this is the runtime of the test, the longer the better. The longer your test the more likely you’ll smooth out any performance anomalies, such as competing for shared resources. A longer test will likely invalidate any caches along the I/O path that may cache data and skew your results. We’re trying to measure the I/O capacity of the whole pipeline…not any caches.
     
  • Threads ( -T<count> or -f<count> ) – if -T is used this is the number of threads per file. If -f is used this is the number of threads for the whole test. For smaller systems I usually set number of threads to the number of cores. On larger systems I start at 16 and change it up or down based on results. For systems with very fast I/O paths you may need to add additional I/O threads to max out the throughput.
     
  • Outstanding I/Os ( -o<count>) – the number of I/Os ready to be dispatched per thread. Your storage subsystem may be fantastic, so when testing for throughput you may need to stack on more I/Os to increase the pressure. I usually start with this equal to the number of spindles in my LUN. Then I increase outstanding I/Os until I start to see latency increase. Once you see that, congrats you just saturated your I/O subsystem! Use this in conjunction with threads when trying to saturate an I/O path. If latency is already at unacceptable levels, reduce outstanding I/Os…but you’ll likely start to see a reduction in throughput. Try to find the sweet spot between minimum latency and maximum bandwidth. If each match the physical attributes of your disk subsystem you’re heading in the right direction. If outstanding I/Os is set to 1, the I/O is synchronous, all other values are asynchronous.…more on this later. 
     
  • File Size ( -c<size> ) – the file size, I like to have this be larger than the largest cache in the I/O pipeline. This includes your HBA, SAN controller…anything along the way between the running process and the disk. 
     
  • Block Size ( -b<size>[K|M|G|b] ) – the size of the IO, this is what we’ll change this to match varying I/O patterns in SQL Server.
     
  • Disable hardware write and software caching ( -h ) – we want to disable software (file system) caching and request disabling hardware caching. Disabling hardware caching is only a request of the storage hardware and that’s one of the reasons why want to ensure the file size we use is larger than the largest cache in our I/O path. This is enabled on all of the tests in this post. Further, for durability reasons most major relational database systems, SQL Server included, do not use the file system cache. They rely on their own caching mechanisms. 
     
  • IO Pattern – discussed in detail in our previous post here
    • Random I/O ( -r 
    • Sequential I/O ( -s ) – if using multiple threads, use -si this will coordinate the threads’ access into the file ensuring a sequential access pattern.
       
  • Write Percentage-w<percentage> ) – 0 is all reads, 100 is all writes.You can choose any value between, but I like to isolate read and write tests for analysis. 
     
  • Measure latency statistics ( -L ) – the whole reason we’re doing this is to understand our performance, go ahead and turn this on.

Impact of I/O Access Patterns

Here are some example Diskspd tests that implement sequential and random access patterns. These tests simulate index seeks/point queries and index scans/range queries.

  • Random

    diskspd.exe -d15 -o32 -t4 -b64K -h -r -L -w0 D:\TEST\iotest.dat

    This test will run for 15 seconds, with 32 outstanding IOs, using 4 threads, with 64k IOs. The hardware and software caches are disabled, access pattern is random and is read only. In our previous post we defined the characteristics of this access pattern, we should expect lower bandwidths and higher latencies in this test. This is due to the drives having to physically move to service the random I/O requests. This test is similar to an index seek/point query in SQL Server  (SSDs will still exhibit slightly higher latencies on random access as discussed in the last post here).

  • Sequential

    diskspd.exe -d15 -o32 -t4 -b64K -h -si -L -w0 D:\TEST\iotest.dat

    This test is the same as above, but uses a sequential access pattern. With sequential I/O we should see higher bandwidths with lower latencies. This is due to the data being physically contiguous on the drive. This test is similar to an index scan/range query in SQL Server.

Impact of I/O sizes

For these tests we’ll explore two I/O sizes. We’ll simulate a log buffer flush using a small 60KB synchronous, a small, single threaded, sequential write. Then we will simulate a backup operation with a much larger, multithreaded, sequential write.

  • Tranaction log simulation  

    diskspd.exe -d15 -o1 -t1 -b60K -h -s -L -w100 D:\TEST\iotest.dat

    In this test we simulate the writing of full transaction log records. The test is configured for synchronous I/Os by setting the outstanding I/O and threads to 1. Each I/O is 60KB and writes sequentially to the data file. We’re really trying to measure latency in the I/O subsystem and determine if there are any potential bottlenecks. 

  • Backup operation simulation

    diskspd.exe -d15 -o32 -t4 -b512K -h -si -L -w0 D:\TEST\iotest.dat 

    In this test we simulate the writing of a backup file. The test is configured for asynchronous, parallel I/Os by setting the outstanding I/O parameter to 32 and threads to 4. Each I/O is 512KB and reads sequentially from the file. We’re really trying to tax the I/O subsystem and reach a saturation point so we can really determine how much data our disk subsystem can move for reads. 

In this post we showed you how use Diskspd to craft tests to measure bandwidth and latency, two key attributes of your disk subsystem. In our next post in this series run some tests that simulate SQL Server I/O access patterns and review output. 

Load Testing Your Storage Subsystem with Diskspd

One of the primary activities I do before bringing SQL Server into production is load testing the storage subsystem. On a new system this is critical because I want to ensure that we’re “getting what we’ve paid for” when it comes to the disk subsystem. All too often there’s a configuration issue, component mismatch, a fundamental misunderstanding of the technology or worse an insufficient disk subsystem…these all can lead to poor disk performance. Even if it’s the simplest test, its imperative to measure performance as it’s significantly harder to make changes to a SQL Server once a database is in production. So do your testing. This is especially an important topic if your disks are not direct attached or in a shared storage environment such as a SAN or VMware data store. Storage networks, controllers, shelves…it gets complicated fast!

In this article we’re going to discuss what we’re looking for when load testing your storage subsystem and introduce DiskSpd a tool for performance testing disks using varying IO patterns and sizes.

Key measurements

The two metrics of key interest to us in a disk subsystem are bandwidth and access latency. Bandwidth, often referred to as data transfer rate, is how much data can be moved in a time interval, think gigabytes per second. Access latency or access time is how long a disk transaction takes from request to the delivery of the requested data. Latency is measured in milliseconds for HDD and microseconds for solid state drives. I don’t get hung up on IOPs as much, as they’re really a function of latency. Keep your latency low and your IOPs will likely be high.

Access Patterns

The access pattern in which data is read from a drive can have significant implications on the bandwidth and latency of the request. There are two access patterns that we’re concerned with when load testing a system, sequential and random. Each with its own performance characteristics.

Sequential access is when an application requests a block of data and the next block requested is physically adjacent on the disk. On a hard disk drive (HDD) the drive’s head does not have to move and the disk platter simply rotates under the head to read the next block off the disk. On a solid state disk (SSD) this becomes less of a concern as SSDs access latency is constant across the drive. In SQL Server, sequential IO is analogous to table/index scans, translation log writes and database backups. When load testing a system we use sequential IO patterns to saturate the disk subsystems IO path and determine if there any physical bottlenecks limiting bandwidth between the application driving the IO and the device serving up the data…the disk drive, the storage network (interconnect) or the SAN.  

Hard Disk Drive

Figure 1: Hard Disk Drive – Image from Wikipedia

Random access is when an application requests a block of data and the next block requested may not physically adjacent on the disk. On an HDD the drive’s head may have to move and the disk’s platter rotate to read the next block off the disk. This all contributes to our access latency. Drive vendors refer to this as seek time in drive specifications. A solid state disk is composed of an array of flash memory chips, each of which have fixed time access latency. So on an SSD if an application requests a block, the request for the next block can be serviced in a fixed amount of time, regardless of its location on the disk (there is a small amount of latency when switching between flash packages). This means random IO patterns can be serviced more efficiently as we do not have to wait for a physically moving component to access the next block…and that’s the game changer for SSDs when compared with HDDs.

Further the access latency of an IO on an SSD is measured in microseconds, this is an order of magnitude faster than an HDD which is in milliseconds. In SQL Server, random IO patterns can occur on index seeks, data file writes and operations that read from the transaction log. When load testing a system we use random IO patterns to find the overall access latency to the disk subsystem and determine if there are any components in the system that are not servicing the requests “fast enough” or in other words contributing to access latency.

Variable IO Sizes

An application can request data in variable IO sizes. For example, SQL Server can perform IOs in 8KB, 64KB, 128KB, 256KB and more. The size of the IO impacts both latency and bandwidth. A small IO can have a lower access latency as it is measured from the request of the IO until the delivery of all of the data requested. A small IO potentially can consume less bandwidth as well, as each IO translates into physical disk accesses, each of which will have some access latency. So of key importance when measuring smaller IO sizes is a low access latency, if the disk subsystem can service the IO quickly enough then we can see higher bandwidths on smaller operations. But we’re really shooting for getting the IO completed as quickly as possible.

A larger IO request can have a higher access latency since the measurement is from the beginning of the request until it is finished. Simply put a larger IO will take longer to transfer because the IO is moving more data in one operation. The IO is still has to pay the cost of the initial access latency, but usually the dominant factor in the duration of the IO is the transfer. So of key importance when measuring larger IO sizes is higher bandwidth, but still keep an eye on latency.

Diskspd

Enter Diskspd, a tool from Microsoft which allows us to performance test a disk subsystem. Diskspd allows us to define specific IO patterns, IO sizes, file sizes, read or write access, number of threads, access stride, and many more options. One key feature of Diskspd is it reports access latency time in microseconds, which is becoming more important as SSDs are becoming more common in the enterprise. SQLIO has served us well for years, but Diskspd is really the next generation of testing.

In our next post we’ll discuss how to performance test a disk subsystem using Diskspd and look at some of the key values of bandwidth, latency, and likely IOPs too. 

Standard SQL Server Build

Often I’m asked what is the best practice for a single SQL Server installation. Well, that is a tricky questions and the answer is it always depends. Let’s discuss what a “standard” SQL Server build looks like if you had to start somewhere. Here let’s focus on Standard edition on a physical server. Enterprise edition and virtualization are topics that can stand on their own.
 
Processors – The higher the clock frequency the better
 
SQL Server is licensed by the core. Generally speaking, higher clock rates will yield higher data throughput. Meaning your query will get done faster. It’s a balance between clock speed cost and number of cores (SQL license costs). To see the impact of clock speed on your performance try adjusting your Windows power plan and time a workload…you will see the impact. Check out Glenn Berry’s post on this here and most recently here. Again since we’re likely licensing SQL by the core, we really want to feed the processor as quickly as possible, so next we discuss the need for a lot of memory and fast disks to do so.
 
Memory – At the moment memory is cheap, buy as much as you can afford. But not too much if you’re on Standard edition.

The maximum memory allowed on SQL Server Standard 2014 is 128GB per instance, 64GB if you’re on SQL Server Standard 2012. The more memory your system has the larger your memory allocation can be to things like the buffer pool. With a larger buffer pool, it’s more likely your query will be cached in memory and not have to retrieve your data from disk. Memory operates in nanoseconds, SSDs are microseconds and spinning disks are milliseconds. I know where I would want my data waiting for me. 
 
Storage – Isolate your major functions
 
This is where things depend highly on workload, specifically capacity and performance. This article is about the general case, so here’s we’re I’d start. Also, these basics apply to direct attached storage rather than SAN based storage. 
 
The disk controller should have a battery backed write cache. With a battery backed write cache your controller can tell the OS that it’s IO completed while caching the IO in its local memory. If you’re into it, add a hot spare. If you do, try to make all your drives the same size/type and it can be a global hot spare for any disk in the system this is simply a safety blanket in the event of a disk failure.
 
If you isolate your major functions as described below you’re ahead of the crowd already, the main reasons for this are differing IO patterns and the potential for recovery in the event of a disk failure:
Operating System – 2 disks RAID 1 – 100GB – you should just have your OS and SQL installation here. SSD if you can afford it, otherwise 15K RPM.

Databases – 4+ disks RAID 5 is OK. RAID10 is better – the number and capacity of these disks depend on your data size requirements, if you have the budget for SSD go for it, get MLC.

Transaction Logs – 2+ disks RAID 1, if you need more capacity 4+ disks RAID10 – if you have the budget for SSD go for it and get SLC

TempDB – 2+ disks RAID 1, if you need more capacity 4+ disks RAID10 – if you have the budget for SSD go for it and get SLC 
Backups
This 100% needs to be addressed in the first conversation about building a SQL Server. Where are you going to put them? How long are you going to keep them? How large are they going to be? Replication offsite? We could (and should) spend a whole day discussing backups. 
 
Here we describe a very generalized SQL Server hardware configuration, if you start here you’ll be in a good place. There are a ton of decision points that can drive a SQL Server hardware configuration in any direction. In future articles we’ll discuss how to configure Windows, Install SQL Server and instance level configuration basics for SQL Server.
 
If you need some assistance with your SQL Server installation, design and backups planning please feel free to contact me.
 
Follow me on Twitter: @nocentino