Load Testing Your Storage Subsystem with Diskspd – Part II

In this post we’re going to discuss how to implement load testing of your storage subsystem with DiskSpd. We’re going to craft tests to measure bandwidth and latency for specific access patterns and I/O sizes. In the last post, “Load Testing Your Storage Subsystem with Diskspd”, we looked closely at access patterns and I/O size and discussed the impact each has on key performance attributes.

Diskspd command options

Let’s start with some common command options, but don’t get caught up on the syntax. Diskspd’s documentation is fantastic, and it’s included with the program download here. What I’m going to do is tell you why I set these settings the way I do, so you can adjust them as needed for your environment.

  • Duration ( -d<seconds> ) – this is the runtime of the test; the longer, the better. A longer test is more likely to smooth out performance anomalies, such as competition for shared resources. A longer test will also overrun any caches along the I/O path that might otherwise serve the data and skew your results. We’re trying to measure the I/O capacity of the whole pipeline…not any caches.
     
  • Threads ( -t<count> or -F<count> ) – if -t is used, this is the number of threads per file; if -F is used, this is the total number of threads for the whole test. For smaller systems I usually set the number of threads to the number of cores. On larger systems I start at 16 and adjust up or down based on results. For systems with very fast I/O paths you may need to add additional I/O threads to max out the throughput.
     
  • Outstanding I/Os ( -o<count> ) – the number of I/Os ready to be dispatched per thread. Your storage subsystem may be fantastic, so when testing for throughput you may need to stack on more I/Os to increase the pressure. I usually start with this equal to the number of spindles in my LUN, then increase outstanding I/Os until I start to see latency increase. Once you see that, congrats, you just saturated your I/O subsystem! Use this in conjunction with threads when trying to saturate an I/O path; see the sketch following this list. If latency is already at unacceptable levels, reduce outstanding I/Os…but you’ll likely see a reduction in throughput too. Try to find the sweet spot between minimum latency and maximum bandwidth; if each matches the physical attributes of your disk subsystem, you’re heading in the right direction. If outstanding I/Os is set to 1, the I/O is synchronous; any other value is asynchronous…more on this later.
     
  • File Size ( -c<size> ) – the size of the test file. I like this to be larger than the largest cache in the I/O pipeline, which includes your HBA, SAN controller…anything along the way between the running process and the disk.
     
  • Block Size ( -b<size>[K|M|G|b] ) – the size of each IO. This is what we’ll change to match varying I/O patterns in SQL Server.
     
  • Disable hardware write and software caching ( -h ) – we want to disable software (file system) caching and request that hardware caching be disabled. Disabling hardware caching is only a request to the storage hardware, and that’s one of the reasons we want to ensure the file size we use is larger than the largest cache in our I/O path. This option is enabled on all of the tests in this post. Further, for durability reasons most major relational database systems, SQL Server included, do not use the file system cache; they rely on their own caching mechanisms.
     
  • IO Pattern – discussed in detail in our previous post here
    • Random I/O ( -r )
    • Sequential I/O ( -s ) – if using multiple threads, use -si; this coordinates the threads’ access into the file, ensuring a sequential access pattern.
       
  • Write Percentage ( -w<percentage> ) – 0 is all reads, 100 is all writes. You can choose any value in between, but I like to isolate read and write tests for analysis.
     
  • Measure latency statistics ( -L ) – the whole reason we’re doing this is to understand our performance; go ahead and turn this on.
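
Putting threads and outstanding I/Os together, here’s a minimal PowerShell sketch of the saturation hunt described above: fix threads at the core count, then step up outstanding I/Os until latency climbs. The file path, duration, and the “total:” summary filter are assumptions for illustration; adjust them for your environment.

    # Sketch: size threads to the core count, then sweep outstanding I/Os
    # until latency starts to climb (the saturation point described above).
    $cores  = (Get-CimInstance Win32_Processor |
               Measure-Object -Property NumberOfCores -Sum).Sum
    $target = "D:\TEST\iotest.dat"   # assumed test file path

    foreach ($oio in 1, 2, 4, 8, 16, 32, 64) {
        Write-Host "=== Outstanding I/Os per thread: $oio ==="
        # 64KB random reads, caches disabled, latency stats on
        & diskspd.exe -d15 "-o$oio" "-t$cores" -b64K -h -r -L -w0 $target |
            Select-String "total:"   # summary rows of DiskSpd's text report
    }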

Impact of I/O Access Patterns

Here are some example Diskspd tests that implement sequential and random access patterns. These tests simulate index seeks/point queries and index scans/range queries.
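
Before running these, the test file needs to exist. DiskSpd will create it when you pass the -c option described above; here’s a quick sketch (the 25GB size and the path are assumptions, just size the file larger than any cache in your I/O path):

    diskspd.exe -d1 -c25G -b64K -h D:\TEST\iotest.dat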

  • Random

    diskspd.exe -d15 -o32 -t4 -b64K -h -r -L -w0 D:\TEST\iotest.dat

    This test will run for 15 seconds, with 32 outstanding IOs per thread, using 4 threads, with 64KB IOs. The hardware and software caches are disabled, and the access pattern is random, read only. In our previous post we defined the characteristics of this access pattern: we should expect lower bandwidth and higher latencies in this test, because the drives have to physically move to service the random I/O requests. This test is similar to an index seek/point query in SQL Server (SSDs will still exhibit slightly higher latencies on random access, as discussed in the last post here).

  • Sequential

    diskspd.exe -d15 -o32 -t4 -b64K -h -si -L -w0 D:\TEST\iotest.dat

    This test is the same as above, but uses a sequential access pattern. With sequential I/O we should see higher bandwidths with lower latencies. This is due to the data being physically contiguous on the drive. This test is similar to an index scan/range query in SQL Server.

Impact of I/O Sizes

For these tests we’ll explore two I/O sizes. We’ll simulate a transaction log buffer flush with a small (60KB), synchronous, single-threaded, sequential write. Then we’ll simulate the read phase of a backup operation with much larger, multithreaded, sequential reads.

  • Transaction log simulation

    diskspd.exe -d15 -o1 -t1 -b60K -h -s -L -w100 D:\TEST\iotest.dat

    In this test we simulate the writing of full transaction log records. The test is configured for synchronous I/O by setting both outstanding I/Os and threads to 1. Each I/O is 60KB, written sequentially to the test file. We’re really trying to measure latency in the I/O subsystem and determine whether there are any potential bottlenecks; see the sketch after this list for pulling the latency table out of the report.

  • Backup operation simulation

    diskspd.exe -d15 -o32 -t4 -b512K -h -si -L -w0 D:\TEST\iotest.dat 

    In this test we simulate the read phase of a backup operation, which scans the data file. The test is configured for asynchronous, parallel I/Os by setting the outstanding I/O parameter to 32 and threads to 4. Each I/O is 512KB, read sequentially from the file. We’re really trying to tax the I/O subsystem and reach a saturation point so we can determine how much data our disk subsystem can move for reads.
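
Both of these tests lean on -L, so it helps to isolate the latency percentile table from DiskSpd’s text report. A PowerShell sketch, assuming the percentile table is headed by a line containing “%-ile” (verify against your version’s output):

    # Rerun the transaction log test and show only the latency percentiles
    diskspd.exe -d15 -o1 -t1 -b60K -h -s -L -w100 D:\TEST\iotest.dat |
        Select-String -Pattern "%-ile" -Context 0,20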

In this post we showed you how to use Diskspd to craft tests that measure bandwidth and latency, two key attributes of your disk subsystem. In our next post in this series we’ll run some tests that simulate SQL Server I/O access patterns and review the output.

Encrypting Connections To SQL Server Using Certificates

In this post we’re going to cover configuring a connection string in .NET applications for encrypting connections to SQL Server using certificates. The audience for this post is a developer who needs to configure encrypted connections from applications to a database server.

Encrypting connections with SQL Server using Certificates consists of two parts:

  • An appropriately configured connection string
  • A server certificate installed on the Database Engine (not covered in this post)

Configuring a Connection String

To configure a .NET connection string, you will need to set the following parameters:

  • Server – the fully qualified domain name (FQDN) of the SQL Server. The name here will need to exactly match the server common name or a subject alternative name configured in the certificate.
  • Database – the database context used for this connection
  • uid – the username of this connection
  • password – the password for the user
  • Encrypt – set to ‘yes’ to request encryption; set to ‘no’ or omit the parameter to disable encryption
Optional
  • TrustServerCertificate – bypasses validating the certificate. This is useful for testing and troubleshooting when the certificate is untrusted, expired, or when its common name or subject alternative names do not match the server name being used to connect. Enabling this in production environments is strongly discouraged, as the destination certificate is not validated.
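
To see these parameters in action before wiring them into application config, here’s a minimal PowerShell sketch that opens an encrypted connection with System.Data.SqlClient; the server, database, and credentials are the example values used later in this post, so substitute your own:

    # Build a connection string from the parameters above and open it
    Add-Type -AssemblyName System.Data
    $connString = "server=sql14-a.lab.centinosystems.com;database=TestDB1;" +
                  "uid=user1;password=s3cur31y;Encrypt=yes"
    $conn = New-Object System.Data.SqlClient.SqlConnection $connString
    try {
        $conn.Open()
        Write-Host "Connection opened; the session is encrypted."
    }
    finally {
        $conn.Dispose()
    }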

Configuring a Connection String On a Single Instance

When connecting to a single instance, a valid certificate must be installed on the instance, and the server parameter in the connection string must exactly match the subject/common name in the certificate. In figure 1 the subject is sql14-a.lab.centinosystems.com, so the connection string’s server parameter should match this name.

Figure 1 – Certificate with Subject/Common Name

Configuring a Connection String On Availability Groups

A valid certificate must be installed on all servers/replicas in the Availability Group. The subject/common name in each certificate should match the local server, and the subject alternative names in the certificate should include the Availability Group listener’s DNS name. The server parameter of the connection string will use the listener name. In figure 2 the AG listener is ag1lst.lab.centinosystems.com, so the connection string’s server parameter should match this name.

Figure 2 – Certificate with Subject Alternative Names

Configuring a Connection String With Aliases

Often application connection strings are configured using an alias (DNS CNAME) rather than the actual host name (A record) of the SQL Server. Upon the connection’s DNS request, the DNS server responds to the query for the alias with the A record data for the actual server. This means the certificate’s common name or subject alternative names do not have to include the alias (DNS CNAME). This applies to both single instances and Availability Groups.
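
If you’re not sure what an alias actually resolves to, check it before chasing certificate name mismatches. A quick sketch using the Resolve-DnsName cmdlet (the alias name here is hypothetical):

    # Shows the CNAME record and the A record it points to
    Resolve-DnsName sqlalias.lab.centinosystems.com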

Example Connection Strings

    1. Example connection to a single instance when the server has a valid certificate

      <connectionStrings>
        <add name="myConnectionString"
             connectionString="server=sql14-a.lab.centinosystems.com;database=TestDB1;uid=user1;password=s3cur31y;Encrypt=yes" />
      </connectionStrings>

    2. Example connection to an availability group listener when each server has a valid certificate with the appropriate subject alternative name

      <connectionStrings>
        <add name="myConnectionString"
             connectionString="server=ag1lst.lab.centinosystems.com;database=TestDB1;uid=user1;password=s3cur31y;Encrypt=yes" />
      </connectionStrings>

Errors

Here are a few errors that you may encounter when connecting to SQL Server with an encrypted connection. When validation fails, an exception is thrown with information similar to these examples.

When opening a connection that requests encryption, but the server’s certificate is untrusted:

System.Data.SqlClient.SqlException (0x80131904): A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - The certificate chain was issued by an authority that is not trusted.) ---> System.ComponentModel.Win32Exception (0x80004005): The certificate chain was issued by an authority that is not trusted

When opening a connection that requests encryption, but the server’s certificate has expired:

System.Data.SqlClient.SqlException (0x80131904): A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - The received certificate has expired.) ---> System.ComponentModel.Win32Exception (0x80004005): The received certificate has expired
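
If you need to handle these failures programmatically rather than read them off a stack trace, they surface as a SqlException when the connection opens. A minimal sketch, reusing the hypothetical $connString from the earlier example:

    try {
        $conn = New-Object System.Data.SqlClient.SqlConnection $connString
        $conn.Open()
    }
    catch [System.Data.SqlClient.SqlException] {
        # The message carries the SSL provider detail shown in the examples above
        Write-Host $_.Exception.Message
    }
    finally {
        if ($conn) { $conn.Dispose() }
    }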

References:

Connection String Syntax – https://msdn.microsoft.com/en-us/library/ms254500(v=vs.110).aspx

Enable Encrypted Connections to the Database Engine – https://msdn.microsoft.com/en-us/library/ms191192(v=sql.110).aspx

Load Testing Your Storage Subsystem with Diskspd

One of the primary activities I do before bringing a SQL Server into production is load testing the storage subsystem. On a new system this is critical because I want to ensure that we’re “getting what we’ve paid for” when it comes to the disk subsystem. All too often there’s a configuration issue, a component mismatch, a fundamental misunderstanding of the technology, or worse, an insufficient disk subsystem…all of which can lead to poor disk performance. Even if it’s the simplest test, it’s imperative to measure performance, as it’s significantly harder to make changes to a SQL Server once a database is in production. So do your testing. This is an especially important topic if your disks are not direct attached or are in a shared storage environment such as a SAN or a VMware data store. Storage networks, controllers, shelves…it gets complicated fast!

In this article we’re going to discuss what we’re looking for when load testing your storage subsystem and introduce DiskSpd, a tool for performance testing disks using varying IO patterns and sizes.

Key measurements

The two metrics of key interest to us in a disk subsystem are bandwidth and access latency. Bandwidth, often referred to as data transfer rate, is how much data can be moved in a time interval, think gigabytes per second. Access latency, or access time, is how long a disk transaction takes from request to the delivery of the requested data. Latency is measured in milliseconds for HDDs and microseconds for solid state drives. I don’t get hung up on IOPs as much, as they’re really a function of latency: keep your latency low and your IOPs will likely be high.
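
To make that latency/IOPs relationship concrete, here’s a back-of-the-envelope sketch; the 5ms latency and 64KB block size are illustrative assumptions, not measurements:

    # One synchronous outstanding I/O: IOPs follow from latency,
    # and bandwidth follows from IOPs x block size
    $latencyMs = 5
    $blockKB   = 64
    $iops      = 1000 / $latencyMs        # 200 I/Os per second
    $mbPerSec  = $iops * $blockKB / 1024  # 12.5 MB/s
    "{0} IOPs -> {1} MB/s at {2}KB" -f $iops, $mbPerSec, $blockKB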

Access Patterns

The access pattern in which data is read from a drive can have significant implications for the bandwidth and latency of the request. There are two access patterns we’re concerned with when load testing a system, sequential and random, each with its own performance characteristics.

Sequential access is when an application requests a block of data and the next block requested is physically adjacent on the disk. On a hard disk drive (HDD) the drive’s head does not have to move; the disk platter simply rotates under the head to read the next block off the disk. On a solid state disk (SSD) this is less of a concern, as an SSD’s access latency is constant across the drive. In SQL Server, sequential IO is analogous to table/index scans, transaction log writes and database backups. When load testing a system we use sequential IO patterns to saturate the disk subsystem’s IO path and determine if there are any physical bottlenecks limiting bandwidth between the application driving the IO and the device serving up the data…the disk drive, the storage network (interconnect) or the SAN.

Figure 1: Hard Disk Drive – Image from Wikipedia

Random access is when an application requests a block of data and the next block requested may not be physically adjacent on the disk. On an HDD the drive’s head may have to move and the disk’s platter rotate to read the next block off the disk; this all contributes to access latency, which drive vendors refer to as seek time in drive specifications. A solid state disk is composed of an array of flash memory chips, each of which has a fixed access latency. So on an SSD, if an application requests a block, the request for the next block can be serviced in a fixed amount of time regardless of its location on the disk (there is a small amount of latency when switching between flash packages). This means random IO patterns can be serviced more efficiently, as we do not have to wait for a physically moving component to access the next block…and that’s the game changer for SSDs when compared with HDDs.

Further, the access latency of an IO on an SSD is measured in microseconds, orders of magnitude faster than an HDD, whose latency is measured in milliseconds. In SQL Server, random IO patterns can occur on index seeks, data file writes and operations that read from the transaction log. When load testing a system we use random IO patterns to find the overall access latency of the disk subsystem and determine whether any components in the system are not servicing requests “fast enough”, in other words, contributing to access latency.

Variable IO Sizes

An application can request data in variable IO sizes; SQL Server, for example, performs IOs of 8KB, 64KB, 128KB, 256KB and more. The size of the IO impacts both latency and bandwidth. A small IO can have a lower access latency, since latency is measured from the request of the IO until the delivery of all of the data requested. A small IO can potentially consume less bandwidth as well, as each IO translates into physical disk accesses, each of which has some access latency. So of key importance when measuring smaller IO sizes is a low access latency: if the disk subsystem can service each IO quickly enough, we can still see higher bandwidths on smaller operations. But we’re really shooting for getting the IO completed as quickly as possible.

A larger IO request can have a higher access latency, since the measurement runs from the beginning of the request until it is finished. Simply put, a larger IO takes longer to transfer because it is moving more data in one operation. The IO still has to pay the cost of the initial access latency, but usually the dominant factor in its duration is the transfer. So of key importance when measuring larger IO sizes is higher bandwidth, but still keep an eye on latency.
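
A quick worked example makes that shift visible; the 4ms access latency and 200MB/s transfer rate are illustrative assumptions:

    # At small IO sizes the fixed access latency dominates the duration;
    # at large sizes the transfer time does
    $accessMs = 4      # assumed fixed access latency
    $mbPerSec = 200    # assumed transfer rate
    foreach ($ioKB in 8, 64, 512, 4096) {
        $xferMs = ($ioKB / 1024) / $mbPerSec * 1000
        "{0,5}KB IO: {1:N2} ms total, {2:P0} spent transferring" -f `
            $ioKB, ($accessMs + $xferMs), ($xferMs / ($accessMs + $xferMs))
    }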

Diskspd

Enter Diskspd, a tool from Microsoft that allows us to performance test a disk subsystem. Diskspd allows us to define specific IO patterns, IO sizes, file sizes, read or write access, number of threads, access stride, and many more options. One key feature of Diskspd is that it reports access latency in microseconds, which is becoming more important as SSDs become more common in the enterprise. SQLIO has served us well for years, but Diskspd is really the next generation of testing.
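
If you want to try it ahead of the next post, a minimal first run might look like the sketch below; the path, file size, and parameters are assumptions meant to produce a baseline, not a tuned test:

    diskspd.exe -d30 -c25G -b8K -h -r -L -w0 D:\TEST\iotest.dat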

In our next post we’ll discuss how to performance test a disk subsystem using Diskspd and look at some of the key values: bandwidth, latency, and likely IOPs too.