Category Archives: Engineering

Using Extended Events to Visualize Availability Group Replication Internals

SQL Server 2014 Service Pack 2 was recently released by Microsoft, and there are a ton of great new features and enhancements in this release. This isn’t just a collection of bug fixes…there’s some serious value in this Service Pack. Check out the full list here. One of the key additions in this Service Pack is a set of enhanced Extended Events for AlwaysOn Availability Group replication.

Why are the new Availability Group Extended Events interesting?

If you’ve used Availability Groups in production systems with high transaction volumes, you know that replication latency can impact your availability. If you want to brush up on that, check out our blog posts on AG Replication Latency, Monitoring for replication latency, and issues with the DMVs when monitoring. These new Extended Events add insight at nearly every point inside your Availability Group’s replication. More importantly, they also include durations. So using these Extended Events we can pinpoint latency inside our Availability Group replication.
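Before diving into the new events, a quick way to gauge replication backlog is the AG DMVs. Here’s a sketch of such a query; the joins and column names are from the standard HADR catalog and DMV set, but verify them on your build:

```sql
-- Approximate replication backlog per database replica (sketch)
SELECT  ag.name                   AS [AvailabilityGroup]
      , ar.replica_server_name    AS [Replica]
      , drs.database_id
      , drs.log_send_queue_size   --KB of log not yet sent to this replica
      , drs.redo_queue_size       --KB of log not yet redone on this replica
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups AS ag ON drs.group_id = ag.group_id;
```

These queue sizes tell you that latency exists; the new Extended Events below tell you where inside the replication pipeline it is accumulating.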

Here’s the list and description of the new Extended Events:

  • hadr_log_block_group_commit – Log block group commit processing duration
  • log_block_pushed_to_logpool – Log block start push to log pool
  • log_flush_start – Asynchronous log write start
  • hadr_log_block_compression – Log block compression processing duration
  • hadr_capture_log_block – Primary has captured a log block
  • hadr_capture_filestream_wait
  • ucs_connection_send_msg – UCS transport connection sent message
  • hadr_log_block_send_complete – After a log block message has been sent. This event is only used for fail points
  • log_flush_complete – Reports that synchronous send is complete
  • hadr_receive_harden_lsn_message – Fires when a new hardened LSN arrives from the secondary
  • hadr_db_commit_mgr_harden – Transaction commit harden result from Hadron Commit management
  • hadr_transport_receive_log_block_message – Fires when we’re receiving a new log block message
  • hadr_log_block_decompression – Log block decompression processing duration
  • hadr_apply_log_block – Secondary is going to append a log block to the log
  • hadr_send_harden_lsn_message – Crafting a message to send containing a new hardened LSN on a secondary.  Test only
  • hadr_lsn_send_complete – After an LSN message has been sent.
The source for this list and its descriptions is Amit Banerjee’s HADR Virtual Chapter Talk and KB3173156.

Using Extended Events to visualize Availability Group Replication Internals

Using these Extended Events, we’re able to get some insight into the replication internals of our Availability Groups. As I described a second ago, we can use these to measure the duration of each event and sift out performance issues in our replication. But we can also use them to get a better understanding of what goes into Availability Group replication. Let’s see how.
Here are the steps I used to generate the chart below:
  1. Let’s add an Event Session with all of these new Events above to the primary and secondary replicas of a two replica synchronous Availability Group
  2. Then with the sessions running, insert only one row on the primary replica. This will cause the AG to replicate the data change event and trigger the new Extended Events on both the primary and secondary replicas
  3. Stop the Event Sessions
  4. Open the Event file in SSMS and View Target Data on both the primary and secondary replicas
  5. Order the Event data by timestamp
  6. Merge the event data together based on the timestamp (I did this in a spreadsheet)
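Step 1 can be sketched roughly like this. The session and file names are placeholders, and the sqlserver package prefix is an assumption, so confirm the exact event and package names on your build first:

```sql
-- Confirm event names and packages on your instance
SELECT p.name AS [package], o.name AS [event]
FROM sys.dm_xe_objects AS o
JOIN sys.dm_xe_packages AS p ON o.package_guid = p.guid
WHERE o.object_type = 'event'
  AND o.name LIKE '%hadr%';

-- A partial session with some of the new events (sketch; add the rest as needed)
CREATE EVENT SESSION [AGReplication] ON SERVER
ADD EVENT sqlserver.hadr_log_block_group_commit,
ADD EVENT sqlserver.log_block_pushed_to_logpool,
ADD EVENT sqlserver.log_flush_start,
ADD EVENT sqlserver.hadr_log_block_compression,
ADD EVENT sqlserver.hadr_capture_log_block,
ADD EVENT sqlserver.hadr_apply_log_block
ADD TARGET package0.event_file (SET filename = N'C:\Temp\AGReplication.xel');
GO
ALTER EVENT SESSION [AGReplication] ON SERVER STATE = START;
```

Create and start the session on both the primary and the secondary so you capture the event sequence from both perspectives.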
With this list, we have the entire replication sequence of events for that single insert, from the perspective of both the primary and the secondary. I took that information and made the following graphic visualizing what it takes to replicate data in a synchronous Availability Group.

AG Synchronous Replication Internals

 Figure 1: Availability Group Replication Extended Events

From Figure 1, let’s walk through the events…

On the primary

  • The first event that fires is hadr_log_block_group_commit starting things off. It’s my understanding that this initializes the needed code and program state for AG replication
  • Then up next is log_block_pushed_to_logpool. This fires when a log block is copied to the log pool. The log pool is a special memory location used to store log blocks that may need to be read again, since it’s in memory it keeps the readers from doing a physical IO on the transaction log
  • Then log_flush_start; this fires when log blocks are flushed to disk – normal write-ahead logging
  • Once the log block has been copied into the log pool and flushed to disk, it’s read from the log pool and compressed then the hadr_log_block_compression event fires
  • After it’s compressed the AG “captures” the log block for replication to the secondary with hadr_capture_log_block 
  • Then builds and sends the message to the secondary and ucs_connection_send_msg fires
  • Once the message is sent the hadr_log_block_send_complete event fires
  • We mark the synchronous send complete with log_flush_complete

On the secondary

  • On the other side of the fence, basically the inverse occurs. We mark the receipt of the message with hadr_transport_receive_log_block_message
  • The log block is decompressed and hadr_log_block_decompression fires
  • Then the log block is appended to the local log buffer and hadr_apply_log_block fires
  • Then the log block is copied to the local log pool and log_block_pushed_to_logpool fires
  • Next, the asynchronous log write happens and log_flush_start marks that start of that
  • And once the asynchronous log write finishes, log_flush_complete fires
  • Once hardened locally, we need to craft a message to send to the primary marking that the LSN is hardened, and hadr_send_harden_lsn_message fires
  • ucs_connection_send_msg then fires when the message is sent to the primary
  • hadr_lsn_send_complete fires, marking the end of sending the message

Back on the primary

  • hadr_receive_harden_lsn_message fires, marking that the secondary has acknowledged receipt of the sent LSN
  • And to wrap it all up hadr_db_commit_mgr_harden marks the end of the log block replication for this sequence

A few important notes…

  • Many of these Extended Events fire at different points and multiple times during replication under different conditions. There’s a field called mode, which is an integer value, that marks various conditions for the events. I have yet to decode what each value of mode is for all of the events but will update this blog as I get some insight.
  • It’s very important to note that this example highlights an insert into an Availability Group with a primary and only one secondary configured in synchronous Availability Mode. The code path and the subsequent Extended Events that fire can change based on Availability Mode (sync/async), replication health, and many other conditions that can impact replication.

I hope that I’ve shown you some interesting concepts that you can use to help troubleshoot your Availability Group replication using Extended Events. This post highlights how you can use Extended Events to get some insight into the internals of Availability Group replication.

If you have any questions about this or your Availability Group configuration or performance please feel free to email me at

Please follow me on Twitter @nocentino to keep up with new blog posts


Great references on the internals of the log pool –

HADR Virtual Chapter –

SQL Server, Persistent Memory on NVDIMMs and DAX

Paradigm Shift!

What do I mean by that? Every once in a while a technology comes along that changes the way things are done and moves the bar…well, last week Microsoft released a Channel 9 video on persistent memory using NVDIMMs and DAX on Windows Server 2016…then combined it with SQL Server! This is one of those technologies that moves the bar! Check it out here.

Why is this important?

Relational databases like SQL Server use a transaction log to ensure the durability of the transactional operations to the database. This is so it can ensure its data is consistent in the event of a system failure. SQL Server uses a logging protocol called write ahead logging (WAL). This means that the data change operations must be written to stable, persistent storage before the process can change data in the database file…so this means our database throughput is at the mercy of the performance of the device the changes (log records) are written to.

Another facet of this is disk latency. Historically this has been terribly inefficient, especially in the case of spinning HDDs, due to seek time and write latency. Disk latency has gotten much better recently with SSDs and NVMe. To combat disk latency, SQL Server doesn’t perform a disk write for each and every log record. SQL Server writes log records to what’s called a log buffer in main memory; once the log buffer is full or a transaction commits (or aborts), it flushes that buffer to disk in sector-aligned IO operations. This means your transactions have to wait for the disk IO to complete before they’re considered hardened to disk. This is a core bottleneck in RDBMS systems.
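You can see this bottleneck on your own system: commits waiting on the log flush accumulate WRITELOG waits. A quick (cumulative, instance-wide) gauge against the wait stats DMV:

```sql
-- Time spent waiting on transaction log flushes since instance start
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WRITELOG';
```

A high and fast-growing WRITELOG wait time is exactly the latency that persistent-memory log devices aim to eliminate.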

What changed?

Well, an NVDIMM is a DIMM, normally used for main memory, that is now backed with a battery-backed flash array. This means we can have access to stable storage at main memory speeds, think less than 10 microseconds, not milliseconds. An order of magnitude performance gain! NVDIMMs have two modes of operation: storage mode and memory mode. Storage mode presents the device as a block device to your OS. Memory mode presents it as a memory-mapped device. Windows Server 2016 has the drivers to deal with both modes. In the first case you’ll get a block device and all the code that goes along with making a block IO request; the advantage of this mode is no application code changes, it’s just a block device like any other. In the second mode, memory mode, the device presents byte-addressable memory to the OS as storage, but you may need to change code to take full advantage of it. So Microsoft has developed a feature called DirectAccess (DAX) in Windows Server 2016 to present memory-mode storage to the OS, bypassing traditional block IO code. This is where the insanity ensues! You’ll literally have stable storage available to your applications at main memory speeds…committing to stable storage is now just a memory copy! Watch the video for performance numbers!

How will this impact you?

Well, in the video Lindsey Allen highlights a code change in SQL Server that modifies how the log buffer writes data. Log writes are immediately considered hardened when committed to the log buffer. We no longer have to wait for the disk IO to complete, sidestepping disk latency and IO altogether. This leads to a 2x improvement in the demo application in the video. These tests compare NVMe and DAX, and NVMe is PCI Express-attached flash, which is as good as it gets for persistent storage today (well, until memory-mode NVDIMMs take over). You’ll notice that CPU spikes during their load tests; this isn’t an uncommon issue when you remove the storage bottlenecks from your system.

What’s it going to cost?

Looking online, HPE has an 8GB NVDIMM available for $800 at online wholesalers; this is in DIMM form factor. Some are built directly into the motherboard, others are on cards.

Questions that I have

  1. Foreign Memory Access – In a NUMA system, will there be a performance penalty or inconsistent performance when threads need to read or write to a foreign memory area? Log writing is single threaded, so this should be something that we can manage. It does go multithreaded in 2016, so the best thing to do here is test your workload and measure.
  2. Hypervisor Support – Can we leverage this awesomeness in a virtual machine?
  3. Error Detection – How do errors percolate back up into the OS/SQL Server so we know when something goes wrong? Likely standard disk IO alerts will apply here.

Well hopefully I’ve convinced you how big of a deal this is and you’re as excited as I am!

Please reach out to me if you’d like to know more about this –

CPU Scheduling Basics – Windows and SQL Server

In this post we’re going to introduce the basics of CPU scheduling.

In a computer system, only one thing can happen at a time. More specifically, only one task can be on a processor at a point in time. This can expand to several tasks if the system has multiple processors or a processor with multiple cores, which most modern systems have. For example, a four core system can potentially execute four tasks concurrently. 

So since only one task can be on a logical processor (or core) at one point in time, something has to manage which task gets access to the CPU. This is what the scheduler does. CPU scheduling is a very (VERY) deep topic, and I’m going to introduce you to a couple of basic concepts that can help you understand how scheduling works in Windows and SQL Server. That way, when you see things slow down in either of these technologies, you know where to start looking first.

If a task isn’t on a CPU, it’s not doing any processing. It’s simply waiting. Each CPU has its own task scheduler and the task scheduler is the thing that goes and assigns work to the logical CPU (or core).

Let’s check out a few examples of schedulers…

First up is Windows; it uses a preemptive, priority-based scheduler. This means processes with higher priority are allowed to access the CPU first, and the scheduler will also stop a running process to schedule a higher-priority process. Each process gets a fixed amount of time on the processor, called a quantum; then Windows moves that process off the CPU and places it into a queue waiting for access to the CPU again, so no one process dominates CPU time.

SQL Server uses a non-preemptive, cooperative scheduler, which also uses the quantum concept, but the task voluntarily yields and is switched off, placed into a queue waiting for access to the CPU again. Again, this ensures that no one task dominates the CPU, but here moving off the CPU is left up to the thread rather than the scheduler. In SQL Server, a task is your work: that query you want to execute. A worker is the execution context of your work. A task is assigned to a worker, which is assigned to a scheduler for execution on the CPU. A scheduler is the manager of what gets on the CPU; there is one per logical processor.
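You can see the task, worker, and scheduler relationship just described by joining the SQLOS DMVs. This is a sketch; column availability can vary slightly by version:

```sql
-- One row per currently assigned task: which worker it runs on and which
-- scheduler that worker belongs to
SELECT  t.session_id
      , t.task_state          --RUNNING, RUNNABLE, SUSPENDED, etc.
      , w.worker_address
      , s.scheduler_id
      , s.cpu_id
FROM sys.dm_os_tasks AS t
JOIN sys.dm_os_workers AS w ON t.worker_address = w.worker_address
JOIN sys.dm_os_schedulers AS s ON t.scheduler_address = s.scheduler_address
WHERE t.session_id IS NOT NULL;
```

The task_state column maps directly onto the three generalized states discussed below.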

Both Windows and SQL Server schedulers have a concept of process state and there are a couple of deeper nuances for each but here I’m going to generalize them into three states:

  • Running – a task executing on the CPU, an actively running process.
  • Waiting – a task is waiting on something before it can get back onto the CPU. For example, a process could be waiting on an I/O operation such as a read from a disk or network socket.
  • Runnable – a task waiting to get onto the CPU, nothing is preventing it from getting onto the CPU (no waiting)

You often hear CPU measured as a percentage, up to 100%. But measuring CPU in terms of percent used is only part of the picture. I like to measure how long processes and tasks are waiting to get access to the CPU, because what we’re really interested in is how fast our work is progressing, and it can only progress when it’s on a CPU doing work.

So here are a few basic ways to measure CPU pressure in terms of what’s waiting for access to a CPU:

  • In Windows there’s a perfmon counter called processor queue length, anything in the queue is waiting to get onto the CPU. Each processor has its own queue. Check out a deeper dive here.
  • In SQL Server, we measure the number of tasks that are on the scheduler’s runnable queue, the lower the better. You can find that in the sys.dm_os_schedulers DMV in a column named runnable_tasks_count. Each CPU will have a scheduler, and each will have a runnable task count. The query below will show you the runnable tasks, current tasks, and tasks waiting for a worker per scheduler. One row will be returned per scheduler.
SELECT    scheduler_id          --ID of the scheduler
        , cpu_id                --ID of the CPU
        , runnable_tasks_count  --workers, with a task, waiting on CPU
        , current_tasks_count   --total tasks associated with this scheduler
        , work_queue_count      --total tasks waiting for a worker
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255        --exclude hidden/internal schedulers
ORDER BY work_queue_count DESC
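Another rough gauge of CPU pressure is signal wait time: the time tasks spend in the runnable state after their resource wait has ended. This query is a sketch against the cumulative instance-wide wait stats:

```sql
-- Signal wait = time spent runnable (waiting for CPU) after the resource wait
-- ended; a high percentage suggests CPU pressure
SELECT  SUM(signal_wait_time_ms) AS [SignalWaitMs]
      , SUM(wait_time_ms)        AS [TotalWaitMs]
      , 100.0 * SUM(signal_wait_time_ms) / NULLIF(SUM(wait_time_ms), 0)
            AS [PercentSignalWaits]
FROM sys.dm_os_wait_stats;
```

Because these counters accumulate from instance start, compare snapshots over an interval rather than reading a single absolute value.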

I hope this helps you get a basic understanding of CPU scheduling in Windows and SQL Server and starts you on a journey of learning more about how it works!

Feel free to contact me to discuss more!

Did you like this post? If so, click here for to sign up for our newsletter to get more SQL goodness delivered right to your inbox!

Twitter @nocentino

Moving SQL Server data between filegroups – Part 1 – Database Structures

Why is moving data between filegroups hard?

As a consultant, it’s common to walk into a customer site and find databases that are contained in one very large file. For various reasons it can be beneficial to adjust the number of data files for a database. See here. However, in SQL Server, moving data from a one-file database into a multi-file configuration is a non-trivial task. It’s a two-step process: first you add a new filegroup, then you add your multi-file configuration to that filegroup. Once you have that set up, you need to rebuild the indexes into that filegroup. This can be challenging if you have a lot of tables with a lot of indexes, as SSMS allows you to move data but only for non-clustered indexes, and only one at a time. Another issue is that there are different techniques for moving different physical structures, such as clustered indexes, heaps, and tables with LOB data.
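For context, here’s a minimal sketch of that two-step setup; the filegroup name, file names, paths, and index definition are all placeholders, and your rebuild options will differ:

```sql
-- Step 1: add a new filegroup
ALTER DATABASE [TestDB] ADD FILEGROUP [SECONDARY];

-- Step 2: add a multi-file configuration to it
ALTER DATABASE [TestDB]
ADD FILE ( NAME = N'TestDB_Data2', FILENAME = N'C:\DATA\TestDB_Data2.ndf' ),
         ( NAME = N'TestDB_Data3', FILENAME = N'C:\DATA\TestDB_Data3.ndf' )
TO FILEGROUP [SECONDARY];

-- Rebuilding a clustered index into the new filegroup moves the table's data
CREATE UNIQUE CLUSTERED INDEX [PK_t1] ON [dbo].[t1] ([c1])
WITH (DROP_EXISTING = ON) ON [SECONDARY];
```

The rebuild at the end is the part Part 2 automates with PowerShell; heaps and LOB data need different techniques, as discussed below.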

In this post we’re going to introduce some of the internal physical storage structures of a SQL Server database and describe how you can see which physical structures have been allocated where and to which objects. In part 2 of our series, we’ll introduce the concepts and techniques for moving data between filegroups with PowerShell.

SQL Server structures – where’s the data stored?

Let’s identify the SQL Server storage concepts that we will need to be familiar with for our filegroup migration process. This is just a brief description; each of these concepts is a very deep topic in its own right.

Physical Structures

  • Pages – an 8KB data structure that is the actual storage unit for data. 
  • Extents – 8 physically contiguous pages, the unit in which space is managed. 
  • Database files – the physical storage location of database data on the file system and disk.
  • Filegroup – a collection of database files.
  • Database – the logical collection of tables that store data. Housed in a filegroup or collection of filegroups.
  • Clustered indexes – b-trees with data stored at the leaf level, the actual table data in key order.
  • Non-clustered indexes – b-trees with data stored at the leaf level, a copy of table data in its own key order. Includes a pointer back to the clustered index key or the RID of a heap.
  • Heaps – table data without a clustered index. A loose collection of pages.
  • LOB – a special page type used to store large objects and binary data.

Test Database Setup

For our exploration of physical storage we’re going to need a database to examine. Let’s create a database, [TestDB]:

CREATE DATABASE [TestDB]
ON PRIMARY
( NAME = N'TestDB', FILENAME = N'C:\DATA\TestDB.mdf' )
LOG ON
( NAME = N'TestDB_log', FILENAME = N'C:\LOG\TestDB_log.ldf' )
GO

Test Table Setup

Time to create some tables; here are three: a table with a clustered index, a heap, and a table with a clustered index and some LOB data. For demonstration purposes I want to fill the data page with the maximum amount of data, and to be cute we’ll construct a row that fills a page entirely. For full coverage check out “Anatomy of a data page” here.

The row will include:
  • Tag bytes – 4 bytes
  • integer – 4 bytes
  • char – 8000 bytes
  • char – 49 bytes
  • NULL bitmap – 3 bytes
This adds up to 8,060 bytes, which is the maximum size of a data row in SQL Server.
  • Table with clustered index
CREATE TABLE [dbo].[t1](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2] CHAR(8000) NOT NULL,
	[c3] CHAR(49) NOT NULL,
	CONSTRAINT [PK_t1] PRIMARY KEY CLUSTERED ([c1])
)
GO
  • Table without a clustered index (heap)
CREATE TABLE [dbo].[t2](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2] CHAR(8000) NOT NULL,
	[c3] CHAR(49) NOT NULL
)
GO
  • Table with clustered index and LOB data
CREATE TABLE [dbo].[t3](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2_lob] [VARCHAR](max) NOT NULL,	--max width, stored in data page
	[c3_lob] [VARCHAR](max) NOT NULL,	--max width, stored in text page
	CONSTRAINT [PK_t3] PRIMARY KEY CLUSTERED ([c1])
)
GO
Insert some data; each statement inserts one page-sized row and runs 16 times:
INSERT INTO [dbo].[t1] ([c2], [c3]) VALUES (REPLICATE('a', 8000), REPLICATE('b', 49))
GO 16

INSERT INTO [dbo].[t2] ([c2], [c3]) VALUES (REPLICATE('a', 8000), REPLICATE('b', 49))
GO 16

--second value exceeds 8,000 characters so it's pushed off-row to LOB pages
INSERT INTO [dbo].[t3] ([c2_lob], [c3_lob]) VALUES (REPLICATE('a', 8000), REPLICATE(CONVERT(VARCHAR(MAX), 'b'), 20000))
GO 16

Let’s check where the data we inserted was stored

Using the query below, we can observe the number of pages and allocation unit types that were used to store our data for the three tables. We will use this data for verification of the data movement to the new filegroup.
SELECT  DB_NAME(database_id) AS [DatabaseName]
      , OBJECT_NAME(al.object_id) AS [Table]
      , fg.name AS [FG-Name]
      , allocation_unit_type_desc
      , COUNT(*) AS [Pages]
      , COUNT(*) * 8 AS [SizeKB]
FROM    sys.dm_db_database_page_allocations(DB_ID(), NULL, NULL, NULL,
                                            'DETAILED') AS al
        JOIN sys.database_files df ON al.extent_file_id = df.file_id
        JOIN sys.filegroups fg ON df.data_space_id = fg.data_space_id
        JOIN sys.indexes i ON al.object_id = i.object_id
                              AND al.index_id = i.index_id
WHERE   OBJECT_NAME(al.object_id) NOT LIKE 's%'
GROUP BY DB_NAME(database_id)
      , OBJECT_NAME(al.object_id)
      , fg.name
      , allocation_unit_type_desc
In the result set you can see that the allocations for these tables are all out of the PRIMARY filegroup, and that some are in-row data and some are LOB data, based on our table definitions.
In next week’s post we will introduce the concepts and techniques needed to move this data into a new filegroup. If you absolutely cannot wait and would like the script, shoot me an email and I’ll send it over to you.

Tracing call stacks in SQL Server – Installing the Debugging Tools for Windows (WinDbg) on Windows 8.1 and generating debug symbols for SQL Server binaries

Where to get the debugger tools

To generate the needed symbols you will need the “Windows Software Development Kit (SDK) for Windows 10”. Download here: and click “Download the standalone SDK”.

  • I like to select the Download option and pick “Debugging Tools for Windows”; this will download all of the installation files

How to install the debugger tools 

The installation of this software is very straightforward. In the downloaded files, find \Windows Kits\10\StandaloneSDK\Installers\X64 Debuggers And Tools-x64_en-us.msi and install the tools with the default settings.


Generating the symbol files for SQL Server

Once installed, you will need to generate the symbols. Below are two commands: one for the SQL Server executable and one for all of the DLLs in the SQL Server working directory. Paul Randal @PaulRandal recommends a tip on his blog: use *.dll to gather symbol data on all the DLLs in the directory, at the expense of time and a small amount of disk space.

  • Change into the BINN directory
    • cd C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Binn
  • “C:\Program Files\Windows Kits\10\Debuggers\x64\symchk.exe” sqlservr.exe /s SRV*c:\symbols*
  • “C:\Program Files\Windows Kits\10\Debuggers\x64\symchk.exe” *.dll  /s SRV*c:\symbols*

Please feel free to contact me with any questions regarding performance or other SQL Server related issues at:

Tracing call stacks in SQL Server – Introduction

At this fall’s SQLIntersection conference in Las Vegas, I attended Paul Randal’s (t|b) “Performance Troubleshooting Using Latches and Waits” precon, where he asked for some assistance compiling data for a project he’s working on. The project would require installing the “Debugging Tools for Windows” and generating debug symbols for the SQL Server binaries. I have always intended to work with the debug symbols to trace call stacks and experiment with what SQL Server does during certain events, like creating a database or inserting a row. These are topics that interest me as a computer scientist and a SQL Server professional, and they can also help our clients understand conditions when trying to get a handle on obscure performance anomalies.

In this multi-part blog series I will document the process of

  • Installing the Debugging Tools for Windows (WinDbg) on Windows 8.1 and generating debug symbols for SQL Server binaries
  • Implementing the debug symbols on a SQL Instance
  • Demonstrate how to build a call stack trace
  • Document the stack traces generated during interesting scenarios such as creating a database, adding a table, inserting a row and more

What is a symbol file?

A symbol file is used by a program, usually a debugger, at runtime to translate the address offsets of a machine binary into human-readable names of programming constructs such as function calls.

What is a call stack?

As each function is called, its address offset (i.e., program counter) is pushed onto the stack in a stack frame. The currently executing function is on the top of the stack. Subsequent function calls are pushed onto the top of the stack, and the calling function is pushed down in the stack. This is the call stack. Each entry will have the address offset of the function call in the binary executable. In the execution of a program, a function will call another function and so on; this call stack can hint at what is happening inside the execution of the program.

Why do we care about call stacks when gathering wait and latch data for SQL Server?

This technique allows us to identify code execution paths, waits, and latches from inside SQL Server’s executing process. From this we can gain greater insight into the conditions under which these events occur.

What does a call stack in SQL Server look like?


Next up in this series is “Installing the Debugging Tools for Windows (WinDbg) on Windows and generating debug symbols for SQL Server binaries”.

Please feel free to contact me with any questions regarding performance or other SQL Server related issues at:


Book Review – Query Tuning & Optimization

Via Twitter in October, Benjamin Nevarez @BenjaminNevarez solicited book reviewers for his new book, Microsoft SQL Server 2014 Query Tuning & Optimization. I jumped at this opportunity, as Benjamin is a great author with significant SQL Server experience, and this new book further solidifies his place in the community.

Query processing and its inner workings are extremely dense technical concepts that Benjamin is able to present to the reader in a very approachable, readable manner. The book elegantly walks you through the major components of the query processor and dives deep where needed.

The code examples in the book are concise and accurately reinforce the key points the author is trying to convey in the text.

The book excels at connecting very complicated query processing topics together for the reader in a simple and approachable manner.

It covers the many new features of SQL Server 2014, including In-Memory OLTP, incremental statistics, the new cardinality estimator, and the new features added to columnstore indexes.

There are a few shortcomings with this book. I would have preferred citations throughout: key concepts are introduced without citation, and I would like to jump into some topics, specifically into the materials the author used when writing. The book does have an exceptional reference section, but there is no connection back to the text.

This book is now a permanent fixture in my library of SQL Server references; I will likely need to replace it when I wear it out.

If you need help with query tuning or other performance issues with your SQL Server please contact me at:

Query Tuning & Optimization

Here are some of the (unedited) notes that I took while reading the book, covers the first six chapters:

Chapter 1

Solid intro. Good examples specifically on the warnings sections page 24 with references to upcoming chapters on how to deal with them. Gets off the ground with the appropriate groundwork/fundamentals for QP architecture and reading query plans and to move onto the subsequent chapters. 

Chapter 2
The chapter begins with the building blocks of query troubleshooting: how to find resource utilization with sys.dm_exec_requests and sys.dm_exec_sessions, and the plan hash and text hash for when you need to hunt down your code or plan. The page 51 CPU queries are great, splitting on plan hash so each batch is in the results. The chapter then moves from the building blocks of hashes and text to telling the reader which tools are available to find troublesome queries on your systems.

Chapter 3
Walks us through the query optimization process with great detail and concrete examples. Toes the line between academic and practical. Dissecting a QP at this level will give you great insight into the inner workings of SQL Server and allow you to create better, more well-performing queries. Having references to academic and professional papers is a great resource, but I would have loved reference numbers so I could go off on a deep dive when needed. This chapter clears the air and busts some myths found on the web, such as that join order at the query level matters. Also gives you the ability to dive deep on your own questions so you can run off and experiment.

Chapter 4
The transition into this chapter connects the previous chapters. There is a solid example of this on pages 130-131. Ben quickly connects the ideas of operations to indexes and their potential efficacy/efficiency. When discussing the tipping point between bookmark lookup and scan, the reader is left wondering why; everything else so far is covered in great detail but this wasn’t. The parallelism limitations section on page 156 lacks any insight as to why those features will not produce a parallel plan.

Chapter 6
Avoid the use of local variables in a WHERE clause; the optimizer falls back to a 30 percent guess. Use parameters or literals instead. The most concise explanation of the histogram yet. Wanted deeper discussion on stats skew.