Monthly Archives: April 2015

Moving SQL Server data between filegroups – Part 2 – The implementation

In this post we are going to show the implementation of a PowerShell script that uses SMO to move data between filegroups in SQL Server. This article is the second in our two-part series on moving SQL Server data between filegroups; you can find the first article, “Moving SQL Server data between filegroups – Part 1 – Database Structures”, here.

The Challenge

Looking around on the web, I couldn’t find a solution to the problem of moving data between filegroups that I liked. Further, many of those solutions are T-SQL based and I found them overly complex. So I went off to write it myself. The problem lends itself to an iterative solution, and I felt that T-SQL was not the right tool for the job. Enter PowerShell, which gives us the ability to easily iterate over sets of data with minimal code. Couple that with the SQL Server Management Object model and we have the makings of an elegant solution.

SQL Server Management Objects (SMO)

The SQL Server Management Objects (SMO) are a collection of objects developed for programming solutions for the management of SQL Server. SMO allows us to represent the state of the SQL Server configuration quickly and accurately by instantiating an object that represents the particular attribute of SQL Server that we are interested in. For example, at the core of the script below, we easily make a collection of indexes and iterate over that collection.
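To make that concrete, here is a minimal sketch of instantiating an SMO Server object and browsing a database’s filegroups. The server name 'localhost' and database name 'TestDB' are placeholders; adjust them for your environment.

#load the SMO assembly and connect to an instance (names are placeholders)
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.Smo') | Out-Null

$server = New-Object Microsoft.SqlServer.Management.Smo.Server('localhost')
$db = $server.Databases['TestDB']

#each SMO object reflects the current state of that part of the instance
$db.FileGroups | Select-Object Name, Size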

The code snippet below shows how to build an object array of indexes to move to the new filegroup. We get a list of all the tables, then iterate over the indexes on each table; if an index is clustered or non-clustered it’s added to an object array. If the table is a heap, it’s added to a second object array.

$tables = $db.Tables | Where-Object {$_.Name -like $tablesToMove -and $_.Schema -like $schemaToMove -and $_.IsPartitioned -eq $FALSE}

$indexesToMove = @()
$heapsToMove = @()

#build a list of indexes and heaps to be moved
foreach( $table in $tables )
{
    #get a list of all indexes on this table
    $indexes = $table.Indexes

    #iterate over the set of indexes
    foreach( $index in $indexes )
    {
        #if this index is a clustered or non-clustered index;
        #ignore special index types
        if ( $index.IndexType -eq "ClusteredIndex" -or
             $index.IndexType -eq "NonClusteredIndex" )
        {
            #if this index is not already in the destination FG,
            #add the index to the array
            if ( $index.FileGroup -ne $fileGroup )
            {
                Write-Output( $table.Schema + '.' + $table.Name + " " + $index.Name )
                $indexesToMove += $index
            }
        }
    }

    #if the table has no clustered index, it's a heap
    if ( $table.HasClusteredIndex -eq $FALSE )
    {
        Write-Output( $table.Schema + '.' + $table.Name + " as a heap" )
        $heapsToMove += $table
    }
}

Implementing this in a T-SQL based solution would involve complex looping constructs and deep knowledge of how SQL Server stores the metadata about tables and indexes. Using SMO, we’re able to leverage the object model, which hides the implementation details of the SQL Server internals, and build a simple code block to accomplish our task. A further benefit is that SMO functionality can evolve as SQL Server changes. The two arrays built above drive another block of code that actually moves each index or heap to the destination filegroup.
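The move block itself can be sketched roughly like this. This is a sketch only, not the full script: it assumes the $indexesToMove, $heapsToMove and $fileGroup variables built above, and the heap technique shown (creating and dropping a temporary clustered index on the destination filegroup) is one common approach, not necessarily the exact one the script uses.

foreach( $index in $indexesToMove )
{
    Write-Output( "Moving: " + $index.Name )
    $index.FileGroup = $fileGroup   #set the destination filegroup
    $index.Recreate()               #rebuild the index in the new filegroup
}

foreach( $heap in $heapsToMove )
{
    Write-Output( "Moving Heap: " + $heap.Name )
    #create a temporary clustered index on the destination filegroup,
    #then drop it, which relocates the heap's pages
    $idx = New-Object Microsoft.SqlServer.Management.Smo.Index( $heap, ('tmpMove_' + $heap.Name) )
    $col = New-Object Microsoft.SqlServer.Management.Smo.IndexedColumn( $idx, $heap.Columns[0].Name )
    $idx.IndexedColumns.Add( $col )
    $idx.IsClustered = $TRUE
    $idx.FileGroup = $fileGroup
    $idx.Create()
    $idx.Drop()
}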

The disclaimer

Listen, no really…lean in closer to your monitor and listen…this script moves data around in your database; it can generate a ton of IO, and it adds and deletes things. Test this script on a non-production system. Please do not blindly run this code on any system. In fact, read the code first so you know exactly what it does…you’re going to have to anyway, because I left a safety net in the code.

The script

OK, enough background information; let’s get to the meat of the topic, the script. By itself it is pretty straightforward: set some variables in the text file, then run the script in PowerShell. It will ask you for a destination filegroup, so make sure that exists first. Here it is: MoveIndexes_1.ps1.txt

Here is a description of the variables in the script.

  • $server – the server to connect to
  • $dbName – the database in which we want to move data
  • $doWork – a boolean used as a safety net; set to true to move data, or false to only print out the objects that would be moved
  • $onlineOpt – a boolean; set to true to perform an online index rebuild on Enterprise Edition
  • $tablesToMove – a string used to select which tables to move. It supports a trailing wildcard (*); for example, r* will move all tables whose names begin with r. Leave it set to * to move all tables.
  • $schemaToMove – a string used to select which schemas to move. It supports a trailing wildcard (*); for example, dbo* will move all schemas whose names begin with dbo. Leave it set to * to move all schemas.

Example Execution

To get started, we will need to add a new filegroup and add some files to it.

  • Add a new filegroup
USE [master];
GO
ALTER DATABASE [TestDB] ADD FILEGROUP [FG1];
GO
  • Add files to the new filegroup
ALTER DATABASE [TestDB]
ADD FILE (	NAME = N'TestDB1', FILENAME = N'C:\DATA\testdb1.ndf' , 
			SIZE = 10240KB , FILEGROWTH = 10240KB )
TO FILEGROUP [FG1];

ALTER DATABASE [TestDB]
ADD FILE ( 	NAME = N'TestDB2', FILENAME = N'C:\DATA\testdb2.ndf' , 
			SIZE = 10240KB , FILEGROWTH = 10240KB )
TO FILEGROUP [FG1];

ALTER DATABASE [TestDB]
ADD FILE (	NAME = N'TestDB3', FILENAME = N'C:\DATA\testdb3.ndf' , 
			SIZE = 10240KB , FILEGROWTH = 10240KB )
TO FILEGROUP [FG1];

ALTER DATABASE [TestDB]
ADD FILE (	NAME = N'TestDB4', FILENAME = N'C:\DATA\testdb4.ndf' , 
			SIZE = 10240KB , FILEGROWTH = 10240KB )
TO FILEGROUP [FG1];
  • Example output of the PowerShell script
PS C:\Windows\system32> .\MoveIndexes_1.ps1
Please enter the destination Filegroup: FG1

Database: TestDB
dbo.t1 t1_cl_c1
dbo.t2 as a heap 
dbo.t3 t3_cl_c1

Are you sure you want to move the 3 objects listed above to the destination filegroup? (y/n): y

Moving: t1_cl_c1
Moving: t3_cl_c1
Moving Heap: t2

Filegroup contents

Parent      ID FileName           Name     Size UsedSpace
------      -- --------           ----     ---- ---------
[PRIMARY]    1 C:\DATA\TestDB.mdf TestDB  10240      2688
[FG1]        3 C:\DATA\testdb1... TestDB1 10240       256
[FG1]        4 C:\DATA\testdb2... TestDB2 10240       192
[FG1]        5 C:\DATA\testdb3... TestDB3 10240       192
[FG1]        6 C:\DATA\testdb4... TestDB4 10240       256


Parent		Schema	Name	FileGroup           
------		------	----	---------           
[TestDB]	dbo	t1	FG1             
[TestDB]	dbo	t2	FG1             
[TestDB]	dbo	t3	FG1             

Key observations

Looking at the output above, here are a few key observations:

  • You can see the redistribution of pages across the files in filegroup FG1 when compared with the initial setup from the first blog post here
  • [t1], the table with the clustered index and [t2], the heap, are completely contained within the new filegroup FG1
  • [t3], the table with the LOB data, is split across PRIMARY and FG1. The LOB data remained in PRIMARY while the IN_ROW_DATA moved to FG1; more on that in a second.

The allocations will not be perfectly even amongst the files in the filegroup for a particular object. Data files in a filegroup use a proportional fill algorithm, discussed on MSDN here and at SQLskills here. As you can see, there is a straggler in the PRIMARY filegroup: the LOB data. Moving LOB data requires copying the data to a new table in the destination filegroup, renaming it and dropping the old one, and I’m certainly not going to script something that dangerous for you :)

Page Allocation Filegroup

Future enhancements and Notes

  • LOB data – I intentionally left this unimplemented, but given the power of SMO, we could certainly develop a solution to this problem. Of particular concern would be indexes, foreign key constraints and other table settings that would need to be copied to the destination table.
  • Partitions – I plan to make the script partition aware, but left this out of the current implementation due to time constraints.
  • The script was developed and tested on SQL Server 2014. I do know that the script does not work on 2005, as CREATE INDEX WITH DROP_EXISTING did not exist yet; this is how the Recreate() method implements the index move under the hood.


I hope this script proves helpful to you in several ways:

  • Highlights the potential of using SMO and PowerShell to simplify complex operations and tasks
  • Provides a starting point for you to extend the code for your particular filegroup migration situation

Please feel free to contact me if you need some help with the script or have suggestions on its development.

Twitter @nocentino


Moving SQL Server data between filegroups – Part 1 – Database Structures

Why is moving data between filegroups hard?

As a consultant, it’s common to walk into a customer site and find databases contained in one very large file. For various reasons it can be beneficial to adjust the number of data files for a database; see here. However, in SQL Server, moving data from a one-file database into a multi-file configuration is a non-trivial task. It’s a two-step process: add a new filegroup, then add your multi-file configuration to that filegroup. Once that is in place, we need to rebuild the indexes into the new filegroup. This can be challenging if you have a lot of tables with a lot of indexes, as SSMS allows you to move data but only for non-clustered indexes and only one at a time. Another issue is that there are different techniques for moving different physical structures, such as clustered indexes, heaps and tables with LOB data.
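To make the two-step process concrete, here is a hedged T-SQL outline of the manual approach; all of the names here are illustrative.

--step 1: add a filegroup and a multi-file configuration
ALTER DATABASE [MyDB] ADD FILEGROUP [FG1];
ALTER DATABASE [MyDB]
ADD FILE ( NAME = N'MyDB1', FILENAME = N'C:\DATA\mydb1.ndf', SIZE = 10240KB )
TO FILEGROUP [FG1];

--step 2: rebuild each index into the new filegroup, one statement at a time
CREATE CLUSTERED INDEX [ix_c1] ON [dbo].[MyTable] ([c1])
WITH (DROP_EXISTING = ON) ON [FG1];

With dozens of tables and indexes, step 2 quickly becomes tedious, which is exactly the work worth automating.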

In this post we’re going to introduce some of the internal physical storage structures of a SQL Server database and describe how you can see which physical structures have been allocated where and to which objects. In part 2 of our series, we’ll introduce the concepts and techniques for moving data between filegroups with PowerShell.

SQL Server structures – where’s the data stored?

Let’s identify the SQL Server storage concepts we will need to be familiar with for our filegroup migration process. This is only a brief description; each of these concepts is a very deep topic in its own right.

Physical Structures

  • Pages – an 8KB data structure that is the actual storage unit for data. 
  • Extents – 8 physically contiguous pages, the unit in which space is managed. 
  • Database files – the physical storage location of database data on the file system and disk.
  • Filegroup – a collection of database files.
  • Database – the logical collection of tables that store data. Housed in a filegroup or collection of filegroups.
  • Clustered indexes – b-trees with data stored at the leaf level, the actual table data in key order.
  • Non-clustered indexes – b-trees with data stored at the leaf level, a copy of table data in its own key order. Includes a pointer back to the clustered index key or the RID of a heap.
  • Heaps – table data without a clustered index. A loose collection of pages.
  • LOB – a special page type used to store large objects and binary data.
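If you want to see how some of these structures map onto your own database, here is a quick sketch against the catalog views; run it in the context of the database you’re inspecting.

--filegroups and the data files they contain
SELECT fg.name AS [Filegroup]
     , df.name AS [LogicalFileName]
     , df.physical_name
     , df.size * 8 AS [SizeKB]   --size is reported in 8KB pages
FROM sys.filegroups fg
     JOIN sys.database_files df ON fg.data_space_id = df.data_space_id;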

Test Database Setup

For our exploration of physical storage, we’re going to need a database to examine. Let’s create a database, [TestDB]:

CREATE DATABASE [TestDB]
ON PRIMARY
( NAME = N'TestDB', FILENAME = N'C:\DATA\TestDB.mdf' ,
  SIZE = 10240KB , FILEGROWTH = 10240KB )
LOG ON
( NAME = N'TestDB_log', FILENAME = N'C:\LOG\TestDB_log.ldf' ,
  SIZE = 10240KB , FILEGROWTH = 10240KB );
GO

Test Table Setup

Time to create some tables; here are three: a table with a clustered index, a heap, and a table with a clustered index and some LOB data. For demonstration purposes I want to fill the data page with the maximum amount of data, and to be cute we’ll construct a row that fills a page entirely. For full coverage, check out “Anatomy of a data page” here.

The row will include:
  • Tag bytes – 4 bytes
  • integer – 4 bytes
  • char – 8000 bytes
  • char – 49 bytes
  • NULL bitmap – 3 bytes
This adds up to 8,060 bytes, which is the maximum size of a data row in SQL Server.
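Once the tables below are created and loaded, we can sanity-check that arithmetic with sys.dm_db_index_physical_stats; this is a sketch to run in [TestDB].

SELECT OBJECT_NAME(object_id) AS [Table]
     , max_record_size_in_bytes        --for [t1] this should be at or near the 8,060-byte maximum
     , avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.t1'), NULL, NULL, 'DETAILED');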
  • Table with clustered index
CREATE TABLE [dbo].[t1](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2] CHAR(8000) NOT NULL,
	[c3] CHAR(49) NOT NULL
);
CREATE CLUSTERED INDEX [t1_cl_c1] ON [dbo].[t1] ([c1]);
  • Table without a clustered index (heap)
CREATE TABLE [dbo].[t2](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2] CHAR(8000) NOT NULL,
	[c3] CHAR(49) NOT NULL
);
  • Table with clustered index and LOB data
CREATE TABLE [dbo].[t3](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2_lob] [VARCHAR](max) NOT NULL,	--max width, stored in data page
	[c3_lob] [VARCHAR](max) NOT NULL	--max width, stored in text page
);
CREATE CLUSTERED INDEX [t3_cl_c1] ON [dbo].[t3] ([c1]);
  • Insert some data (representative values; each batch runs 16 times)
INSERT INTO [dbo].[t1] ([c2], [c3]) VALUES (REPLICATE('a', 8000), REPLICATE('a', 49));
GO 16

INSERT INTO [dbo].[t2] ([c2], [c3]) VALUES (REPLICATE('a', 8000), REPLICATE('a', 49));
GO 16

INSERT INTO [dbo].[t3] ([c2_lob], [c3_lob]) VALUES (REPLICATE('a', 8000), REPLICATE(CAST('a' AS VARCHAR(MAX)), 16000));
GO 16

Let’s check where the data we inserted was stored.

Using the query below, we can observe the number of pages and allocation unit types that were used to store our data for the three tables. We will use this data for verification of the data movement to the new filegroup.
SELECT  DB_NAME(database_id) AS [DatabaseName]
      , OBJECT_NAME(al.object_id) AS [Table]
      , fg.name AS [FG-Name]
      , allocation_unit_type_desc
      , COUNT(*) AS [Pages]
      , COUNT(*) * 8 AS [SizeKB]
FROM    sys.dm_db_database_page_allocations(DB_ID(), NULL, NULL, NULL,
                                            'DETAILED') AS al
        JOIN sys.database_files df ON al.extent_file_id = df.file_id
        JOIN sys.filegroups fg ON df.data_space_id = fg.data_space_id
        JOIN sys.indexes i ON al.object_id = i.object_id
                              AND al.index_id = i.index_id
WHERE   OBJECT_NAME(al.object_id) NOT LIKE 's%'
GROUP BY DB_NAME(database_id)
      , OBJECT_NAME(al.object_id)
      , fg.name
      , allocation_unit_type_desc
In the result set you can see that the allocations for these tables are all in the PRIMARY filegroup; some are in-row data and some are LOB data, based on our table definitions.
In next week’s post we will introduce the concepts and techniques needed to move this data into a new filegroup. If you absolutely cannot wait and would like the script, shoot me an email and I’ll send it over to you.

Designing for offloaded log backups in AlwaysOn Availability Groups – Part 2 – Monitoring

AlwaysOn Availability Groups have made a big splash in the SQL world and are quickly becoming the HA and DR technology of choice for many SQL Server environments. Crucial to their success is the ability to move data between the replicas in the Availability Group quickly and efficiently. In the previous post we discussed design concepts for offloaded backups in AlwaysOn Availability Groups; specifically, we focused on how data is moved between replicas and the potential impact on backups and recovery. It is important to measure and trend replication health, and this article introduces techniques and queries you can use in your environment to do so, along with some of the nuances of the data reported in the DMVs.

Measuring Availability Group Replication Latency

Central to measuring replication health is the sys.dm_hadr_database_replica_states DMV. On the primary replica, this DMV returns rows representing the current state of each database and its replicas participating in Availability Groups.

The key fields we’re going to focus on for our monitoring are:

  • log_send_queue_size – the amount of log (in KB) not yet sent to a secondary
  • redo_queue_size – the amount of log (in KB) not yet redone on the secondary
  • last_commit_time – the time of the last committed log record on a replica
  • last_redone_time – the time the last log record was redone on a replica

Measuring AG Latency When All Replicas Are Online

We want to measure replication latency because it directly impacts the recovery point of databases in the event of a failure. Under normal operations, when all replicas are online, monitoring from the perspective of the primary can provide insight into our replication workload for trending and analysis. Doing so will help identify whether there is a need for changes, such as added WAN capacity, dedicated network connections for replication, or reducing log generation on the primary.

Here is an example of an AG with each node online and a very small log_send_queue_size and redo_queue_size. All of the examples in this article are executed on the primary (SQL14-A). So in this result set, the log_send_queue_size and redo_queue_size values are NULL for the primary replica, since the primary isn’t replicating to itself, and are reported for the secondary replicas.

 SELECT r.replica_server_name
      , DB_NAME(rs.database_id) AS [DatabaseName]
      , rs.log_send_queue_size
      , rs.log_send_rate
      , rs.redo_queue_size
      , rs.redo_rate
FROM   sys.dm_hadr_database_replica_states rs
        JOIN sys.availability_replicas r ON r.group_id = rs.group_id
             AND r.replica_id = rs.replica_id
 WHERE  DB_NAME(rs.database_id) = 'TestAG1'
 ORDER BY r.replica_server_name;

Availability Group Latency

Measuring AG Latency When A Replica Is Offline

The information in sys.dm_hadr_database_replica_states shows the current state known to the server executing the DMV query, so it is possible for information to be stale, as other replicas may not be reporting due to an inability to communicate with the primary. For example, when a replica is disconnected from the primary, the log_send_queue_size field for a database on that replica changes to NULL and the remaining values are no longer updated.

In this example we use the same query as above; we simulated a network outage which caused SQL14-C to be disconnected from the primary. Its log_send_queue_size is now reporting NULL, and log_send_rate, redo_queue_size and redo_rate are the last values reported from the secondary.

Availability Group Latency 

NULL Isn’t A Good Value To Trend or Report

In the event of a system failure we need information to report to the business about the state of the system. With a secondary offline and the log_send_queue_size reporting NULL, it’s hard to gauge the amount of data loss, so we need to rely on another attribute when querying the DMV for replication health. We can use last_commit_time, which indicates the time of the last committed record in the transaction log and retains the last communicated value from the secondaries rather than switching to NULL. There are several other *_time and *_LSN columns in the DMV which are updated with the time of the most recent interaction rather than reflecting the current state of replication health; more on this in a second.

In this example, we use a slightly different query to calculate the difference between the last_commit_time values on the secondary replicas and the primary replica; this gives us the number of seconds the secondaries are behind the primary. Keep in mind that this is the commit time in the transaction log, so if there aren’t any transactions for a period of time before your replicas are disconnected, that gap will be included in this measurement. The result set shows a secondary that has been disconnected for several minutes.

 SELECT  r.replica_server_name
      , DB_NAME(rs.database_id) AS [DatabaseName]
      , ISNULL(DATEDIFF(SECOND, rs.last_commit_time, prs.last_commit_time), 0) AS [SecsBehindPrimary]
      , prs.last_commit_time AS [Primary_last_commit_time]
      , rs.last_commit_time AS [Secondary_last_commit_time]
      , rs.last_redone_time AS [Secondary_last_redone_time]
FROM    sys.dm_hadr_database_replica_states rs
        JOIN sys.availability_replicas r ON r.group_id = rs.group_id
              AND r.replica_id = rs.replica_id
        JOIN sys.dm_hadr_database_replica_states prs ON r.group_id = prs.group_id
              AND prs.group_database_id = rs.group_database_id
              AND rs.is_local = 0
              AND prs.is_primary_replica = 1
WHERE   DB_NAME(rs.database_id) = 'TestAG1'
ORDER BY r.replica_server_name;

Availability Group Latency

Measuring AG latency When A Replica Is Reconnected

When the replica is reconnected, the current log_send_queue_size value is reported, and the *_time and *_LSN columns in the DMV are updated with the time of the most recent interaction, not necessarily the time of the change recorded in the log record being sent or redone. For example, last_redone_time is the time the last log record was redone, not the time of the data change in that log record.

In this example, we use the first query from above. SQL14-C is reconnected to the primary and is reporting 232,822KB of log in the log_send_queue_size and 15,052KB in the redo_queue_size. Recall that this is the amount of log that must be sent to, and redone on, the secondary.

Availability Group Latency


We constructed a test that included two Availability Group replicas on different subnets. We generated an insert-heavy workload, which caused a measurable replication workload, and added 50ms of latency on the network router using tc. During the test, we disconnected the secondary replica from the network, which caused the log_send_queue_size to build on the primary. When we then reconnected the replica, the log_send_queue_size was reported properly and the redo queue built on the secondary.

The chart below shows the data movement between the queues. The samples start just before the secondary is reconnected; upon connection, the log_send_queue_size reports its current value and begins working its way down. You can also see the redo_queue_size fluctuate based on the progress of the redo thread on the secondary replica. While this is a contrived example with a trivial workload, in practice this can be a protracted, potentially RPO-impacting event if there is an excessive amount of change while the replicas are disconnected.

Availability Group Latency Chart

What to do next

  • As you design and operate your system, measure replication latency (log_send_queue_size and redo_queue_size) when the replicas are online and healthy, for trending and analysis
  • In the event of an outage, be sure to know how far your replicas are behind; this is a critical piece of information for the business
  • As discussed in the first article, develop operations that better support your design; perhaps patching and heavy database maintenance should not happen at the same time
  • Knowing this information can help you better meet the recovery point objective for your organization

If you need help with your Availability Groups or other parts of your SQL Server environment, please feel free to contact me.

Twitter @nocentino

Designing for offloaded log backups in AlwaysOn Availability Groups – Part 1

AlwaysOn Availability Groups made their initial appearance in SQL Server 2012 and have generated a lot of buzz: HA and DR in one! Even with AGs, backups are still integral to your DR strategy, and with AGs you’re given the option to offload backups to a secondary replica. In this post we’re going to talk about offloaded log backups and the potential impact to your databases’ recoverability under certain conditions. We’ll begin with some preliminaries on data movement in AGs.

How data is moved in Availability Groups

Roughly speaking, data is synchronized between replicas in an Availability Group by sending log blocks from the transaction log of the primary (read/write) replica over a database mirroring endpoint to the secondary replicas. However, log blocks are not sent immediately; they pass through a set of queues.

  • Send queue – a queue used to store log records that will be sent from the primary to the secondary replica 
  • Redo queue – a queue used to store log records received on the secondary, that have to be “played back” on the secondary replica

The send queue’s impact on recovery point

In the event that log generation on the primary exceeds the rate at which log blocks are sent to the secondaries, or a secondary becomes unavailable, log blocks can build up in the send queue. Unsent data in the send queue is the amount of data at risk in the event of a disastrous failure of the primary replica.

You might be thinking to yourself: I configured the AG in synchronous availability mode, I should be safe. Even in synchronous mode, the send queue can build up. A secondary replica in an Availability Group can become unavailable for any number of reasons (patching, a network outage, etc.), and in this situation the data destined for the secondary replica(s) is stored in the send queue. When the secondary comes back online, the primary will work diligently to send the queued data to it.

The redo queue’s impact on recovery point

In the event that log blocks received by a secondary exceed the rate at which the records can be processed by the redo thread, the redo queue will grow. This can happen when the secondary simply cannot keep up with the rate of change on the primary, or after an outage of a secondary. A practical example of the latter is when a secondary comes back online after an outage during which a large amount of data changed. Think of the times when database maintenance is running on a weekend and the network team just happens to be updating switch firmware. All of that change is queued in the send queue on the primary, and when the secondary is back online it is quickly shipped over to the redo queue on the secondary.

Now, with that huge chunk of data change hardened on the secondary and in the redo queue, one would think we’re in the clear. Well, sort of: yes, your data is in two places, but now the offloaded transaction log backups on your secondary may start to fail if the secondary replica is too far behind the primary. Specifically, if the database’s last backup LSN (the log record just beyond the end of the last log backup) on the primary is greater than the local redo LSN on the secondary, your backups will fail. This is a protection to prevent gaps in the continuity of the transaction log in backups, and a condition you need to be aware of when designing offloaded backups: while it persists, the database is unable to take a successful log backup, impacting RPO.
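The DMV doesn’t expose the last backup LSN directly, but we can at least watch the redo side of the equation from the primary with a sketch like this, comparing how far redo has progressed on each replica against the end of its log. The database name 'TestAG1' is a placeholder.

--how far behind is redo on each replica?
SELECT r.replica_server_name
     , rs.last_redone_lsn    --last log record redone on the replica
     , rs.end_of_log_lsn     --end of the log cached on the replica
     , rs.redo_queue_size    --KB of log waiting to be redone
FROM sys.dm_hadr_database_replica_states rs
     JOIN sys.availability_replicas r ON rs.replica_id = r.replica_id
WHERE DB_NAME(rs.database_id) = 'TestAG1';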

Availability Group LSN log

Figure A: last LSN and redo LSN positions in the transaction log

Designing for offloaded backups in AlwaysOn Availability Groups

Availability Groups allow us to offload backups to secondary replicas. This is a great option as it reduces IO on the primary replica. In doing so, system designers need to be aware of the impact the health of AG replication has on offloaded backups: understand when your data is at risk, namely during times when the secondaries are not completely caught up with the changes from the primary, and the techniques to mitigate that risk and protect the RPO and RTO that the business expects.

Here are a few things to keep in mind:

  • Understand and minimize the amount of log generation that occurs in your databases and design to support that load
  • Monitor log_send_queue_size and redo_queue_size in sys.dm_hadr_database_replica_states on the replicas to measure the impact on recovery point objectives
  • Understand your system’s operations, consider downtime for patching and network maintenance
  • Understand resource contention on shared infrastructure, are you competing for things like network bandwidth, disk IO?

In the next article we’ll explore some DMVs that will give you insight into AG replication health, and in the final post of this series we’ll show how to use RedGate’s SQLMonitor custom metrics for trending and analysis.

If you need help with your Availability Groups or other parts of your SQL Server environment, please feel free to contact me

Twitter @nocentino