Moving SQL Server data between filegroups – Part 1 – Database Structures

Why is moving data between filegroups hard?

As a consultant its common to walk into a customer site and find databases that are contained in one very large file. For various reasons it can be beneficial to adjust the number a data files for a database. See here. However, in SQL Server moving data from a one file database into a multi-file configuration is a non-trivial task. It’s a two step process, requiring that you add a new filegroup then in the filegroup add your multi-file configuration. Once you have that up, then we need to rebuild the indexes into that filegroup. This can be challenging if you have a lot of tables with a lot of indexes as SSMS allows you do move data but only for non-clustered indexes and only one at a time. Another issue is there are different techniques for moving different physical structures such as clustered indexes, heap and tables with LOB data.

In this post we’re going to introduce the some of the internal physical storage structures of a SQL Server database and describe how you can see what physical structures have been allocated where and to which objects. In part 2 of our series, we’ll introduce the concepts and techniques on how to move data between filegroups with Powershell.

SQL Server structures – where’s the data stored?

Let’s identify the SQL Server storage concepts that we will need to be familiar with for our filegroup migration process. This is simply a brief description, each of these concepts are very deep topics in their own right.

Physical Structures

  • Pages – an 8KB data structure that is the actual storage unit for data. 
  • Extents – 8 physically contiguous pages, the unit in which space is managed. 
  • Database files – the physical storage location of database data on the file system and disk.
  • Filegroup – a collection of databases files.
  • Database – the logical collection of tables that store data. Housed in a filegroup or collection of filegroups.
  • Clustered indexes – b-trees with data stored at the leaf level, the actual table data in key order.
  • Non-clustered indexes – b-trees with data stored at the level, a copy of table data in it’s own key order. Includes a pointer back to the key of clustered indexes or the RID of a heap.
  • Heaps – table data without a clustered index. A loose collection of pages.
  • LOB – a special page type used to store large objects and binary data.

Test Database Setup

For our exploration of physical storage we’re going to need a database to examine, let’s create a database [TestDB]

CREATE DATABASE [TestDB]
 ON  PRIMARY
( NAME = N'TestDB', FILENAME = N'C:\DATA\TestDB.mdf' ,
	SIZE = 10240KB , MAXSIZE = UNLIMITED, FILEGROWTH = 10240KB )
 LOG ON
( NAME = N'TestDB_log', FILENAME = N'C:\LOG\TestDB_log.ldf' ,
	SIZE = 10240KB , MAXSIZE = UNLIMITED , FILEGROWTH = 10240KB ) 

Test Table Setup

Time to create some tables, here are three. A table with clustered index, a heap and a table with a clustered index and some LOB data. For demonstration purposes I want to fill the data page with the maximum amount of data and to be cute we’ll construct a row that fills a page entirely. For full coverage check out “Anatomy of a data page” here.

The row will include:
  • Tag bytes – 4 bytes
  • integer – 4 bytes
  • char – 8000 bytes
  • char – 49 bytes
  • NULL bitmap 3 bits
This adds up to 8060KB, which is the maximum size of a data row in SQL Server.
 
  • Table with clustered Index
CREATE TABLE [dbo].[t1](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2] CHAR(8000) NOT NULL,
	[c3] CHAR(49) NOT NULL,
 	CONSTRAINT [t1_cl_c1] PRIMARY KEY CLUSTERED ( [c1] ASC )
)
  • Table without a cluster index (heap)
CREATE TABLE [dbo].[t2](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2] CHAR(8000) NOT NULL,
	[c3] CHAR(49) NOT NULL
)
  • Table with clustered index and LOB data
CREATE TABLE [dbo].[t3](
	[c1] [int] IDENTITY(1,1) NOT NULL,
	[c2_lob] [VARCHAR](max) NOT NULL,	--max width, stored in data page
	[c3_lob] [VARCHAR](max) NOT NULL,	--max width, stored in text page
	CONSTRAINT [t3_cl_c1] PRIMARY KEY CLUSTERED ( [c1] ASC )
)
Insert some data
INSERT INTO [t1] VALUES (REPLICATE('A', 8000), REPLICATE('B', 49))
GO 16

INSERT INTO [t2] VALUES (REPLICATE('C', 8000), REPLICATE('D', 49))
GO 16

INSERT INTO [t3] VALUES (REPLICATE('C', 8000), REPLICATE('D', 16000))
GO 16

Let’s check on where the data we inserted was stored

Using the query below, we can observe the number of pages and allocation unit types that were used to store our data for the three tables. We will use this data for verification of the data movement to the new filegroup.
SELECT  DB_NAME(database_id) AS [DatabaseName]
      , OBJECT_NAME(al.object_id) AS [Table]
      , fg.name AS [FG-Name]
      , df.name
      , allocation_unit_type_desc
      , COUNT(*) AS [Pages]
      , COUNT(*) * 8 AS [SizeKB]
FROM    sys.dm_db_database_page_allocations(DB_ID(), NULL, NULL, NULL,
                                            'DETAILED') AS al
        JOIN sys.database_files df ON al.extent_file_id = df.file_id
        JOIN sys.filegroups fg ON df.data_space_id = fg.data_space_id
        JOIN sys.indexes i ON al.object_id = i.object_id
                              AND al.index_id = i.index_id
WHERE   OBJECT_NAME(al.object_id) NOT LIKE 's%'
GROUP BY DB_NAME(database_id)
      , OBJECT_NAME(al.object_id)
      , fg.name
      , df.name
      , allocation_unit_type_desc
 
In the result set you can see that the allocations for these tables are all out of the Primary filegoup and some are in row and some are LOB data based on our table definitions 
 
FilegroupsAllocations
 
In next week’s post we will introduce the concepts and techniques needed to move this data into a new filegroup. If you absolutely cannot wait and would like the script shoot me an email and I’ll send it over to you.
 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.