Why is moving data between filegroups hard?
As a consultant its common to walk into a customer site and find databases that are contained in one very large file. For various reasons it can be beneficial to adjust the number a data files for a database. See here. However, in SQL Server moving data from a one file database into a multi-file configuration is a non-trivial task. It’s a two step process, requiring that you add a new filegroup then in the filegroup add your multi-file configuration. Once you have that up, then we need to rebuild the indexes into that filegroup. This can be challenging if you have a lot of tables with a lot of indexes as SSMS allows you do move data but only for non-clustered indexes and only one at a time. Another issue is there are different techniques for moving different physical structures such as clustered indexes, heap and tables with LOB data.
In this post we’re going to introduce the some of the internal physical storage structures of a SQL Server database and describe how you can see what physical structures have been allocated where and to which objects. In part 2 of our series, we’ll introduce the concepts and techniques on how to move data between filegroups with Powershell.
SQL Server structures – where’s the data stored?
Let’s identify the SQL Server storage concepts that we will need to be familiar with for our filegroup migration process. This is simply a brief description, each of these concepts are very deep topics in their own right.
- Pages – an 8KB data structure that is the actual storage unit for data.
- Extents – 8 physically contiguous pages, the unit in which space is managed.
- Database files – the physical storage location of database data on the file system and disk.
- Filegroup – a collection of databases files.
- Database – the logical collection of tables that store data. Housed in a filegroup or collection of filegroups.
- Clustered indexes – b-trees with data stored at the leaf level, the actual table data in key order.
- Non-clustered indexes – b-trees with data stored at the level, a copy of table data in it’s own key order. Includes a pointer back to the key of clustered indexes or the RID of a heap.
- Heaps – table data without a clustered index. A loose collection of pages.
- LOB – a special page type used to store large objects and binary data.
Test Database Setup
For our exploration of physical storage we’re going to need a database to examine, let’s create a database [TestDB]
CREATE DATABASE [TestDB] ON PRIMARY ( NAME = N'TestDB', FILENAME = N'C:\DATA\TestDB.mdf' , SIZE = 10240KB , MAXSIZE = UNLIMITED, FILEGROWTH = 10240KB ) LOG ON ( NAME = N'TestDB_log', FILENAME = N'C:\LOG\TestDB_log.ldf' , SIZE = 10240KB , MAXSIZE = UNLIMITED , FILEGROWTH = 10240KB )
Test Table Setup
- Tag bytes – 4 bytes
- integer – 4 bytes
- char – 8000 bytes
- char – 49 bytes
- NULL bitmap 3 bits
- Table with clustered Index
CREATE TABLE [dbo].[t1]( [c1] [int] IDENTITY(1,1) NOT NULL, [c2] CHAR(8000) NOT NULL, [c3] CHAR(49) NOT NULL, CONSTRAINT [t1_cl_c1] PRIMARY KEY CLUSTERED ( [c1] ASC ) )
- Table without a cluster index (heap)
CREATE TABLE [dbo].[t2]( [c1] [int] IDENTITY(1,1) NOT NULL, [c2] CHAR(8000) NOT NULL, [c3] CHAR(49) NOT NULL )
- Table with clustered index and LOB data
CREATE TABLE [dbo].[t3]( [c1] [int] IDENTITY(1,1) NOT NULL, [c2_lob] [VARCHAR](max) NOT NULL, --max width, stored in data page [c3_lob] [VARCHAR](max) NOT NULL, --max width, stored in text page CONSTRAINT [t3_cl_c1] PRIMARY KEY CLUSTERED ( [c1] ASC ) )
INSERT INTO [t1] VALUES (REPLICATE('A', 8000), REPLICATE('B', 49)) GO 16 INSERT INTO [t2] VALUES (REPLICATE('C', 8000), REPLICATE('D', 49)) GO 16 INSERT INTO [t3] VALUES (REPLICATE('C', 8000), REPLICATE('D', 16000)) GO 16
SELECT DB_NAME(database_id) AS [DatabaseName] , OBJECT_NAME(al.object_id) AS [Table] , fg.name AS [FG-Name] , df.name , allocation_unit_type_desc , COUNT(*) AS [Pages] , COUNT(*) * 8 AS [SizeKB] FROM sys.dm_db_database_page_allocations(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS al JOIN sys.database_files df ON al.extent_file_id = df.file_id JOIN sys.filegroups fg ON df.data_space_id = fg.data_space_id JOIN sys.indexes i ON al.object_id = i.object_id AND al.index_id = i.index_id WHERE OBJECT_NAME(al.object_id) NOT LIKE 's%' GROUP BY DB_NAME(database_id) , OBJECT_NAME(al.object_id) , fg.name , df.name , allocation_unit_type_desc