Table Partitioning in SQL Server 2005 Part 3

Finishing up the posts on table partitioning here, we will be looking at maintenance activities today.  Two of the main areas of maintenance for your database have to do with indexes and backups.  Partitioning helps shorten potential maintenance windows for both of these activities.  We will start with indexes.

Rebuilding or reorganizing indexes on large tables can take quite a long time to process, negatively affecting the availability of your database.  In SQL Server 2005 we have the ability to rebuild a stated partition number of an index.  For an example we will go back to our Invoice table illustrated earlier.  We will add a clustered index on the datetime column, which also happens to be our partitioning key.  As the table is already partitioned, the syntax is no different than creating an index on a non-partitioned table.

--Create a clustered index on the InvoiceDate column CREATE CLUSTERED INDEX IXC_Invoice_InvoiceDate ON dbo.Invoice (InvoiceDate);

If we had not already partitioned the table we could have created the clustered index adding the partitioning scheme and effectively implemented it that way.

We can verify the layout of the index by querying the sys.dm_db_partition_stats management view.

--Verify index SELECT i.[name] AS indexName ,i.type_desc AS indexType ,s.partition_number AS partitionNumber ,s.row_count AS numRows FROM sys.dm_db_partition_stats AS s INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] WHERE s.[object_id] = OBJECT_ID('dbo.Invoice', 'U') AND s.index_id = i.index_id;

The results of the query show the index and the row counts for the Invoice table.

To check fragmentation on our partitions we can run the following query against the management function sys.dm_db_index_physical_stats.

--Check index fragmentation SELECT OBJECT_NAME(i.[object_id]) AS objectName ,s.index_type_desc AS indexType ,s.partition_number AS partitionNumber ,ROUND(s.avg_fragmentation_in_percent, 2) AS indexFragmentationPercent FROM sys.dm_db_index_physical_stats ( DB_ID() ,OBJECT_ID('dbo.Invoice', 'U') ,NULL ,NULL ,NULL ) AS s INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] WHERE s.index_id = i.index_id;

Once we determine that maintenance is needed for our index we have two options.  We may choose either the REBUILD or REORGANIZE statements of ALTER INDEX.  These two options replace DBCC DBREINDEX and DBCC INDEXDEFRAG.  For full definition of both features see the ALTER INDEX entry in BOL.  Both options allow us to alter a stated partition number of an index.  The syntax for this is:

--REBUILD index on partition number 1 ALTER INDEX IXC_Invoice_InvoiceDate ON dbo.Invoice REBUILD PARTITION = 1; --REORGANIZE index on partition number 1 ALTER INDEX IXC_Invoice_InvoiceDate ON dbo.Invoice REORGANIZE PARTITION = 1;

This syntax allows us to stagger maintenance of large tables to better serve our application's needs.  We also have the benefit of only doing maintenance on the partitions that are experiencing fragmentation.  As data ages in your application you should see less and less fragmentation in your partitions.  Eventually you should be able to remove those partitions completely from maintenance activities.

This takes us nicely into our other maintenance topic, back up.  If you follow best practice recommendations mentioned earlier in this blog topic and create separate filegroups for each of your partitions you can implement filegroup back ups of your database.  The syntax for a filegroup back up is:

--Backup indvidual filegroup BACKUP DATABASE PartitionDB FILEGROUP = 'FG200801' TO DISK = 'C:\SQLData\FG200801.BAK';

To utilize filegroup back ups you will need to be in FULL recovery mode and take log back ups.  For full discussion of back up activities see the entry in BOL.

Again, we are able to take advantage of our data aging and move stable data to read-only filegroups.  Once the filegroup has been marked read-only we have effectively removed it from our maintenance cycle.

So now we have fully implemented a partition table in our database.  Hopefully this discussion will help you out in your own schemas.  If you are interested in a white paper version of this blog series go to the email section and drop me a line.

Enjoy!


Posted by: whitneyw
Posted on: 1/23/2008 at 11:45 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed

Table Partitioning in SQL Server 2005 Part 2

 

Last time we looked at the building blocks for a partitioned table in SQL Server 2005.  I decided to break out maintenance activities for the next post.  Today we will be discussing the various functions and management views available for you to interact with your partitioned table.

Keeping the Invoice table from our last post I added some test data with the following code (edited for brevity).

INSERT dbo.Invoice (InvoiceID, CustomerID, InvoiceDate, Amount) VALUES(1, 1, '2008-01-01 00:00:00', 100) GO 10 INSERT dbo.Invoice (InvoiceID, CustomerID, InvoiceDate, Amount) VALUES(2, 2, '2008-02-01 00:00:00', 200) GO 20 ... INSERT dbo.Invoice (InvoiceID, CustomerID, InvoiceDate, Amount) VALUES(11, 11, '2008-11-01 00:00:00', 1100) GO 110 INSERT dbo.Invoice (InvoiceID, CustomerID, InvoiceDate, Amount) VALUES(12, 12, '2008-12-01 00:00:00', 1200) GO 120

The code above uses the GO syntax I blogged about earlier to insert disparate numbers of rows into each partition.  This is done for easier allocation viewing later.

To view the allocation of rows across your partition you have multiple options.  You can make use of the $partition system function, the sys.dm_db_partition_stats management view, or the sys.partitions catalog view.  Each has their own flavor, with positives and negatives.

$partition system function

This system function returns the partition number into which a set of partitioning column values would be mapped for any specified partition function.  The function returns an integer value between 1 and number of partitions of the partition function.  The general syntax for use of this function is:

[ database_name. ] $PARTITION.partition_function_name(expression)
The following query returns the rows for our Invoice table.

SELECT $partition.DateRangePFN(InvoiceDate) AS PartitionNumber ,COUNT(InvoiceID) AS NumberOfRows FROM dbo.Invoice GROUP BY $partition.DateRangePFN(InvoiceDate) ORDER BY PartitionNumber;

Running this code you would notice that any empty partitions are not returned.  If you are interested in all partitions for your table you will want to make use of one of the following options.

sys.dm_db_partition_stats

This database related dynamic management view returns page and row counts for every partition.  The following query returns the row count for our Invoice table.

SELECT partition_number ,row_count FROM sys.dm_db_partition_stats WHERE [object_id] = OBJECT_ID('dbo.Invoice', 'U');

This output gives us a better view of the overall allocations for our table.  However, what if you also wanted index granularity in your counts?  For that we go to option three.

sys.partitions

This object catalog view contains a row for each partition of all the tables and indexes in a database.  Note that all tables and indexes in SQL Server 2005 are considered to contain at least one partition, even if they are not explicitly partitioned.  The following query returns the row count for our Invoice table.

SELECT OBJECT_NAME(p.[object_id]) AS ObjectName ,i.[name] AS IndexName ,p.partition_number ,[rows] FROM sys.partitions AS p INNER JOIN sys.indexes AS i ON p.[object_id] = i.[object_id] AND p.[index_id] = i.[index_id] WHERE p.[object_id] = OBJECT_ID('dbo.Invoice', 'U');

This output gives us the most granular view of allocations across our table.

SQL Server 2005 also introduces four other catalog views for managing your partition implementation:

I have included links to BOL for these functions as their simplicity doesn't really warrant further discussion.

Now you have full view into your partitioning implementation and can feel comfortable that your planned allocations are working.  Next time we will discuss maintenance activities to keep your partitions up and running at their best.

Enjoy!


Posted by: whitneyw
Posted on: 1/16/2008 at 11:46 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed

Table Partitioning in SQL Server 2005

I'm going to spend the next few posts discussing how you can implement partitioning in SQL Server 2005 to improve the performance and maintenance time of very large tables.

In previous versions of SQL Server it was possible to partition data by physically separating tables or by creating views.  Both of these approaches had flaws in performance or maintainability.  Luckily, SQL Server 2005 solved this problem by implementing true partitioning where we have the ability to manage where data resides yet maintain a single object.

At a high level there are only a few items needed to implement partitioning, these are:

  1. A partitioning key
  2. A filegroup
  3. A partition function
  4. A partition scheme

Partition Key

The selection of a partition key is one of the most important decisions you will make in your implementation.  This column will drive where data is divided along your partition boundaries.  Typically a partition key is either date based or an ordered grouping of data. 

It is crucial to determine the access pattern for the data you will be storing. You will want to ensure that your partitioning key is restricted in important queries’ WHERE clause.  This will allow partition elimination to take place (which will be discussed in a later post).

Filegroup

Filegroups can be employed to help separate your data which will improve performance and maintainability. It is generally best to have the same number of filegroups as partitions. Filegroups may have one or more files, but each partition must map to a filegroup. For more granular backup control you should design your partitioned tables so that related data resides in the same filegroup. The syntax creating the filegroup and file is:

ALTER DATABASE PartitionDB ADD FILEGROUP [FG200801]; GO ALTER DATABASE PartitionDB ADD FILE ( NAME = N'PartitionData200801' ,FILENAME = N'D:\SQLData\PartitionData200801.ndf' ,SIZE = 5MB ,FILEGROWTH = 5MB ) TO FILEGROUP [FG200801]; GO

Partition Function

The partitioning function defines the algorithm that is used to map rows of a table or index into partitions.  The scope of the partition function is limited to the database in which it is created.  The maximum number of partitions allowed in a partition function is 1000.  The general syntax to create a partition function is:

CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type ) AS RANGE [ LEFT | RIGHT ] FOR VALUES ( [ boundary_value [ ,...n ] ] ) [ ; ]

All data types are valid for input_parameter_type, except text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types. 

The LEFT or RIGHT designation specifies which side of each boundary value the boundary_value belongs.  If not specified LEFT is the default.

If we wanted to create a range function to partition our data in a month/year form we could execute the statement below.

CREATE PARTITION FUNCTION DateRangePFN (DATETIME) AS RANGE RIGHT FOR VALUES('20080101', '20080201', '20080301', '20080401' ,'20080501', '20080601', '20080701', '20080801', '20080901', '20081001' ,'20081101', '20081201');

Upon executing this statement we would create 12 boundaries, with data partitioned in the following manner.

Partition 1 2 ... 11 12
Values < Jan 1 2008 >= Jan 1 2008
< Feb 1 2008
  >= Nov 1 2008
< Dec 1 2008
>= Dec 1 2008

Partition Scheme

The partition scheme associates the partitions with their physical location in a filegroup.  The partition scheme is created referencing the partition function to employ.  The general syntax to create a partition function is:

CREATE PARTITION SCHEME partition_scheme_name AS PARTITION partition_function_name [ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] ) [ ; ]

If [ALL] is specified only one file_group_name can be specified.  Partitions are assigned to filegroups, starting with partition 1, in the order they are listed in [,...n].

Going back to our prior partition function example, we could create a partition scheme mapping the partitions to multiple filegroups by executing the statement below.

CREATE PARTITION SCHEME DateRangePS AS PARTITION DateRangePFN TO ('FG1', 'FG2', 'FG3', 'FG4', 'FG5', 'FG6', 'FG7', 'FG8', 'FG9', 'FG10' ,'FG11', 'FG12');

Upon executing the statement we would have associated filegroups and partitions in the following manner.

Filegroup FG1 FG2 ... FG11 FG12
Partition 1 2   11 12
Values < Jan 1 2008 >= Jan 1 2008
< Feb 1 2008
  >= Nov 1 2008
< Dec 1 2008
>= Dec 1 2008

At this point we have the objects necessary to create a table and partition data.  The create statement for our table will reference the partition scheme as well as the column used by the partition function.  The following code creates a table using our partition scheme and function.

CREATE TABLE dbo.Invoice ( InvoiceID INT NOT NULL ,CustomerID INT NOT NULL ,InvoiceDate DATETIME NOT NULL ,Amount MONEY NOT NULL ,StatusID TINYINT NOT NULL ,ShipDate DATETIME NOT NULL ) ON DateRangePS(InvoiceDate);

At this point any data inserted into the Invoice table will be partitioned by the specified boundaries in placed in their associated filegroups.  We have ourselves a partitioned table!

In my next post I will discuss indexing, back up, and other maintenance activities related to partitioning as well as some handy functions that have been added to T-SQL.


Posted by: whitneyw
Posted on: 1/9/2008 at 11:47 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed

Variable output with dynamic SQL

I've had a couple conversations about this subject this week so I thought I would post on this topic.  There are occasions where we need the output of a dynamic query placed into a variable for later use.  Today we will look at the stored procedure sp_executesql and show how you can utilize this system stored procedure with output variables.

If you are not familiar with sp_executesql you are probably aware of the EXEC() statement for processing dynamic queries.  sp_executesql is the preferred choice for this type of processing as it has the following benefits:

  • Parameter substitution is supported
  • Execution plans are created, making for more efficient processing
  • The execution string does not have to be rebuilt for each execution

More detail can be found on sp_executesql at BOL.  Here's a quick example of sp_executesql in action:

--Create test table IF OBJECT_ID('dbo.test', 'U') IS NOT NULL DROP TABLE dbo.test; GO CREATE TABLE dbo.test ( testID INT NOT NULL ,testValue VARCHAR(50) NOT NULL ); --Populate test table INSERT dbo.test (testID, testValue) VALUES (1, 'One'); INSERT dbo.test (testID, testValue) VALUES (2, 'Two'); INSERT dbo.test (testID, testValue) VALUES (3, 'Three'); INSERT dbo.test (testID, testValue) VALUES (4, 'Four'); INSERT dbo.test (testID, testValue) VALUES (5, 'Five'); --Set up parameters DECLARE @queryString NVARCHAR(MAX); DECLARE @intVariable INT; DECLARE @parameterDef NVARCHAR(500); DECLARE @outputTestID VARCHAR(50); SET @queryString = 'SELECT @testValueOUT = testValue FROM dbo.test WHERE testID = @testID'; SET @intVariable = 5; SET @parameterDef = '@testID INT, @testValueOUT VARCHAR(50) OUTPUT'; --Execute the query string EXECUTE sp_executesql @queryString ,@parameterDef ,@testID = @intVariable ,@testValueOUT = @outputTestID OUTPUT; --Return the output variable SELECT @outputTestID AS [variableOutput]; --Clean up after yourself DROP TABLE dbo.test;

The query string is a simple select from our table dbo.test.  You will notice we are already placing two variables in the query string, one to accept our input parameter and another to receive the output of our query.  The parameters are defined in the @parameterDef variable.  Here is where we see the parameter substitution referenced earlier.  If we were using the EXEC() statement we would have to either CAST or CONVERT our input, which is certainly possible, but opens up potential coding errors and type conversion issues.

To execute the query string we make our call to sp_executesql passing in the query string, the parameter definitions, and intialization of each variable used.

If we were to place the execution of the stored procedure and selection of @outputTestID into a loop for all values in the dbo.test table we could easily substitute the values into the query for processing without any additional work.  We would also see that the query plan is reused for the query.  We can check this with the following query of the dynamic management view sys.dm_exec_cached_plans:

--Check the plan cache for reuse SELECT usecounts ,cacheobjtype ,objtype ,[text] AS batchText FROM sys.dm_exec_cached_plans CROSS APPLY sys.dm_exec_sql_text(plan_handle) WHERE usecounts > 1 AND cacheobjtype = 'Compiled Plan' ORDER BY usecounts DESC ;

Another tool in your T-SQL tool belt.  Enjoy!

**Update:  I had to add a note saying that this is certainly a solution to a specific problem.  Please make sure you validate input for any dynamic SQL you use in your application.  SQL Injection is a nasty thing to have happen to a database application.  Okay, end of public service announcement.  :)


Posted by: whitneyw
Posted on: 1/6/2008 at 11:49 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (1) | Post RSSRSS comment feed