SQL Server 2008 has introduced sparse columns to DDL activities. So when exactly would you use this new construct?
Say you are modeling a product line for a company. The company in question runs a sporting good store stocking shoes, hats, and equipment. As you might imagine a product table for these types of products would normally leave you with many NULL columns as certain properties of one product would not be applicable to another. Sparse columns look to save the day here as they are optimized for the storage of null values.
Like most things in the SQL Server world, sparse columns have their trade offs. While storage of null values is optimized you do pay a cost for non null values. The amount varies by data type, but it is typically in the 50% range. BOL recommends you look to sparse columns when you would receive a 20 to 40 percent space savings from traditional storage practices.
There are a few data types that aren't invited to the sparse column party. These are:
- geography
- geometry
- image
- ntext
- text
- timestamp
- user-defined data types
No real surprises here as several are set to be deprecated in future releases. The others mentioned might make a good follow on blog post.
So now to the interesting part, at least for me...what does the code look like? Here is an example product table with sparse columns for size, color, and shape:
IF OBJECT_ID('dbo.ProductSparse, 'U') IS NOT NULL
DROP TABLE dbo.ProductSparse;
CREATE TABLE dbo.ProductSparse
(
ProductID INT IDENTITY(1,1) CONSTRAINT PK_ProductSparse PRIMARY KEY CLUSTERED
,[Description] VARCHAR(100) NOT NULL
,Size SMALLINT SPARSE NULL
,Color VARCHAR(50) SPARSE NULL
,Shape VARCHAR(50) SPARSE NULL
);
Not much to it, right? Sparse columns look and behave like any other column in your table. The catalog view sys.columns does have an added is_sparse column if you need to validate whether a column is sparse. Object explorer also lists this information for each column in a table, before the data type.
To really contrast this lets add another table matching our ProductSparse table but without the SPARSE decoration on the columns:
CREATE TABLE dbo.ProductNotSparse
(
ProductID INT IDENTITY(1,1)
,[Description] VARCHAR(100) NOT NULL
,Size SMALLINT NULL
,Color VARCHAR(50) NULL
,Shape VARCHAR(50) NULL
);
Now let's add some data to the tables to see the storage implications. For this test I'm only inserting the [Description] column and letting the others remain NULL.
INSERT dbo.ProductSparse ([Description])
SELECT 'My Product';
GO 10000
INSERT dbo.ProductNotSparse ([Description])
SELECT 'My Other Product';
GO 10000
We can check the storage results with a query against a DMV (you should expect one on this blog, right?)...
SELECT OBJECT_NAME(object_id) AS objectName
,in_row_data_page_count
,in_row_reserved_page_count
,in_row_used_page_count
,used_page_count
,reserved_page_count
,row_count
,totalPages
,dataPages
,usedPages
FROM sys.dm_db_partition_stats AS s
INNER JOIN (
SELECT container_id, SUM(total_pages) AS totalPages, SUM(data_pages) AS dataPages, SUM(used_pages) AS usedPages
FROM sys.allocation_units
GROUP BY container_id
) AS au ON s.partition_id = au.container_id
WHERE s.object_id = OBJECT_ID('dbo.ProductSparse', 'U')
OR s.object_id = OBJECT_ID('dbo.ProductNotSparse', 'U');
If we execute this query we will see a very minimal difference in storage, only 2 more pages for the non NULL values. However, if we alter our GO loop to 100,000 we see a difference of roughly 28 pages. Now that's some savings!
As you look for ways to manage your growing data, keep sparse columns in mind. In the right situation they can be a really solid choice.
In my next post I will show how you can work with these sparse columns through other new technologies, namely filtered indexes and column sets.
-W
9f35cfe2-1c52-4bd6-a747-c56e05c5a259|0|.0