Filtered Indexes in SQL Server 2008

After a brief blog delay we're back and continuing our example from last time around sparse columns and their uses in SQL Server 2008.

Today I want to talk about filtered indexes and how you can use them with your sparse columns.  Continuing with the dbo.ProductSparse table, I've updated a percentage of the table with values and left a small amount NULL.  Here's the quick code:

--Update some rows but leave NULLs UPDATE dbo.ProductSparse SET Size = 1 WHERE ProductID BETWEEN 0 AND 2000; UPDATE dbo.ProductSparse SET Size = 2 WHERE ProductID BETWEEN 2001 AND 4000; UPDATE dbo.ProductSparse SET Size = 3 WHERE ProductID BETWEEN 4001 AND 6000;

If we execute the query below on dbo.ProductSparse we will see an index scan and some IO:

SELECT ProductID, [Description], Size FROM dbo.ProductSparse WHERE Size = 1;

Table 'ProductSparse'. Scan count 1, logical reads 45, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So let's add an index filtering out NULL values for the Size column.  The only syntax difference for filtered indexes is the inclusion of a WHERE clause at the end of the declaration.

CREATE INDEX IDX_ProductSparse_Size ON dbo.ProductSparse (Size) WHERE Size > 0 AND Size < 3;

Now if we execute the same query as before we will see an index seek.

We've now modeled desperate data and given ourselves the ability to query it in a palatable manner.

To see the layout of the data with the addition of our filtered index we can query the sys.dm_db_partition_stats DMV:

SELECT i.name ,s.in_row_data_page_count ,s.in_row_reserved_page_count ,s.in_row_used_page_count ,s.row_count FROM sys.dm_db_partition_stats AS s INNER JOIN sys.indexes AS i ON s.index_id = i.index_id AND s.object_id = i.object_id WHERE s.object_id = OBJECT_ID('dbo.ProductSparse', 'U');

Here's our output:

So we can see some nice benefits of filtered indexes coupled with sparse columns.  In prior versions of SQL Server I would have had to implement an indexed view to get this same functionality.  Now I can maintain a single structure and have none of the additional overhead associated with an indexed view.  Good times for everyone. :)

-W


Posted by: whitneyw
Posted on: 3/25/2008 at 11:34 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed

IE8 beta is available

Not really data related but if you want to kick the tires on Microsoft's shiny new browser, go here.

On a separate note, I've been in snowy Minnesota all week which has really thrown off my blogging.  All apologies to you the fine reader of this space.  I promise to be better.  :)

-W

 

**Update: slight adjustment to title that this is beta 1 version.


Posted by: whitneyw
Posted on: 3/12/2008 at 11:38 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed

AtlantaMDF : May 12th

If you are going to be in Atlanta on May 12th swing on by the Atlanta SQL Server users group.  I will be giving a talk on Dynamic Management Views and how they can save your life.  No, really...they will.

I will post a reminder closer to the date but if you are like me it helps to go ahead and put it on the Outlook calendar.

-W

Technorati Tags:

Posted by: whitneyw
Posted on: 3/11/2008 at 11:37 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed

Sparse Columns in SQL Server 2008

SQL Server 2008 has introduced sparse columns to DDL activities.  So when exactly would you use this new construct?

Say you are modeling a product line for a company.  The company in question runs a sporting good store stocking shoes, hats, and equipment.  As you might imagine a product table for these types of products would normally leave you with many NULL columns as certain properties of one product would not be applicable to another.  Sparse columns look to save the day here as they are optimized for the storage of null values.

Like most things in the SQL Server world, sparse columns have their trade offs.  While storage of null values is optimized you do pay a cost for non null values.  The amount varies by data type, but it is typically in the 50% range.  BOL recommends you look to sparse columns when you would receive a 20 to 40 percent space savings from traditional storage practices.

There are a few data types that aren't invited to the sparse column party.  These are:

  • geography
  • geometry
  • image
  • ntext
  • text
  • timestamp
  • user-defined data types

No real surprises here as several are set to be deprecated in future releases.  The others mentioned might make a good follow on blog post.

So now to the interesting part, at least for me...what does the code look like?  Here is an example product table with sparse columns for size, color, and shape:

IF OBJECT_ID('dbo.ProductSparse, 'U') IS NOT NULL
DROP TABLE dbo.ProductSparse;

CREATE TABLE dbo.ProductSparse
(
ProductID        INT             IDENTITY(1,1)	CONSTRAINT PK_ProductSparse PRIMARY KEY CLUSTERED
,[Description]   VARCHAR(100)    NOT NULL
,Size            SMALLINT        SPARSE NULL
,Color           VARCHAR(50)     SPARSE NULL
,Shape           VARCHAR(50)     SPARSE NULL
);

Not much to it, right?  Sparse columns look and behave like any other column in your table.  The catalog view sys.columns does have an added is_sparse column if you need to validate whether a column is sparse.  Object explorer also lists this information for each column in a table, before the data type.

To really contrast this lets add another table matching our ProductSparse table but without the SPARSE decoration on the columns:

CREATE TABLE dbo.ProductNotSparse
(
ProductID        INT             IDENTITY(1,1)
,[Description]   VARCHAR(100)    NOT NULL
,Size            SMALLINT        NULL
,Color           VARCHAR(50)     NULL
,Shape           VARCHAR(50)     NULL
);

Now let's add some data to the tables to see the storage implications.  For this test I'm only inserting the [Description] column and letting the others remain NULL.

INSERT dbo.ProductSparse ([Description])
SELECT 'My Product';
GO 10000

INSERT dbo.ProductNotSparse ([Description])
SELECT 'My Other Product';
GO 10000

We can check the storage results with a query against a DMV (you should expect one on this blog, right?)...

SELECT OBJECT_NAME(object_id) AS objectName
    ,in_row_data_page_count
    ,in_row_reserved_page_count
    ,in_row_used_page_count
    ,used_page_count
    ,reserved_page_count
    ,row_count
    ,totalPages
    ,dataPages
    ,usedPages
FROM sys.dm_db_partition_stats AS s
    INNER JOIN (
                SELECT container_id, SUM(total_pages) AS totalPages, SUM(data_pages) AS dataPages, SUM(used_pages) AS usedPages
                FROM sys.allocation_units
                GROUP BY container_id    
                ) AS au ON s.partition_id = au.container_id
WHERE s.object_id = OBJECT_ID('dbo.ProductSparse', 'U')
    OR s.object_id = OBJECT_ID('dbo.ProductNotSparse', 'U');

If we execute this query we will see a very minimal difference in storage, only 2 more pages for the non NULL values.  However, if we alter our GO loop to 100,000 we see a difference of roughly 28 pages. Now that's some savings! 

As you look for ways to manage your growing data, keep sparse columns in mind.  In the right situation they can be a really solid choice.

In my next post I will show how you can work with these sparse columns through other new technologies, namely filtered indexes and column sets.

-W


Posted by: whitneyw
Posted on: 3/11/2008 at 11:36 PM
Tags:
Categories: Blog
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed