Geeks With Blogs
Evan Koch Musings on BizTalk Server and SQL Server
n the previous post we saw how indexes can speed up data access; now we’ll discuss the index itself.  Some of the information used in this post comes from SQL 2005 Books Online that’s installed with the SQL Server 2005 installation, so check it out if you want more information. 

There are two main types of indexes within SQL: clustered and nonclustered indexes.   You can only define one clustered index per table which must reside in the same file group as the table it references.  You’re allowed to create as many nonclustered indexes as you see fit that can reside in the file group of your choice.

The clustered index is important because it defines the order in which the records within the table stored.  If there’s no clustered index on table, the records are stored in a heap and there’s no particular rhyme or reason to the order in which the records are stored.  To demonstrate this, I’ve created two very similar tables; the difference between these tables is the column on which the clustered index is based.

In each table I’ve created five entries where the description is the key of the row spelled out.  We see a bit of difference when selecting all of the rows out of each table.

Since there was no ORDER BY clause defined, the records were returned in the order in which they were stored in the disk.  In TestTable1 the clustered index was based on TestTable1Key so the records were returned in ascending order based on those values.  Likewise, the clustered index on TestTable2 was based on TestTable2Description so the records were returned in ascending alphabetical order based on TestTable2Description.  In the third query from a previous post regarding analysis of execution plans, this is why we were able to leave out the ORDER BY clause – the records were already stored in that order.

A nonclustered index contains references to rows that meet the specific criteria.  A nonclustered index is similar to an index within a book where one might look up a topic and find the page numbers that mention the topic.  There are certain instances, generally when the number of rows within the table is quite small, where this extra step of checking the index and then following the reference to the data can take longer than if you were to scan through all of the rows within the table.

With that said, let’s look at the how the indexes are organized.  Both clustered and nonclustered indexes are stored via B-trees.  For SQL Server, all data is stored on disk via pages and extents.   With that in mind, let’s look at the structure of a clustered index.  From BOL:

For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition. SQL Server moves down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.

This illustration shows the structure of a clustered index in a single partition.

For the clustered indexes it makes sense for the data to be stored at the leaf nodes of the B-tree because the clustered index defines how the data is stored on the disk.

Nonclustered indexes have a similar structure, though their leaf nodes are index rows which point to the data.  As mentioned before, this additional level of indirection can consume additional resources.

There can be a lot of resources involved with keeping an index up to date.  Let’s go back to TestTable2 and add another row with a TestTable2Description value of “Six”.  Before adding this row the table looks like this:

With the new row:

Because the table’s data is stored on the disk in a particular order based on the value of TestTable2Description, the database engine has to make space between One and Three and then insert the new record.  If the table contained additional columns per row and had a number of rows, it’s easy to imagine this type of operation as being non-trivial.  In a later post I’ll discuss what can be done to help maintain indexes.

 

Posted on Monday, August 20, 2007 9:49 AM | Back to top


Comments on this post: Indexes

No comments posted yet.
Your comment:
 (will show your gravatar)


Copyright © Evan Koch | Powered by: GeeksWithBlogs.net