OK—
So I am teaching SQL Administration this week, and I come across an example of database objects in SQL. Now, I have taught this class at least 10 times over the last three years and it has never really bothered me, but today it did. Probably because I have donXML’s article on “Viral Coding Examples…”
In this particular example, they were showing a clustered index on the Lastname of a customers table. Now, why is that a “bad” thing? Well, first an investigation of clustered/non-clustered indexes is in order.
To start this off, a non-indexed table is called a heap. Any time data is retrieved, a table scan is run to look for the correct data. In a table with an inordinate number of rows, this is a bad thing. So, we build an index.
So the DBA/developer pulls up SQL Enterprise Manager and decides to build an index on last name to make things run a bit faster. Up pops the index property page, and the user enters the name and clicks the column or columns for the index and then stall out on the “Clustered” checkbox. Most developers I know have no clue what the “Clustered” checkbox on the SQL index properties even does. (I have even stood behind a developer who tried creating three of four clustered indexes **SQL gurus chuckle here**) So what is it that the “Clustered” index actually does.
First, that unordered pile of data pages we called the heap—that gets shuffled into an ordered list of records. On top of that, then, we create a BTree structure to quickly find the data page. In this case, the leaf level page of the index is the actual data page. So that being the case, what happens if I try to created a second clustered index… Well, we rebuild the data ordered by the new index and throw out the old clustered index.
Now, there is a lot more we could say about indexes, but this entry is about “Viral” badness. So why is it bad to create the clustered index with the Lastname as the key value?
Well, now we have to move to the insert process. What happens when you add a new row to the database? Well, we have to put the data in the correct spot in the clustered index (in Lastname order, right?). So what happens if the data page that we are trying to insert into is full? We get a page split. The more we add records to the database, the more likely we will have page splits and the increased likelihood of database fragmentation. Over time, this becomes a problem. Was this the best choice for a clustering key? Do you even have to have a clustered index? The answers—No and No.
But wait—this is MOC (MS Official Curriculum). Granted, the slide is not talking about the indexes. The point of the slide isn’t indexes. The problem is that a casual user (those who don’t take the time to figure out what clustered means) sees this and assumes that a Clustered Index on Lastname is “OK.” My next few posts will be about indexing, but I need to substantiate the posts with pictures, so they will take a bit to draw.
Stay tuned.