I recently profiled a sproc that makes heavy use of the TSQL SUBSTRING function (hundreds of thousands of times) to see how it performs on a SQL 2005 database compared to a SQL 2000 database. Much to my surprise the SQL 2005 database performed worse...dramatically worse than SQL 2000.
After much researching it turns out the problem is that the column the text was stored in was an NTEXT, but SQL 2005 has deprecated the NTEXT in favor of NVARCHAR(MAX). Now, you'd think that string functions on NTEXT would have the same performance on 2005 as it did on 2000, but thats not the case.
Ok, so NTEXT is old badness, and NVARCHAR(MAX) is new goodness. Then the next logical step would be to convert the column to be a NVARCHAR(MAX) data type, but here lies a little but very important gotcha.
By default NTEXT stores the text value in the LOB structure and the table structure just holds a pointer to the location in the LOB where the text lives.
Conversely, the default setting for NVARCHAR(MAX) is to store its text value in the table structure, unless the text is over 8,000 bytes at which point it behaves like an NTEXT and stores the text value in the LOB , and stores a pointer to the text in the table.
So, just to recap, the default settings for NTEXT and NVARCHAR(MAX) are completely opposite.
Now, what do you think will happen when you execute an ALTER COLUMN on a NTEXT column that changes the data type to a NVARCHAR(MAX)? Where do you think the data will be stored? In the LOB structure or the table structure?
Well, lets walk through an example. First create a table with one NTEXT column:
CREATE TABLE [dbo].[testTable](
[testText] [ntext] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Next, put 20 rows in the table:
INSERT INTO testTable SELECT 'hmmm...i wonder if this will work'
Then run a select query with IO STATISTICS:
SET STATISTICS IO ON
SELECT * FROM testTable
SET STATISTICS IO OFF
Now, looking at the IO stats, we see there was only 1 logical read, but 60 LOB logical reads. This is pretty much as expected as NTEXT stores its text value in the LOB not the table:
Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 60, lob physical reads 0, lob read-ahead reads 0.
Now, lets alter the table to be an NVARCHAR(MAX):
ALTER TABLE testTable ALTER COLUMN testText NVARCHAR(MAX) null
Now when we run the select query again with UI STATISTICS we still get a lot of LOB reads (though less than we did with NTEXT). So its obvious that when SQL Server did the alter table, it didn't use the default NVARCHAR(MAX) setting of text in row, but kept the text in the LOB and still uses pointers lookups to get the text out of the LOB.
Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 40, lob physical reads 0, lob read-ahead reads 0.
This is not as expected and can be devastating for performance if you don't catch it, since NVARCHAR(MAX) with text not in row
actually performs WORSE
than NTEXT when doing SUBSTRING calls.
So how do we fix this problem? Its actually fairly easy. After running your alter table, run an update statement setting the column value to itself, like so:
UPDATE testTable SET testText = testText
SQL server moves the text from the LOB structure to the table (if less than 8,000 bytes). So when we run the select again with IO STATISTICS we get 0 LOB reads.
Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
YEA! This is what we want.
Now, just for grins, what do you think happens if we change the NVARCHAR(MAX) back to NTEXT? Well it turns out that SQL Server moves the text back to the LOB structure. Completely backwards from what it did when converting NTEXT to NVARCHAR(MAX).