Making Stuff Faster
Design, Code, Database Performance

NTEXT vs NVARCHAR(MAX) in SQL 2005

Wednesday, April 16, 2008 2:34 PM
I recently profiled a sproc that makes heavy use of the TSQL SUBSTRING function (hundreds of thousands of times) to see how it performs on a SQL 2005 database compared to a SQL 2000 database.  Much to my surprise the SQL 2005 database performed worse...dramatically worse than SQL 2000.

After much researching it turns out the problem is that the column the text was stored in was an NTEXT, but SQL 2005 has deprecated the NTEXT in favor of NVARCHAR(MAX).  Now, you'd think that string functions on NTEXT would have the same performance on 2005 as it did on 2000, but thats not the case. 

Ok, so NTEXT is old badness, and NVARCHAR(MAX) is new goodness.  Then the next logical step would be to convert the column to be a NVARCHAR(MAX) data type, but here lies a little but very important gotcha.

By default NTEXT stores the text value in the LOB structure and the table structure just holds a pointer to the location in the LOB where the text lives. 

Conversely, the default setting for NVARCHAR(MAX) is to store its text value in the table structure, unless the text is over 8,000 bytes at which point it behaves like an NTEXT and stores the text value in the LOB , and stores a pointer to the text in the table.

So, just to recap, the default settings for NTEXT and NVARCHAR(MAX) are completely opposite.

Now, what do you think will happen when you execute an ALTER COLUMN on a NTEXT column that changes the data type to a NVARCHAR(MAX)?  Where do you think the data will be stored?  In the LOB structure or the table structure?

Well, lets walk through an example.  First create a table with one NTEXT column:

CREATE TABLE [dbo].[testTable](
    [testText] [ntext] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Next, put 20 rows in the table:

INSERT INTO testTable SELECT 'hmmm...i wonder if this will work'

Then run a select query with IO STATISTICS:

SET STATISTICS IO ON
SELECT * FROM testTable
SET STATISTICS IO OFF

Now, looking at the IO stats, we see there was only 1 logical read, but 60 LOB logical reads.  This is pretty much as expected as NTEXT stores its text value in the LOB not the table:

Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 60, lob physical reads 0, lob read-ahead reads 0.

Now, lets alter the table to be an NVARCHAR(MAX):

ALTER TABLE testTable ALTER COLUMN testText NVARCHAR(MAX) null

Now when we run the select query again with UI STATISTICS we still get a lot of LOB reads (though less than we did with NTEXT).  So its obvious that when SQL Server did the alter table, it didn't use the default NVARCHAR(MAX) setting of text in row, but kept the text in the LOB and still uses pointers lookups to get the text out of the LOB.

Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 40, lob physical reads 0, lob read-ahead reads 0.

This is not as expected and can be devastating for performance if you don't catch it, since NVARCHAR(MAX) with text not in row actually performs WORSE than NTEXT when doing SUBSTRING calls.

So how do we fix this problem?  Its actually fairly easy.  After running your alter table, run an update statement setting the column value to itself, like so:

UPDATE testTable SET testText = testText

SQL server moves the text from the LOB structure to the table (if less than 8,000 bytes).  So when we run the select again with IO STATISTICS we get 0 LOB reads. 

Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

YEA!  This is what we want.

Now, just for grins, what do you think happens if we change the NVARCHAR(MAX) back to NTEXT?  Well it turns out that SQL Server moves the text back to the LOB structure.  Completely backwards from what it did when converting NTEXT to NVARCHAR(MAX).




Feedback

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Good to know, thanks. The last paragraph doesn't surprise me, but the rest I probably wouldn't have considered. 4/17/2008 9:11 AM | Alex

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

I just converted a table using this technique (90,000 records). I had three columns using the ntext type. I converted one column at a time and a simple select query which took 4 secs on the unchanged table still took 4s until I had changed all the columns using ntext to nvarchar, and then it came down to 3secs avg. In retrospect I think I would have like to have done some more intense queries just to see for myself what would happen.

Good to know, thanks! 5/5/2008 6:58 AM | Rhett

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

very helpful thanks. I discovered today that NTEXT is to be deprecated. I discovered this totally by chance through some very strange LINQ exception that I got because I mistyped a property. I guess I am 3+ years behind with SQL server:) 5/5/2008 6:41 PM | Stilgar

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Thanks for the information. I too am apparently a bit behind on SQL server, as I wasn't aware that the TEXT/NTEXT type were going to be deprecated. Now to learn more about LOBs, BLOBs, CLOBs, etc. 5/21/2008 1:16 PM | Brian

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Thanks for the info!
Any idea what the conversion does to the table size? Is the table larger because data is stored in-row or smaller because NVARCHAR is of a variable length? 6/16/2008 11:50 AM | Narine

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

If you use the default settings for NVARCHAR(MAX) then yes, your table will be larger, which can cause other performance problems because less rows can now fit in a page of memory.

Its really a trade off. You need to look at the entire ecosystem of your database (queries, indexes, and tables) and see what makes sense for your specific needs.

In my case, I ended up switching from NTEXT to NVARCHAR(MAX) to get the string processing speed, but then vertically split the table, moving the big string columns off to a new 1-to-1 lookup table, because adding the text to the table data structure hurt other queries. With the vertical partition and the NVARCHAR(MAX) changes, I got very good sproc throughput. 6/17/2008 7:52 AM | John

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Thanks for the post, but I'm confused about something. The NTEXT had 60 lob logical reads and when converted to NVARCHAR(MAX) is was 40 lob logical reads. Once the update was run, it was 0 lob logical reads indicating that less is better. Isn't 40 better than 60? Why did you indicate that 40 was worse than 60? 8/1/2008 8:30 AM | don

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Thank you so much for the post. I am just about to upgrade my SQL 2000 database to SQL 2005 and I have multiple tables which use ntext data type. You mentioned that "SQL server moves the text from the LOB structure to the table (if less than 8,000 bytes)." On my db, the ntext column hold more that 8000 characters. Do you know if SQL 2005 will move all data over? 8/29/2008 8:26 AM | Wei

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Thank you.

Just upgraded from SQL2000 to SQL2005, and trying to update some data into our new DB, but hit datatype issue.

Because the col is "ntext" it will NOT allow such a long substring of text to be stored/entered, so it looks like I need to convert the col to "nvarchar(max)" in order for the DB to accept this long string of char/HTML.

*Does anyone have a more extensive list of the datatypes that are being deprecated?*

ciao ciao Ntext! 10/1/2008 10:41 AM | Fennel Doyle

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Great post! I know this might be a old hat, but here's a conversion script:

select
table_name,
column_name,
(
'alter table [' + table_name + '] ' +
'alter column [' + column_name +'] ' +
case when data_type = 'text' then 'varchar(max)' else 'nvarchar(max)' end + ' ' +
case when is_nullable = 'yes' then 'null' else 'not null' end + ';' + char(10) +
'update [' + table_name + '] set [' + column_name + '] = [' + column_name + '];'
+ char(10) +char(13) ) [sql]
into
#tmp
from
information_schema.columns
where
data_type = 'ntext' or data_type = 'text'

while exists(select null from #tmp)
begin
set nocount on
declare @sql nvarchar(max)
select top 1 @sql = [sql] from #tmp
set nocount off
exec sp_executesql @sql
set nocount on
delete from #tmp where [sql] = @sql
end
drop table #tmp 10/28/2008 5:06 PM | Robert Baskette

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Interesting results. Now a question about this data being stored in row and IO performance.
Say I have a simple table
Id BigInt, Title Varchar(500), Detail Varchar(Max).
Id is the Clustered (and only) Index. I want to display 100 records at a time selecting just Id, Title and then if a record is selected fetch the Detail on demand.

Now with the Detail as text, I am going to fit a fairly large number of records per page and the IO cost to get a list of records will be low. Now what this is telling me is that if the Detail is stored in row unless it exceeds 8K then the number of records stored per page is going to decrease massively and IO use is going to go through the roof.
11/10/2008 11:48 AM | Simon Porter

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Rebert, I think this is more accurate:

select
o.name as table_name,
c.name as column_name,
(
'alter table [' + o.name + '] ' +
'alter column [' + c.name +'] ' +
case when c.system_type_id = TYPE_ID('text') then 'varchar(max)' else 'nvarchar(max)' end + ' ' +
case when c.is_nullable = 1 then 'null' else 'not null' end + ';' + char(10) +
'update [' + o.name + '] set [' + c.name + '] = [' + c.name + '];'
+ char(10) +char(13) ) [sql]
into
#tmp
from sys.columns c
left join sys.objects o on (c.object_id = o.object_id)
where
o.type = 'U' and o.is_ms_shipped = 0 and c.system_type_id = TYPE_ID('ntext') or c.system_type_id = TYPE_ID('text')

while exists(select null from #tmp)
begin
set nocount on
declare @sql nvarchar(max)
select top 1 @sql = [sql] from #tmp
set nocount off
exec sp_executesql @sql
set nocount on
delete from #tmp where [sql] = @sql
end
drop table #tmp 2/25/2009 7:58 AM | Francois Nel

# ok..so what about the foot print on disk

I have a a table which is 14 GIG when using (ntext).

After I convert this table to varchar(max), will will the footprint on the hard disk look like.

Will it be 1/2 the size to be about 7 GB?
or
Will it be the same size of 14GB?
or
Will it actually become bigger 21GB?


I choose to not recreate the table, since I had other idenity columns which I would have to turn
set identity off //on.

So I used alter table / alter column signature varchar(max).

In my first tests...it appears the table is actually increasing in size to be more than 21 GB in size? Not what I was anticpating...

Anynow, if someone could help me answer why this may happen the(table size actually increasing by 1/3)?

I even ran a backup and shrank the database to reclaim additional space (if any existed)...But the only thing that shrank was the log file (ldf).
to note: I had to set the log file size to be 16 GIG when originally performing the task becuase it said the log file was not large enough, even though I had set the log file to autogrow by 5%. This could be related to some other task running too....but expanding the log file before running alter table/alter column statement seem to at least fix that part.

--Still working on this situation...My goal was to get with the "new" standard of varchar(max) before looking into sql 2008 and to hopefully also shrink the size of our overall database/backups.

Any help would be appreciated.
Thanks, Doug Lubey of Louisiana

3/7/2009 10:10 AM | Douglas Lubey of Louisiana

# update to foot print size of varchar(max) ....RESULTS

Just like this article suggests...
After I altered the data column type to varchar(max) the table will EVENTUALLY SHRINK TO BE 1/2 THE SIZE..if you follow your alter table statement with and UPDATE TABLE STATEMENT.

ORIGINALLY WITH JUST RUNNING THE ALTER TABLE:
The table size did double....why? Behind the scenes SQL server 2005 actually creates a duplicate feild of the same data. One field with the data type "text" and one field with the datatype varchar(max).

BASED ON THE SUGGESTION ABOVE:
I then ran an update statement

UPDATE StSignaturesTbl
set signatureString=signatureString,
where Date_Created >= '1/1/2002'
================
At first the dataTable did not shrink even after running this statement...but slowly it did..after checking back on a hourly basis.

HERE IS THE OUTLINE OF WHAT HAPPENED:
-----
1>original table size: 14 GB
2>Alter Table/Alter Column: 21 GB (it increased)
3>Waited....it stayed the same ...21 gigabytes
4>Ran the update statement(Above)..no changes at first
5>Waited ....3-6 hours...the table eventually shrank to be 1/2 the size of the original foot print size on the hard disk..

END RESULT: After both tasks completed and after waiting patiently between tasks. THE FINAL SIZE FO THE DATATABLE IS NOW 7 GB in size....

THIS IS WHAT WE WERE EXPECTED...50% gain....in hard disk space..

Thanks again for this article...
Doug Lubey of Louisiana


ONE THING TO NOTE: make sure you ADD enough space to your log file (.ldf) to cover an additional 100% of the TABLE SIZE...before running the alter column/alter table statements or you will get a run-time error once the log file size reaches it upper limit (EVEN IF YOU have auto-increase set to 10%)...I Did have it said to increase automatically by 10%...SQL SERVER 2005 still said it could not reclaim disk space fast enough ..do not know why...but increasing the log file size to the original size + THE table size did the trick. USED THE properties(right-click_) of the database to do this. 3/8/2009 4:17 PM | Douglas Lubey of Louisiana

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

So much helpful.
Thank you Mate 4/21/2009 5:40 PM | Ethan

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

If migration performance is a consideration, you could use SELECT INTO with a conversion function into a new table. Then drop the old table and rename the newly created table. This would have the sideaffects of creating the storage properly, and operating faster in non-logged mode, not increasing the transaction log size. Of course, the columns null/identity options would need to be alters and constraints, indexes would need to be readded, so it is a bit more work. Recommend for a few very large tables. 4/22/2009 11:26 AM | Chip in Tampa

# update to foot print size of varchar(max) ....RESULTS

Douglas,
Keep in mind that going from NTEXT to VARCHAR(MAX) you are going from a datatype that supports unicode characters to one that does not. Hence, while you have saved half of the storage space for that data, any data that does not fit in your single byte VARCHAR(MAX) character set code page was lost and the potential for storing characters outside this single byte character set is also lost. So, TEXT and VARCHAR are good datatypes for singe-byte character sets and NTEXT and NVARCHAR are good for supporting all characters for all languages (i.e. Unicode character set). The reason the size is double with the N-Types is that they use a two-byte (USC2) encoding for the Unicode characters they represent. So, be sure to know what data you will be storing in these fields and then you will know which datatype is best for you (N vs Non-N types). 5/9/2009 1:04 AM | DBall

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

Thanks this was exactly what I was looking for. 7/23/2009 12:56 PM | Aaron

# Dow jones future live

Nice website. I can see why its so popular.
I am from Cyprus and also now'm speaking English, please tell me right I wrote the following sentence: "Please return your financial advisor for further dealer."

Thanks for the help :D, Liang. 9/23/2009 4:09 PM | Liang

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

These are good differences that i wanted to know..Thanx 9/30/2009 2:29 AM | Danish

# re: NTEXT vs NVARCHAR(MAX) in SQL 2005

thank you, this was exactly what I needed! 10/5/2009 5:13 PM | Daniel Williams

Post a comment