Geeks With Blogs
Chris Falter .NET Design and Best Practices

I spent much of last weekend running tests against a table with 122,000 records.  The folks at Citibank and Geico would just yawn at that amount of data, but where I work that's fairly heavy lifting.  As I was tuning and validating the tests, I used a bit-field flag ("TestCompleted") to track whether a record had already been tested.  No need to do work twice, eh?

Because this was a one-time set of tests, I used a TableAdapter (the simplest possible code) to grab all the records, then I needed to filter out all the records that had already been processed.  My initial solution was to use a DataView; just set the Filter property with a filter expression ("TestCompleted = false"), and voila--instant filtration!  Although it wasn't quite instant....my computer required 9 seconds to perform the operation.

For reasons I won't go into here, I had to switch to using an array of DataRecord returned by the DataTable.Select method.  The filter expression ("TestCompleted = false") was simply passed as a parameter to the Select method, rather than to the DataView.Filter setter.  So I ran the test again, and....whoa!

I had my array of DataRecord in just one second.  That's a whole order of magnitude faster than the DataView approach!

Frankly, I don't know why the DataView.Filter approach is so much more computationally expensive than the DataTable.Select approach.  And perhaps for small amounts of data, the two approaches would be equally good--or the DataView might even be better and faster.  I didn't test the small recordset scenario, so I don't know that either.  (Note: I have subsequently posted an analysis of the reason why the DataView is so much slower for those who are interested.)

What I do know is this: if you're filtering a large amount of data in a DataTable, and you care about system performance, you will want to filter your data by using DataTable.Select (to get an array of DataRecord), rather than using DataView.Filter.

Posted on Friday, August 4, 2006 6:34 PM Performance & Tuning | Back to top


Comments on this post: Array of DataRecord vs. DataView: A Dramatic Difference in Performance

# re: Array of DataRecord vs. DataView: A Dramatic Difference in Performance
Requesting Gravatar...
This is my guess

The DataTable.Select will return an array of DataRow(s)
the DataRow(s) objects are already created in the heap, and the Select method only creates an array that contains the addresses of these objects

Where DataView has to create a DataView object, and the dataView object contains DataRowView, which is a wrapper for the DataRow of the underlying table
But the DataView has to create a DataRowView anyway, so this will consume time and memory
Left by Ghassan on Aug 08, 2006 2:43 PM

# re: Array of DataRecord vs. DataView: A Dramatic Difference in Performance
Requesting Gravatar...
I would imagine that the filter is being done on the DB side...
Left by Dave on Sep 29, 2006 5:32 AM

# re: Array of DataRecord vs. DataView: A Dramatic Difference in Performance
Requesting Gravatar...
Actually, Dave, the DataView performs the filtering on the client side. That's the reason you can have several different (client-side) DataViews all pointing to the same (client-side) DataTable. And in fact, by examining the DataView IL, I have been able to determine why the performance was so bad; it has to do with the sort that the DataView must perform in order to set up a filter. See my subsequent post (http://geekswithblogs.net/chrisfalter/archive/2006/08/15/88057.aspx) for details.
Left by Chris Falter on Oct 02, 2006 1:30 PM

# re: Array of DataRecord vs. DataView: A Dramatic Difference in Performance
Requesting Gravatar...
Good post, I can’t say that I agree with everything that was said, but very good information overall:) http://www.highsneakers.com/ air max ltd is so cute!
Left by good product on Jun 11, 2010 3:03 AM

Your comment:
 (will show your gravatar)
 


Copyright © Chris Falter | Powered by: GeeksWithBlogs.net | Join free