Geeks With Blogs
Chris Falter .NET Design and Best Practices

Well, maybe I solved it.  At least I have a reasonable hypothesis.

Reader “Ghassan” postulated in a comment to my earlier post that it's the extra burden of creating an array of DataRowView that puts the DataView approach at a disadvantage to DataTable.Select.  However, a glance at DataView's default indexer in Lutz Roeder's wonderful Reflector utility shows that the array of DataRowView is not created while the constructor is executing.  Instead, it is created when and if the DataView indexer is called. The DataView indexer calls its private GetElement method in order to return a DataRowView:

private DataRowView GetElement(int index)
{
      if ((index < 0) || (index >= this.recordCount))
      {
            throw ExceptionBuilder.GetElementIndex(index);
      }
      return this.RowViewCache[index];
}

And the RowViewCache uses lazy instantiation for the array of DataRowView:

internal DataRowView[] RowViewCache
{
      get
      {
            if (this.rowViewCache == null)
            {
                  this.rowViewCache = new DataRowView[this.Count];
                  for (int num1 = 0; num1 < this.Count; num1++)
                  {
                        this.rowViewCache[num1] = new DataRowView(this, num1);
                  }
            }
            return this.rowViewCache;
      }
}

In other words, unless and until your code attempts to get a DataRowView via the indexer, the DataView will not create the array of DataRowView at all.  And my code did not call the indexer until after the 9 seconds had elapsed.  So we must eliminate this hypothesis.  Of course, Ghassan, I did not supply a code listing, you could not have known this.  Believe me, I appreciate reader comments, and I wish I got more of them!  Please keep posting comments.  Please.

So what are the DataView constructor and RowFilter setter doing while the clock ticks off 9 seconds?  Here's what the default constructor does:

  1. It initializes default settings.  This involves maybe a couple of dozen assignment statements, which can't take more than a few milliseconds.
  2. It instantiates an internal DataViewListener, which creates delegates for the DataTable events that need to be monitored.  Again, this should take only a few milliseconds at most.
  3. It instantiates an internal Index, which is an array of int members that index into the DataTable.  This involves:
    • Instantiating the array, which should (once again) involve modest computing resources.
    • Setting the bounds of the array by applying any filter criteria.  For the default constructor, there is no filter, so it's a one-step operation; the size is equal to the DataTable's record count.
    • Sorting the array. 

Then my next line of code, the call to the DataView.RowFilter setter, will create a second internal Index.  When index sets the bounds of its array, it will have to apply the RowFilter criteria.  This should be comparable to the computing resources used by the DataTable.Select method.

The only significant difference in computing resources, then, must come from the fact that instantiating a DataView and setting its RowFilter involves sorting a rather large array of record indices twice.  Let's take a closer look at that sort:

private void Sort(int left, int right)
{
      int num1;
      do
      {
            num1 = left;
            int num2 = right;
            int num3 = this.records[(num1 + num2) >> 1];
            do
            {
                  while (this.CompareRecords(this.records[num1], num3) < 0)
                  {
                        num1++;
                  }
                  while (this.CompareRecords(this.records[num2], num3) > 0)
                  {
                        num2--;
                  }
                  if (num1 <= num2)
                  {
                        int num4 = this.records[num1];
                        this.records[num1] = this.records[num2];
                        this.records[num2] = num4;
                        num1++;
                        num2--;
                  }
            }
            while (num1 <= num2);
            if (left < num2)
            {
                  this.Sort(left, num2);
            }
            left = num1;
      }
      while (num1 < right);
}

We see that the Sort method implements the Quicksort algorithm.  It chooses an arbitrary midpoint, places all the values lower than the midpoint's to the left and all the higher values to the right.  Then it applies itself recursively to the left and right sections.  Eventually it will recurse down to sections that cannot be divided into smaller sections (i.e., when a section consists of a single array member), at which point the sort has completed.  Using big-O notation, we can say that this algorithm executes in O(n log n) time, which is about as efficient as you can expect from a sorting algorithm.  As long as each iteration of the sort divides the set of indices into two equal parts, we are dealing with a logarithm of base 2.   So for 122,000 records (approximately 2 raised to the 17th power), the algorithm will call the Index.CompareRecords method approximately 122,000 x 17 times, which is a little over 2 million times.  And each call to Index.CompareRecords traverses a call stack several layers deep as it wends its way through the object model.  Do this whole scenario twice and you can chew up 9 seconds.

At least I think so.  To prove all this, you would have to instrument the System.Data code and check run-time performance with a testing tool.  So if there's a Computer Science student out there who wants an interesting research project, have some fun with Mono and tell us what you find.  Anyone?

Posted on Tuesday, August 15, 2006 6:24 PM Performance & Tuning | Back to top


Comments on this post: Solved: The Mystery of DataView's Poor Performance with Large Recordsets

# re: Solved: The Mystery of DataView's Poor Performance with Large Recordsets
Requesting Gravatar...
I really liked your take on the subject. The links look good and useful, too.
Left by Emily on Nov 30, 2007 8:07 PM

# re: Solved: The Mystery of DataView's Poor Performance with Large Recordsets
Requesting Gravatar...
Thought this was interesting...

http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=93608
Left by R L on Dec 11, 2007 10:47 AM

# re: Solved: The Mystery of DataView's Poor Performance with Large Recordsets
Requesting Gravatar...
R,

Thanks for pointing out the change request on MS' community site. I have added this comment to the request:

"I have analyzed this problem in-depth (see http://geekswithblogs.net/chrisfalter/archive/2006/08/15/88057.aspx). I believe that the DataView instantiation time could be reduced dramatically (if and only if the user does not specify a filter or sort) by declining to sort the DataRecords. In this situation, the DataView could simply bypass its own sort, and accept the table's ordering of records.

"Once a user specifies a filter or sort, though, the DataView will have to create its own index, and the Quicksort algorithm it uses is hard to improve."
Left by Chris Falter on Dec 12, 2007 4:53 AM

Your comment:
 (will show your gravatar)


Copyright © Chris Falter | Powered by: GeeksWithBlogs.net | Join free