Geeks With Blogs

Hi, my name is Vincent Grondin and I'm a senior consultant at Fujitsu, a consulting firm in Montreal, Québec. I'm starting this blog to share some of my thoughts and knowledge on .NET architectures and code. Being a consultant in the .NET world, I come across many different things: some good, some bad, and others that are worth a blog post here whether they be good, or hmmm... shall I say, less good :) I hope you enjoy yourself while learning new stuff. Feel free to leave a comment or contact me anytime.
Vincent Grondin

 

Here’s a quick tip on how to optimize your .NET 4.0 Self Tracking Entities (STEs).  On our last project at Fujitsu we ended up having to load hundreds of thousands of entities into the collections of some of our STEs.  Don’t ask us why we did it, because the answer is “this is how the client wanted it”.  If you have done that in the past, you know that loading that many entities into STEs can literally take many minutes…  I mean, how can populating a collection take that much time, or even seconds for that matter?

When you profile this process, you can see where the STEs fail to cope with increasing load: they call the Contains method all the time.  Contains on a Collection<T> is a linear O(n) search, so it is slow compared to a hash-based container like HashSet<T>, where a lookup is O(1) on average.  When that check runs for every entity you add, filling a collection of n entities costs on the order of n² comparisons.

So we added a private, non-serialized HashSet<T> to the TrackableCollection<T> class defined in the T4 file for the generated entities.  Then we added a function to that class that checks whether an entity is already in the HashSet<T>, and we used this function everywhere in that T4 template instead of the TrackableCollection<T> Contains method.  Now every time an entity is added to or removed from the TrackableCollection<T>, we apply the same operation to the HashSet<T>, which is super fast and doesn’t affect performance much.

We ended up saving a ton of time by keeping every entity in two containers.  Yes, it’s a higher memory footprint, but since the HashSet<T> is not serialized, it didn’t affect our performance when transferring the collection either.  The HashSet<T> is ignored upon serialization and simply repopulated upon deserialization, just like the normal TrackableCollection<T> is.  That was step one…  Here’s what our TrackableCollection<T> now looks like:

 

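using System.Collections.Generic;
using System.Collections.ObjectModel;

public class TrackableCollection<T> : ObservableCollection<T>
{
    // Mirror of the collection's content, used only for fast lookups.
    private readonly HashSet<T> hashSet = new HashSet<T>();

    // Hash-based replacement for the linear Contains inherited from Collection<T>.
    internal bool HashSetContains(T element)
    {
        return hashSet.Contains(element);
    }

    protected override void ClearItems()
    {
        // Remove items one by one so each removal goes through RemoveItem,
        // which also updates the HashSet.
        new List<T>(this).ForEach(t => Remove(t));
    }

    protected override void InsertItem(int index, T item)
    {
        // HashSet<T>.Add returns false for a duplicate, which is then skipped.
        if (hashSet.Add(item))
        {
            base.InsertItem(index, item);
        }
    }

    protected override void RemoveItem(int index)
    {
        var element = this[index];
        base.RemoveItem(index);
        hashSet.Remove(element);
    }

    protected override void SetItem(int index, T item)
    {
        // Swap the replaced element for the new one in the HashSet as well.
        var oldElement = this[index];
        base.SetItem(index, item);
        hashSet.Remove(oldElement);
        hashSet.Add(item);
    }
}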
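If you want to see the difference without firing up a profiler, here is a minimal sketch that counts Equals calls for one failed lookup in a Collection<T> and in a HashSet<T>.  CountedEntity and ContainsCost are hypothetical names made up for this illustration, and the Equals override exists only for counting; generated STEs would normally rely on reference equality.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical stand-in for a generated entity (not part of the STE templates).
// It counts Equals calls so the cost of a lookup is visible without a profiler.
public sealed class CountedEntity
{
    public static int EqualsCalls;

    public int Id { get; private set; }

    public CountedEntity(int id) { Id = id; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as CountedEntity;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id; }
}

public static class ContainsCost
{
    // Fills the container, then returns how many Equals calls
    // a single lookup of a missing entity triggers.
    public static int Measure(ICollection<CountedEntity> container, int size)
    {
        for (int i = 0; i < size; i++)
        {
            container.Add(new CountedEntity(i));
        }

        CountedEntity.EqualsCalls = 0;
        container.Contains(new CountedEntity(-1)); // -1 is never inserted
        return CountedEntity.EqualsCalls;
    }

    public static void Main()
    {
        const int size = 100000;
        int linear = Measure(new Collection<CountedEntity>(), size);
        int hashed = Measure(new HashSet<CountedEntity>(), size);

        Console.WriteLine("Collection<T>.Contains: {0} Equals calls", linear);
        Console.WriteLine("HashSet<T>.Contains:    {0} Equals calls", hashed);
    }
}
```

The Collection<T> lookup has to compare against every stored entity before giving up, while the HashSet<T> lookup only inspects entities whose hash code matches, which is why the gap widens as the collection grows.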

 

 

Step two was to stop using the Enumerable.Contains extension method and use the HashSet<T>.Contains method instead, like we should have been all along.  Wait… what?  Now I’m sure you’re saying, “I’d never use the Enumerable.Contains method over the HashSet<T>.Contains method because I know Enumerable.Contains is O(n) and HashSet<T>.Contains is O(1)”…  Yes… we know that too…  but if you open the T4 that generates your Entity Framework context, you will find a method in there just like this one:

public bool Contains(object entity)
{
      return _allEntities.Contains(entity);
}

Now notice that entity is declared as an object in that Contains method.

Also, just a few lines above that Contains method, you will see how _allEntities is declared:

private readonly HashSet<IObjectWithChangeTracker> _allEntities;

 

So basically you’d think that since the code calls Contains on a HashSet<T>, the compiler would bind to the HashSet<T>’s own Contains method and not to the extension method with the same name, but… you’d be wrong :)  Overload resolution looks at the type of the argument: HashSet<IObjectWithChangeTracker>.Contains expects an IObjectWithChangeTracker, and there is no implicit conversion from object to that interface, so the instance method is not applicable and the compiler falls back to the Enumerable.Contains extension method.  When we profiled this code we saw it go to the extension method all the time, because “entity” is declared as object and not as “IObjectWithChangeTracker”…  So what did we do?  We simply cast the entity to “IObjectWithChangeTracker” and bam, on our next run we hit the HashSet<T>’s Contains method, and man was it fast this time :)

public bool Contains(object entity)
{
      return _allEntities.Contains(entity as IObjectWithChangeTracker);
}
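You can reproduce the binding difference outside of the T4 with a small sketch like the one below.  ITracked, Entity, CountingComparer and OverloadDemo are hypothetical names standing in for the generated types; the counting comparer is only there to reveal whether the HashSet<T>'s own lookup ran, since Enumerable.Contains never consults the set's comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // puts the Enumerable.Contains extension method in scope
using System.Runtime.CompilerServices;

// Hypothetical stand-in for IObjectWithChangeTracker.
public interface ITracked { }

public sealed class Entity : ITracked { }

// Counts how often the HashSet consults its comparer during a lookup.
public sealed class CountingComparer : IEqualityComparer<ITracked>
{
    public int Calls;

    public bool Equals(ITracked x, ITracked y)
    {
        Calls++;
        return ReferenceEquals(x, y);
    }

    public int GetHashCode(ITracked obj)
    {
        Calls++;
        return RuntimeHelpers.GetHashCode(obj);
    }
}

public static class OverloadDemo
{
    // Returns the number of comparer calls made by one Contains lookup.
    public static int ComparerCalls(bool castFirst)
    {
        var comparer = new CountingComparer();
        var set = new HashSet<ITracked>(comparer);
        var entity = new Entity();
        set.Add(entity);

        object untyped = entity; // same static type as the template's parameter
        comparer.Calls = 0;

        bool found = castFirst
            ? set.Contains(untyped as ITracked) // binds to HashSet<T>.Contains
            : set.Contains(untyped);            // binds to Enumerable.Contains<object>

        if (!found) throw new InvalidOperationException("entity should be found");
        return comparer.Calls;
    }

    public static void Main()
    {
        Console.WriteLine("object argument: {0} comparer calls", ComparerCalls(false));
        Console.WriteLine("cast argument:   {0} comparer calls", ComparerCalls(true));
    }
}
```

Both lookups find the entity, but only the cast version goes through the set's comparer, which is the sign that the hash lookup was used instead of a walk over the whole set.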

 

Try this for yourself, it’s simple and T4s are great for this kind of poking around :)

I hope this leads to lots of improvements in your large collections inside STEs…

Happy coding, all!

Posted on Wednesday, April 30, 2014 9:41 PM


Comments on this post: Optimizing your Self Tracking Entities


Copyright © Vincent Grondin | Powered by: GeeksWithBlogs.net