Object equality in Except and other set operators in LINQ

Here is an interesting issue I noticed when using the Except extension method.
I have list of users from which I want to exclude some users:

The list of users is coming from an XML file:

 
  1. <Users> 
  2.   <User id="1" name="Jack"/> 
  3.   <User id="2" name="Jim"/> 
  4.   <User id="3" name="Joe"/> 
  5.   <User id="4" name="James"/> 
  6.   <User id="5" name="Tom"/> 
  7.   <User id="6" name="Matt"/> 
  8.   <User id="7" name="Jon"/> 
  9.   <User id="8" name="Jill"/>  
  10.   
  11. </Users> 

The code goes like this:

 
  1. interface IUser 
  2.     { 
  3.          int ID { get; set; } 
  4.          string Name { get; set; } 
  5.     } 
  6.  
  7.     class User: IUser 
  8.     { 
  9.  
  10.         #region IUser Members 
  11.  
  12.         public int ID 
  13.         { 
  14.             get
  15.             set
  16.         } 
  17.  
  18.         public string Name 
  19.         { 
  20.             get
  21.             set
  22.         } 
  23.  
  24.         #endregion 
  25.  
  26.         public override string ToString() 
  27.         { 
  28.             return ID + ":" +Name; 
  29.         } 
  30.        
  31.  
  32.         public static IEnumerable<IUser> GetMatchingUsers(IEnumerable<IUser> users) 
  33.         { 
  34.              IEnumerable<IUser> localList = new List<User> 
  35.              { 
  36.                 new User{ ID=4, Name="James"}, 
  37.                 new User{ ID=5, Name="Tom"
  38.  
  39.              }.OfType<IUser>(); 
  40.              var matches = from u in users 
  41.                            join lu in localList 
  42.                                on u.ID equals lu.ID 
  43.                            select u; 
  44.              return matches; 
  45.         } 
  46.     } 
  47.  
  48.     class Program 
  49.     { 
  50.         static void Main(string[] args) 
  51.         { 
  52.             XDocument doc = XDocument.Load("Users.xml"); 
  53.             IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select 
  54.                 (u => new User 
  55.                     { ID = (int)u.Attribute("id"), 
  56.                       Name = (string)u.Attribute("name"
  57.                     } 
  58.                 ).OfType<IUser>();       //still a query, objects have not been materialized 
  59.  
  60.              
  61.             var matches = User.GetMatchingUsers(users); 
  62.             var excludes = users.Except(matches);    // excludes should contain 6 users but here it contains 8 users 
  63.   
  64.         } 
  65.     } 

When I call User.GetMatchingUsers(users) I get 2 matches as expected. The issue is that when I call users.Except(matches), the matching users are not being excluded at all!
I am expecting 6 users but excludes variable contains all 8 users instead.

Since all I'm doing in GetMatchingUsers(IEnumerable<IUser> users) is taking the IEnumerable<IUser> and just returning the IUsers whose ID's match( 2 IUsers in this case), my understanding is that by default Except will use reference equality for comparing the objects to be excluded. Is this not how Except behaves?

What is even more interesting is that if I materialize the objects using .ToList() and then get the matching users, and call Except, everything works as expected!
Like so:

 
  1. IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select 
  2.                 (u => new User 
  3.                     { ID = (int)u.Attribute("id"), 
  4.                       Name = (string)u.Attribute("name"
  5.                     } 
  6.                 ).OfType<IUser>().ToList();   //explicity materializing all objects by calling ToList() 
  7.  
  8. var matches = User.GetMatchingUsers(users); 
  9. var excludes = users.Except(matches);   // excludes now contains 6 users as expected 
  10.    

I don't see why I should need to materialize objects for calling Except given that it's defined on IEnumerable<T>?

Jeff Yates helped provide some insight into why this is so.

While we are at it, let's see first hand as to why Except, and other extension methods(especially those related to set operations) behave as they do by viewing Except in Reflector

 
  1. public static IEnumerable<TSource> Except<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second) 
  2.     if (first == null
  3.     { 
  4.         throw Error.ArgumentNull("first"); 
  5.     } 
  6.     if (second == null
  7.     { 
  8.         throw Error.ArgumentNull("second"); 
  9.     } 
  10.     return ExceptIterator<TSource>(first, second, null); 
  11.  
  12.   
  13. private static IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer) 
  14.     <ExceptIterator>d__92<TSource> d__ = new <ExceptIterator>d__92<TSource>(-2); 
  15.     d__.<>3__first = first; 
  16.     d__.<>3__second = second; 
  17.     d__.<>3__comparer = comparer; 
  18.     return d__; 
  19.  
  20.   

 
  1.   [CompilerGenerated] 
  2. private sealed class <ExceptIterator>d__92<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable 
  3.  
  4. //other methods/properties/field etc. elided 
  5. ..... 
  6.  
  7. private bool MoveNext() 
  8.     { 
  9.         try 
  10.         { 
  11.             switch (this.<>1__state) 
  12.             { 
  13.                 case 0: 
  14.                     this.<>1__state = -1; 
  15.                     this.<set>5__93 = new Set<TSource>(this.comparer); 
  16.                     foreach (TSource local in this.second) 
  17.                     { 
  18.                         this.<set>5__93.Add(local); 
  19.                     } 
  20.                     this.<>7__wrap95 = this.first.GetEnumerator(); 
  21.                     this.<>1__state = 2; 
  22.                     while (this.<>7__wrap95.MoveNext()) 
  23.                     { 
  24.                         this.<element>5__94 = this.<>7__wrap95.Current; 
  25.                         if (!this.<set>5__93.Add(this.<element>5__94)) 
  26.                         { 
  27.                             continue
  28.                         } 
  29.                         this.<>2__current = this.<element>5__94; 
  30.                         this.<>1__state = 3; 
  31.                         return true
  32.                     Label_00BA: 
  33.                         this.<>1__state = 2; 
  34.                     } 
  35.                     this.<>m__Finally96(); 
  36.                     break
  37.  
  38.                 case 3: 
  39.                     goto Label_00BA; 
  40.             } 
  41.             return false
  42.         } 
  43.         fault 
  44.         { 
  45.             this.System.IDisposable.Dispose(); 
  46.         } 
  47.     } 
  48.  
  49. ........ 
  50.  
Aha.. the foreach over the IEnumerable<T> is materializing the the objects underneath since it represents a query instead of a collection such as List<T>.These objects are not the same as the ones which "will be materialized" when the users variable will be iterated over and hence the object references are different and hence the discrepancy.

MSDN mentions -"If you want to compare sequences of objects of some custom data type, you have to implement the IEqualityComparer<(Of <(T>)>) generic interface in your class." which would work too. So in the example above if we create a class which implements IEqualityComparer and pass an instance of that class as a parameter to Except then we would get the intended behavior without needing to call ToList().
I prefer this over calling ToList() due to performance reasons especially if the query could return a large number of objects and also if you wanted to have explicit control over what equality means.

The JetBrains implementation works fine for this purpose.

We can type "T" to be IUser instead of a concrete class and we should be in good shape.

UPDATE(4/7/09) Mike Taulty pointed out that he would have written the above code as follows instead of using the Except operator
  1. var users = 
  2.       from u in doc.DescendantsAndSelf("User"
  3.       let id = (int)u.Attribute("id"
  4.       where localList.SingleOrDefault(iu => iu.ID == id) == null 
  5.       select new User() 
  6.       { 
  7.         ID = (int)u.Attribute("id"), 
  8.         Name = (string)u.Attribute("name"
  9.       }; 
This is wayyy more efficient than what we had before since no temporary User objects are created. The XElement objects are used as is to retrieve the ID field. The IDs are then compared against the localList of users and the SingleOrDefault(...) == null ensures that we get only the XElements corresponding to the users that do not exist in the localList of users. Slick! and way more performant than the original code.

kick it on DotNetKicks.com

Print | posted @ Saturday, March 28, 2009 4:23 PM

Comments on this entry:

No comments posted yet.

Post A Comment
Title:
Name:
Email:
Comment:
Verification: