Here is an interesting issue I noticed when using the Except
extension method.
I have list of users from which I want to exclude some users:
The list of users is coming from an XML file:
- <Users>
- <User id="1" name="Jack"/>
- <User id="2" name="Jim"/>
- <User id="3" name="Joe"/>
- <User id="4" name="James"/>
- <User id="5" name="Tom"/>
- <User id="6" name="Matt"/>
- <User id="7" name="Jon"/>
- <User id="8" name="Jill"/>
-
- </Users>
The code goes like this:
- interface IUser
- {
- int ID { get; set; }
- string Name { get; set; }
- }
-
- class User: IUser
- {
-
- #region IUser Members
-
- public int ID
- {
- get;
- set;
- }
-
- public string Name
- {
- get;
- set;
- }
-
- #endregion
-
- public override string ToString()
- {
- return ID + ":" +Name;
- }
-
-
- public static IEnumerable<IUser> GetMatchingUsers(IEnumerable<IUser> users)
- {
- IEnumerable<IUser> localList = new List<User>
- {
- new User{ ID=4, Name="James"},
- new User{ ID=5, Name="Tom"}
-
- }.OfType<IUser>();
- var matches = from u in users
- join lu in localList
- on u.ID equals lu.ID
- select u;
- return matches;
- }
- }
-
- class Program
- {
- static void Main(string[] args)
- {
- XDocument doc = XDocument.Load("Users.xml");
- IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select
- (u => new User
- { ID = (int)u.Attribute("id"),
- Name = (string)u.Attribute("name")
- }
- ).OfType<IUser>();
-
-
- var matches = User.GetMatchingUsers(users);
- var excludes = users.Except(matches);
-
- }
- }
When I call User.GetMatchingUsers(users)
I get 2 matches as expected. The issue is that when I call users.Except(matches)
, the matching users are not being excluded at all!
I am expecting 6 users but excludes
variable contains all 8 users instead.
Since all I'm doing in GetMatchingUsers(IEnumerable<IUser> users)
is taking the IEnumerable<IUser> and just returning the IUsers whose ID's match( 2 IUsers in this case), my understanding is that by default Except
will use reference equality for comparing the objects to be excluded. Is this not how Except
behaves?
What is even more interesting is that if I materialize the objects using .ToList() and then get the matching users, and call Except
, everything works as expected!
Like so:
- IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select
- (u => new User
- { ID = (int)u.Attribute("id"),
- Name = (string)u.Attribute("name")
- }
- ).OfType<IUser>().ToList();
-
- var matches = User.GetMatchingUsers(users);
- var excludes = users.Except(matches);
-
I don't see why I should need to materialize objects for calling Except
given that it's defined on IEnumerable<T>?
Jeff Yates helped provide some insight into why this is so.
While we are at it, let's see first hand as to why Except
, and other extension methods(especially those related to set operations) behave as they do by viewing Except
in Reflector
- public static IEnumerable<TSource> Except<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second)
- {
- if (first == null)
- {
- throw Error.ArgumentNull("first");
- }
- if (second == null)
- {
- throw Error.ArgumentNull("second");
- }
- return ExceptIterator<TSource>(first, second, null);
- }
-
-
- private static IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
- {
- <ExceptIterator>d__92<TSource> d__ = new <ExceptIterator>d__92<TSource>(-2);
- d__.<>3__first = first;
- d__.<>3__second = second;
- d__.<>3__comparer = comparer;
- return d__;
- }
-
-
- [CompilerGenerated]
- private sealed class <ExceptIterator>d__92<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable
- {
-
-
- .....
-
- private bool MoveNext()
- {
- try
- {
- switch (this.<>1__state)
- {
- case 0:
- this.<>1__state = -1;
- this.<set>5__93 = new Set<TSource>(this.comparer);
- foreach (TSource local in this.second)
- {
- this.<set>5__93.Add(local);
- }
- this.<>7__wrap95 = this.first.GetEnumerator();
- this.<>1__state = 2;
- while (this.<>7__wrap95.MoveNext())
- {
- this.<element>5__94 = this.<>7__wrap95.Current;
- if (!this.<set>5__93.Add(this.<element>5__94))
- {
- continue;
- }
- this.<>2__current = this.<element>5__94;
- this.<>1__state = 3;
- return true;
- Label_00BA:
- this.<>1__state = 2;
- }
- this.<>m__Finally96();
- break;
-
- case 3:
- goto Label_00BA;
- }
- return false;
- }
- fault
- {
- this.System.IDisposable.Dispose();
- }
- }
-
- ........
-
- }
Aha.. the foreach over the IEnumerable<T> is materializing the the objects underneath since it represents a query instead of a collection such as List<T>.These objects are not the same as the ones which "will be materialized" when the users variable will be iterated over and hence the object references are different and hence the discrepancy.
MSDN mentions -"If you want to compare sequences of objects of some custom data type, you have to implement the IEqualityComparer<(Of <(T>)>) generic interface in your class." which would work too. So in the example above if we create a class which implements IEqualityComparer and pass an instance of that class as a parameter to Except
then we would get the intended behavior without needing to call ToList().
I prefer this over calling ToList() due to performance reasons especially if the query could return a large number of objects and also if you wanted to have explicit control over what equality means.
The JetBrains implementation works fine for this purpose.
We can type "T" to be IUser instead of a concrete class and we should be in good shape.
UPDATE(4/7/09) Mike Taulty pointed out that he would have written the above code as follows instead of using the
Except
operator
- var users =
- from u in doc.DescendantsAndSelf("User")
- let id = (int)u.Attribute("id")
- where localList.SingleOrDefault(iu => iu.ID == id) == null
- select new User()
- {
- ID = (int)u.Attribute("id"),
- Name = (string)u.Attribute("name")
- };
This is wayyy more efficient than what we had before since no temporary
User
objects are created. The XElement objects are used as is to retrieve the ID field. The IDs are then compared against the localList of users and the SingleOrDefault(...) == null ensures that we get only the XElements corresponding to the users that do not exist in the localList of users. Slick! and way more performant than the original code.