May 2008 Entries

I re-learned something seemingly trivial today.  The == operator is not the same as Equals (more specifically, the default implementation of the == operator).

I overrode Equals for one of my classes and couldn't figure out why my code wasn't working.  Then I realized that, by default, == performs a reference comparison on reference types, not a value comparison.  I will have to keep this in mind any time I am comparing objects.  Generally, I will want to use the Equals method.
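Here is a minimal sketch of the behavior described above.  The Money class and the EqualityDemo helper are hypothetical names made up for illustration; they are not my real class.  The sketch only assumes a class that overrides Equals but leaves == alone.

```csharp
using System;

// Hypothetical class for illustration: overrides Equals but not ==.
public class Money
{
    public decimal Amount { get; set; }

    public override bool Equals(object obj)
    {
        Money other = obj as Money;
        return other != null && Amount == other.Amount;
    }

    // Equals and GetHashCode should always be overridden together.
    public override int GetHashCode()
    {
        return Amount.GetHashCode();
    }
}

public static class EqualityDemo
{
    // Uses the overridden Equals: a value comparison, so this returns true.
    public static bool CompareWithEquals()
    {
        Money a = new Money { Amount = 5m };
        Money b = new Money { Amount = 5m };
        return a.Equals(b);
    }

    // Uses the default ==: a reference comparison of two distinct
    // objects, so this returns false.
    public static bool CompareWithOperator()
    {
        Money a = new Money { Amount = 5m };
        Money b = new Money { Amount = 5m };
        return a == b;
    }
}
```

The two methods build identical objects, yet only the Equals call reports them as equal.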

When I first encountered the problem, I decided to implement my own version of the == (and !=) operator.  The new implementation would perform a value comparison.  I felt uncomfortable with this and decided to do a bit of research.  Microsoft recommends overloading == in this manner only for immutable types.  An immutable type is any type whose instances cannot be changed after instantiation.  My class is not immutable.  So, I decided to take the == implementation out.
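For contrast, this is roughly what overloading == could look like on a type that is immutable.  The Point class is a hypothetical example, not something from my project, and the hash code formula is just one common choice.

```csharp
using System;

// Hypothetical immutable type: the fields are set once in the constructor.
public sealed class Point
{
    private readonly int _x;
    private readonly int _y;

    public Point(int x, int y)
    {
        _x = x;
        _y = y;
    }

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        // ReferenceEquals avoids recursing into the overloaded == below.
        return !ReferenceEquals(other, null) && _x == other._x && _y == other._y;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (_x * 397) ^ _y;
        }
    }

    // Value comparison; two nulls are considered equal.
    public static bool operator ==(Point left, Point right)
    {
        if (ReferenceEquals(left, null))
        {
            return ReferenceEquals(right, null);
        }
        return left.Equals(right);
    }

    public static bool operator !=(Point left, Point right)
    {
        return !(left == right);
    }
}
```

Because a Point can never change after construction, two points that compare equal once will always compare equal, and the hash code stays stable while the object sits in a dictionary or set.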

I think we can easily be confused because many built-in value types implement value comparison with the == operator.  To complicate things further, string is a reference type that overloads == to compare values, so you only have to consider whether a string is "interned" when you compare references.  About the only time you are dealing with interned strings is when you are using literals.  If you are retrieving a string from a database, then it won't be interned.

Take a look at the NUnit test below.  I purposefully did not use AreEqual, AreSame, or AreNotSame because I wanted to demonstrate more clearly how the .NET operators and methods work.

    [Test]
    public void TestValueComparison()
    {
        // Value types: == and Equals both compare values
        int a = 1;
        int b = 1;
        Assert.IsTrue( a == b );
        Assert.IsTrue( a.Equals( b ) );
        // Each int is boxed into its own object, so the references differ
        Assert.IsFalse( Object.ReferenceEquals( a, b ) );

        // Interned strings: identical literals share a single instance
        string x = "test";
        string y = "test";
        Assert.IsTrue( x == y );
        Assert.IsTrue( x.Equals( y ) );
        Assert.IsTrue( Object.ReferenceEquals( x, y ) );

        // Non-interned strings: equal values, but separate instances
        string p = new string( 'a', 3 );
        string q = new string( 'a', 3 );
        Assert.IsTrue( p == q );
        Assert.IsTrue( p.Equals( q ) );
        Assert.IsFalse( Object.ReferenceEquals( p, q ) );

        // Manually interning the strings makes both point at one instance
        p = string.Intern( p );
        q = string.Intern( q );
        Assert.IsTrue( Object.ReferenceEquals( p, q ) );
    }

So, generally speaking, if you want to perform a value comparison of reference types, you want to use the Equals method.

Cheers.

I had the opportunity today to write some more complex LINQ queries.

First, I started with a simple group by expression allowing me to subtotal some data for a particular key.  Certainly I could have done this in the database.  Many would argue that the database is the expert at these sorts of things, so we should let the expert take care of it.  I have a couple of reasons for placing the group by in the LINQ query.  First, I don't have a lot of control over the data layer.  Second, I can unit test the group by code with simple NUnit tests, without ever hitting the database.  To me, the second reason is much more compelling.  I don't have enough data yet to draw any conclusions, but the concept is promising.  Instead of inserting test data into a few data tables, I can simply mock the data that is used as input to my LINQ query.

As my requirements evolved, I discovered a simple group by would not suffice.  I needed to join two lists of data together.  I was able to replace the group by with a "group join" using the "into" keyword.  For every entry of the outer list, the group join produces a group of the matching entries from the inner list.  Sometimes this would result in an "empty" group.  I was able to eliminate the empty groups by simply testing the count.
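A small sketch of a group join over in-memory data may make the empty group easier to see.  The client names, amounts, and the GroupJoinDemo class are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupJoinDemo
{
    // Returns one (client, order total) pair per client that has orders.
    public static List<KeyValuePair<string, decimal>> TotalsForClientsWithOrders()
    {
        string[] clients = { "Cy", "Bob", "Ann" };
        var orders = new[]
        {
            new { Client = "Ann", Amount = 10m },
            new { Client = "Ann", Amount = 5m },
            new { Client = "Cy", Amount = 7m }
        };

        var query = from client in clients
                    // "into" turns the join into a group join: every client
                    // gets a group, and Bob's group is empty.
                    join order in orders on client equals order.Client into clientOrders
                    // Drop the empty groups.
                    where clientOrders.Count() > 0
                    orderby client
                    select new KeyValuePair<string, decimal>(
                        client, clientOrders.Sum(o => o.Amount));

        return query.ToList();
    }
}
```

Without the where clause, Bob would still appear in the result with a total of zero, because Sum over an empty group of decimals returns 0.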

Here is what my query basically looks like in the end (some names have been changed to protect the innocent):

    var clientSummaryQuery = from client in _dataContext.AllClients
                             join clientDiscount in discounts on client equals clientDiscount.Client into d
                             join clientOrder in orders on client equals clientOrder.Client into o
                             where o.Count() > 0
                             orderby client.FullName
                             select new
                             {
                                 Client = client,
                                 TotalOrder = o.Sum( orderSummary => orderSummary.OrderAmount ),
                                 TotalDiscount = o.Sum( orderSummary => orderSummary.OrderAmount ) * d.Sum( discountSummary => discountSummary.DiscountPercentage ),
                                 NetOrder = o.Sum( orderSummary => orderSummary.OrderAmount ) * (1 - d.Sum( discountSummary => discountSummary.DiscountPercentage )),
                             };

The query above is only slightly simpler than my production query, but you should be able to get the gist of what I am doing.  I tweaked the query above, so it could have a syntax error here or there.

I start with all clients in the system, then I join to two other lists.  In both cases the joins are "group joins" (achieved with the "into" keyword).  I include a where clause to ensure I only list those clients who have outstanding orders.  Finally, I use the Sum aggregate method on my groups and I'm done.

Again, the beauty of this is that I can mock what my data context gives me for AllClients and discounts and orders.  I can exercise different scenarios with some simple unit tests.  For example, I can include clients that have no discounts, a single discount or multiple discounts.  Similarly, I can start with a surplus of clients and verify that those with no orders are excluded.  Finally, I can verify that all of the orders for a client are summarized as expected.  All of this can be done without ever hitting the database.  I use Rhino Mocks to mock my data.
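To give a rough idea of those scenarios, here is a sketch that feeds the same kind of query from plain in-memory lists.  It deliberately uses neither Rhino Mocks nor NUnit, so it stays self-contained.  The Client, Order, Discount and ClientSummary classes, and the method names, are hypothetical stand-ins rather than my production types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real entities.
public class Client { public string FullName { get; set; } }
public class Order { public Client Client { get; set; } public decimal OrderAmount { get; set; } }
public class Discount { public Client Client { get; set; } public decimal DiscountPercentage { get; set; } }

public class ClientSummary
{
    public Client Client { get; set; }
    public decimal TotalOrder { get; set; }
    public decimal TotalDiscount { get; set; }
    public decimal NetOrder { get; set; }
}

public static class ClientSummaryQuery
{
    // Same shape as the query above, but the inputs are passed in so
    // they can be replaced with test data.
    public static List<ClientSummary> Run(
        IEnumerable<Client> clients,
        IEnumerable<Order> orders,
        IEnumerable<Discount> discounts)
    {
        var query = from client in clients
                    join discount in discounts on client equals discount.Client into d
                    join order in orders on client equals order.Client into o
                    where o.Count() > 0
                    orderby client.FullName
                    let total = o.Sum(x => x.OrderAmount)
                    let rate = d.Sum(x => x.DiscountPercentage)
                    select new ClientSummary
                    {
                        Client = client,
                        TotalOrder = total,
                        TotalDiscount = total * rate,
                        NetOrder = total * (1 - rate)
                    };
        return query.ToList();
    }
}

public static class ClientSummaryScenario
{
    // Ann: two orders, two discounts.  Bob: no orders.  Cy: one order, no discounts.
    public static List<ClientSummary> Build()
    {
        Client ann = new Client { FullName = "Ann" };
        Client bob = new Client { FullName = "Bob" };
        Client cy = new Client { FullName = "Cy" };

        List<Client> clients = new List<Client> { cy, bob, ann };
        List<Order> orders = new List<Order>
        {
            new Order { Client = ann, OrderAmount = 100m },
            new Order { Client = ann, OrderAmount = 50m },
            new Order { Client = cy, OrderAmount = 40m }
        };
        List<Discount> discounts = new List<Discount>
        {
            new Discount { Client = ann, DiscountPercentage = 0.05m },
            new Discount { Client = ann, DiscountPercentage = 0.05m }
        };

        return ClientSummaryQuery.Run(clients, orders, discounts);
    }
}
```

In this scenario Bob should be excluded, Ann should come first with a total of 150, a discount of 15 and a net of 135, and Cy should follow with a total and net of 40.  Note that the joins match on the Client objects themselves, so they rely on reference equality unless Client overrides Equals and GetHashCode.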

By the way, if anyone knows of a better way to achieve what I have done here, please share.

I cannot believe that I am the first person ever to encounter the following error:
MSB3095: Invalid argument. Illegal characters in path.


[Update] As Tom pointed out in the comments, this issue is directly related to the encoding of my *.refresh file.  Thanks Tom.

I searched Google and other search engines with no success.  I found "MSB3095: Invalid Argument", and I found "Illegal characters in path.", but never in the same place.

I did not actually resolve the issue; I was able to work around it.  I have a website project that references some local projects as well as a couple of third party libraries.  I use *.refresh files to guarantee I get the latest version of the libraries from my dependencies folder.

Recently, I put together a library of extension methods and added it to my dependencies folder.  I added a new *.refresh file to my website.  Everything compiled fine on my desktop.  When I committed the changes, the build server responded with the MSB3095 error and failed compilation.

I remembered that I am using the extensions library in other projects with a direct reference.  It turns out I didn't need the *.refresh file at all.  The reference to my new library was inferred.

After removing the *.refresh file, the build server compiled the website without issue.

I still feel a little uncertain about this resolution.  I am using other refresh files.  What was wrong with this one?  I checked the path inside it and there was no problem with it.  The only thing I can imagine (though I have not tested) is that it may have something to do with my library having extra dots (.) in the name.  e.g.  MyCompany.Extensions.dll.refresh
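Given the update at the top of this post, one thing worth checking is how the *.refresh file is encoded.  The sketch below assumes, and this is only an assumption, that the culprit is a UTF-8 byte-order mark at the start of the file.  I have not confirmed that this is what broke my build.  RefreshFileCheck and StartsWithUtf8Bom are names I made up for the example.

```csharp
using System;
using System.IO;

public static class RefreshFileCheck
{
    // Returns true when the file begins with the UTF-8 byte-order mark
    // (the bytes EF BB BF).  Assumption: a BOM is the encoding problem;
    // other encodings would need a different check.
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        using (FileStream stream = File.OpenRead(path))
        {
            int read = stream.Read(head, 0, head.Length);
            return read == 3
                && head[0] == 0xEF
                && head[1] == 0xBB
                && head[2] == 0xBF;
        }
    }
}
```

Running this against a working *.refresh file and the failing one would show whether the two files differ in this respect.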

If I discover something new I will update this post.

So I was about to create a dump of my subversion repository so I could split multiple projects into separate repositories.  But then I got to thinking.  Why do I want to create a dump?  Well, the only good reason I could come up with was revision history.  That reason wasn't good enough.

The source that I wanted to split out was some common/shared libraries.  Collaborating with a team member, we decided there were two compelling reasons why we don't need the history up to this point in time.  First, the code base is fairly small and young, but stable.  That is, we are confident that it does what it is supposed to (we have tests).  Second, the code has never been released.  There are no production products using this code (yet).  So, the need to go back to a "release" revision simply doesn't exist.

Bottom line:  Don't go through the trouble of svnadmin dump and svndumpfilter if you don't have to.

Last month was a busy month for me.  We deployed the first version (beta) of the product I have been working on over the last couple of months.  Meeting the deadline with all of the promised features (almost) was critical.  However, when you fix the timeline and fix the feature set, then something else has got to give.  You guessed it, quality.

Quality is not just a concern over failures.  There is a level of quality in your successes as well.  So far, the feedback on the product has been very positive, so, at an initial glance it would seem that quality is quite good.  Humbly, I must admit that quality diminished in April.

How should quality be measured?  Is it simply in defects reported?  I think not.  Many times you will recognize the quality of the code when you start to add something new.  If you add something new and something mysteriously breaks, you have poor quality.  If you hire someone new and it takes you an unreasonable amount of time to "explain" the code, then it is of bad quality.  I think the code that I wrote in April could quite easily fail in these scenarios.

How can you maintain high quality?  The simple answer is TDD.  Where did I go wrong last month?  Simply put, I got out of the TDD driver's seat.  How much of Red-Green-Refactor did I lose?  Pretty much all of it.  On occasion, I would write a test after writing the production code.  Also, I would refactor my code from time to time, but in the absence of tests (a major no-no).  I kept my "coverage" at an almost reasonable level ~85%.  But, considering it was hovering around 98% at the beginning of the month, I certainly ran off the road.

I could make excuses, like "I had a major deadline", or "the customer just wouldn't give on the requirements".  The fact of the matter is, I got lazy.  When considering a new feature, the first thing I should ask myself is, "How can I test this?"  I just wasn't disciplined.

I've been playing catch-up the last week or so by adding tests to boost coverage.  Perhaps I should have phrased that differently.  The point is not to boost coverage.  The point is to test what your product is supposed to do.  If you write the tests after writing the code, it is easy to fall into the trap of writing tests that exercise what you think the code is doing.  This is why it is so important to write the tests first.  Write a test based on your requirements, then make it pass.

Hopefully, I can get back on the road before I blow a tire and end up missing the next deadline.  I need to remember (and so should you) that poor quality frequently hides from you in the short term, only to throw you into a tailspin in the long term.
