Posts
203
Comments
1119
Trackbacks
51
December 2009 Entries
LINQ to XML with Hierarchical XML, Optional Elements, and Namespaces

Recently I had an interesting task to consume a particular XML document and populate a C# object from it.  The structure of the XML document looked roughly like this:

   1:  <root xmlns="http://www.w3.org/2005/Atom">
   2:    <entry>
   3:      <id>1</id>
   4:      <title>abc</title>
   5:      <content>
   6:        <div xmlns="http://www.w3.org/1999/xhtml">
   7:          <table>
   8:            <tr>
   9:              <td>Item1</td>
  10:              <td>111</td>
  11:            </tr>
  12:            <tr>
  13:              <td>Item2</td>
  14:              <td>222</td>
  15:            </tr>
  16:            <tr>
  17:              <td>Item3</td>
  18:              <td>333</td>
  19:            </tr>
  20:          </table>
  21:        </div>
  22:      </content>
  23:    </entry>
  24:    <entry>
  25:      <id>2</id>
  26:      <title>xyz</title>
  27:      <content>
  28:        <div xmlns="http://www.w3.org/1999/xhtml">
  29:          <table>
  30:            <tr>
  31:              <td>Item1</td>
  32:              <td>444</td>
  33:            </tr>
  34:            <tr>
  35:              <td>Item2</td>
  36:              <td>555</td>
  37:            </tr>
  38:          </table>
  39:        </div>
  40:      </content>
  41:    </entry>
  42:  </root>

The C# object to be populated looked like this:

   1:  public class Entry
   2:  {
   3:      public int Id { get; set; }
   4:      public string Title { get; set; }
   5:      public string Item1 { get; set; }
   6:      public string Item2 { get; set; }
   7:      public string Item3 { get; set; }
   8:  }

My initial inclination was that LINQ to XML was the correct approach to do this. I had a few interesting challenges:

  1. The XML was hierarchical and needed to be flattened
  2. The XML had multiple namespaces
  3. Some of the elements were optional (e.g., Item3)
  4. Needed to access sibling elements (e.g., in the first <entry> element, the name “Item1” has a value of “111” on line 9-10 above)

If you’re already experienced with LINQ to XML, this might be a relatively straightforward query to write.  However, if you’re not a expert, it’s often the *approach* that you take that most determines you success.

The first thing I did was to write a unit test so that when I started to code, I could also do a quick red/green check to see if I was getting closer. Although writing a unit test first is typically my default mindset, I realized pretty quickly that I needed even more efficiency (i.e., instantaneous feedback) since I was really just working on a single query.  So my next step was to fire up LinqPad. This is truly an awesome tool and if you’re not already using it, go download it right now (it’s free).  You can work with database connections (LINQ to SQL, EF), LINQ to Objects, WCF Data Services, and LINQ to XML. For LINQ to XML, put LinqPad in “C# Statements” mode rather than the default “C# Expressions.”

My first step was to remove the XML namespaces from the XML. If my query didn’t return any results, I wanted to make sure it was because something was wrong with the query and not a silly namespace issue. For my first iteration of the query, I didn’t want to have to worry about optional elements, so I left Item3 out. Also, I knew that I could get the sibling element with LINQ to XML by using the ElementsAfterSelf() method:

   1:  var entries = from item in xmlList.Descendants("entry")
   2:                select new 
   3:                {
   4:                    Id = item.Element("id").Value,
   5:                    Title = item.Element("title").Value,
   6:                    Item1 = item.Element("content").Element("div").Element("table").Elements("tr").Elements("td").First(x => x.Value == "Item1").ElementsAfterSelf().First().Value,
   7:                    Item2 = item.Element("content").Element("div").Element("table").Elements("tr").Elements("td").First(x => x.Value == "Item2").ElementsAfterSelf().First().Value
   8:                };

While that does work, it certainly was not very pretty (or efficient with the deeply nested xhtml).  A better approach would be to encapsulate each entry’s table rows into a variable via the “let” keyword:

   1:  var entries = from item in xmlList.Descendants("entry")
   2:                let rows = item.Element("content").Element("div").Element("table").Elements("tr").Elements("td")
   3:                select new 
   4:                {
   5:                    Id = item.Element("id").Value,
   6:                    Title = item.Element("title").Value,
   7:                    Item1 = rows.First(x => x.Value == "Item1").ElementsAfterSelf().First().Value,
   8:                    Item2 = rows.First(x => x.Value == "Item2").ElementsAfterSelf().First().Value
   9:                };

That’s a definite improvement since I now have a “rows” variable that encapsulates all of the <td> elements for just that entry.  Now that we’re confident the structure is in good order, we can put the namespaces back in:

   1:  XNamespace atomNs = "http://www.w3.org/2005/Atom";
   2:  XNamespace xhtmlNs = "http://www.w3.org/1999/xhtml";
   3:   
   4:  var entries = from item in xmlList.Descendants(atomNs + "entry")
   5:                let rows = item.Element(atomNs + "content").Element(xhtmlNs + "div").Element(xhtmlNs + "table").Elements(xhtmlNs + "tr").Elements(xhtmlNs + "td")
   6:                select new 
   7:                {
   8:                    Id = item.Element(atomNs + "id").Value,
   9:                    Title = item.Element(atomNs + "title").Value,
  10:                    Item1 = rows.First(x => x.Value == "Item1").ElementsAfterSelf().First().Value,
  11:                    Item2 = rows.First(x => x.Value == "Item2").ElementsAfterSelf().First().Value
  12:                };

At this point, the only thing left is dealing with the optional XML elements. Following this tip by John Papa, we can add the optional check for any item (Item2 and Item3 shown on line 11-12 below) like this:

   1:  XNamespace atomNs = "http://www.w3.org/2005/Atom";
   2:  XNamespace xhtmlNs = "http://www.w3.org/1999/xhtml";
   3:   
   4:  var entries = from item in xmlList.Descendants(atomNs + "entry")
   5:                let rows = item.Element(atomNs + "content").Element(xhtmlNs + "div").Element(xhtmlNs + "table").Elements(xhtmlNs + "tr").Elements(xhtmlNs + "td")
   6:                select new 
   7:                {
   8:                    Id = item.Element(atomNs + "id").Value,
   9:                    Title = item.Element(atomNs + "title").Value,
  10:                    Item1 = rows.First(x => x.Value == "Item1").ElementsAfterSelf().First().Value,
  11:                    Item2 = (rows.FirstOrDefault(x => x.Value == "Item2") == null ? null : rows.First(x => x.Value == "Item2").ElementsAfterSelf().First().Value),
  12:                    Item3 = (rows.FirstOrDefault(x => x.Value == "Item3") == null ? null : rows.First(x => x.Value == "Item3").ElementsAfterSelf().First().Value)
  13:                };

At this point, things are working pretty well, so we can put this back into Visual Studio and execute the original unit test which is now green. But we’re not quite done.  While this code does what we want, it’s not particularly succinct or DRY. It’s also not incredibly efficient given that we’re looking for the items multiple times (first to check for existence and then to get the value).  When you hit situations like this, you can always write your own extension methods quite easily. So we can write an extension method to get a sibling value but only if the item exists (so it can be an optional element):

   1:  public static string FindSiblingValue(this IEnumerable<XElement> elements, string tagName)
   2:  {
   3:      var label = elements.FirstOrDefault(x => x.Value == tagName);
   4:      return (label == null ? null : label.ElementsAfterSelf().First().Value);
   5:  }

Once we have that in place, we can now refactor our LINQ to XML query:

   1:  var entries = from item in xmlList.Descendants(atomNs + "entry")
   2:                let rows = item.Element(atomNs + "content").Element(xhtmlNs + "div").Element(xhtmlNs + "table").Elements(xhtmlNs + "tr").Elements(xhtmlNs + "td")
   3:                select new 
   4:                {
   5:                    Id = item.Element(atomNs + "id").Value,
   6:                    Title = item.Element(atomNs + "title").Value,
   7:                    Item1 = rows.FindSiblingValue("Item1"),
   8:                    Item2 = rows.FindSiblingValue("Item2"),
   9:                    Item3 = rows.FindSiblingValue("Item3")
  10:                };

Now the code is succinct and efficient.

When looking at the final result, it really just boils down to 10 lines of C# code. However, with situations like these where you have to address multiple things at once, the best approach is typically to break it down into the smaller sub-components and attack each one individually. It’s also key to have a good testing strategy up front with unit tests and “scratch pads” like LinqPad since the last thing you want to be doing is wasting your time hitting F5 all the time to see if your code runs correctly.

Posted On Tuesday, December 29, 2009 11:34 PM | Comments (2)
Comparing Entity Framework 4 New Stored Procedure Support to LINQ to SQL

The next version of Entity Framework has many new features, many of which are enabling it to catch up with features previously available in other frameworks like LINQ to SQL.  One of these new features is the updated stored procedure support.  In previous versions of EF, working with stored procedures was quite limited and really only usable with CRUD operations that were mapped to already defined entities.  With EF 4, you can start with your stored procedure and have the designer automatically generate return types.  Essentially it will “sense” the shape of the parameters and (if applicable) SELECT statement and generate types that match it or allow you to map it to an already existing type.  This is certainly a great new feature to be adding but, at the same time, these stored procedure features have been in LINQ to SQL since version 1.

Although the features are similar between EF 4 and LINQ to SQL, the implementation isn’t exactly the same.  Suppose we have a typical Contact object that looks like this:

   1:  public class Contact
   2:  {
   3:      public Contact()
   4:      {
   5:          this.Addresses = new List<Address>();
   6:      }
   7:   
   8:      public int ContactId { get; set; }
   9:      public string FirstName { get; set; }
  10:      public string LastName { get; set; }
  11:      public string Email { get; set; }
  12:      public string Title { get; set; }
  13:      public ICollection<Address> Addresses { get; set; }
  14:  }

and stored procedure to insert a new contact that takes all of these properties as parameters (with an output parameter for the ContactID since it’s a PK Identity).  If we use the typical edmx approach, we get generated code for this stored procedures that looks like this:

   1:  public int SaveContact(ObjectParameter contactID, global::System.String firstName, global::System.String lastName, global::System.String company, global::System.String title, global::System.String email)
   2:  {
   3:      ObjectParameter firstNameParameter;
   4:      if (firstName != null)
   5:      {
   6:          firstNameParameter = new ObjectParameter("FirstName", firstName);
   7:      }
   8:      else
   9:      {
  10:          firstNameParameter = new ObjectParameter("FirstName", typeof(global::System.String));
  11:      }
  12:   
  13:      ObjectParameter lastNameParameter;
  14:      if (lastName != null)
  15:      {
  16:          lastNameParameter = new ObjectParameter("LastName", lastName);
  17:      }
  18:      else
  19:      {
  20:          lastNameParameter = new ObjectParameter("LastName", typeof(global::System.String));
  21:      }
  22:   
  23:      ObjectParameter companyParameter;
  24:      if (company != null)
  25:      {
  26:          companyParameter = new ObjectParameter("Company", company);
  27:      }
  28:      else
  29:      {
  30:          companyParameter = new ObjectParameter("Company", typeof(global::System.String));
  31:      }
  32:   
  33:      ObjectParameter titleParameter;
  34:      if (title != null)
  35:      {
  36:          titleParameter = new ObjectParameter("Title", title);
  37:      }
  38:      else
  39:      {
  40:          titleParameter = new ObjectParameter("Title", typeof(global::System.String));
  41:      }
  42:   
  43:      ObjectParameter emailParameter;
  44:      if (email != null)
  45:      {
  46:          emailParameter = new ObjectParameter("Email", email);
  47:      }
  48:      else
  49:      {
  50:          emailParameter = new ObjectParameter("Email", typeof(global::System.String));
  51:      }
  52:   
  53:      return base.ExecuteFunction("SaveContact", contactID, firstNameParameter, lastNameParameter, companyParameter, titleParameter, emailParameter);
  54:  }

It’s good that’s generated code and we don’t have to worry about writing that ourselves but it’s not the prettiest code to look at.  Let’s take a look at the exact same stored procedure call in a LINQ to SQL data context:

   1:  [Function(Name = "dbo.SaveContact")]
   2:  public int SaveContact(ref Nullable<int> contactID, string firstName, string lastName, string company, string title, string email)
   3:  {
   4:      IExecuteResult result = this.ExecuteMethodCall(this, (MethodInfo)MethodInfo.GetCurrentMethod(), contactID, firstName, lastName, company, title, email);
   5:      contactID = (Nullable<int>)result.GetParameterValue(0);
   6:      return (int)result.ReturnValue;
   7:  }

The first most obvious difference is that’s 54 lines of code for EF 4 and 7 lines of code for LINQ to SQL.  Second, LINQ to SQL allows you to just pass in the primitive types to the built-in ExecuteMethodCall() method whereas EF 4 wants a list of ObjectParameter types for its ExecuteFunction() method.  Third, LINQ to SQL can reflect over the parameters to figure out, by name, which parameters map to which stored procedure parameters (LINQ to SQL also provides the [Parameter] attribute if the names or types differ and need to be explicitly mapped). EF, on the other hand, wants the string for the name explicitly set (e.g., “FirstName”, “LastName”, etc.). Also, EF 4 specifies the name of the stored procedure as a parameter to the ExecuteFunction() method, whereas LINQ to SQL specifies it in the [Function] attribute which decorates the method.  Finally, the last interesting difference that I see is in regards to the contact ID property.  That is the PK Identity column that gets assigned from the database on insert.  LINQ to SQL automatically exposes that as a ref parameter (which is an API that makes sense from a C# perspective) and assigns the values after the stored procedure has been invoked via the GetParameterValue() method.  EF 4, on the other hand, exposes that parameter as an actual ObjectParameter (rather than a “ref int”) and all the rest of the parameters as their natural primitives. So the responsibility is on the caller to create the ObjectParameter for the PK property but not for the other ones. This allows the caller to have a reference to the variable so that, in the case of output parameters, it can get at the new value that was just assigned by the database.

Certainly there are pros and cons of each approach. But I doubt I would be putting myself in the minority to say that I prefer the (much) more succinct API that LINQ to SQL provides. So, while I love the fact that EF 4 is now allowing me to call stored procedures in a similar way to what I had with LINQ to SQL, I’m not absolutely thrilled with the API.

But, digging a little deeper, how can I add some re-usable methods to be able to give my EF 4 a more succinct API?

The first thing I want to do is to get an easy way to create all of those ObjectParameters automatically from their primitive types without all the IF statements checking to see if each one is null. If I want to add on to the designer generated code, I can create this method in a partial class. Otherwise, if I’m using the “code only” approach, I can put this method in my own base ObjectContext class xxxxx:

   1:  public class SmartObjectContext : ObjectContext
   2:  {
   3:      protected ObjectParameter[] GetObjectParameters(MethodInfo methodInfo, params object[] parameters)
   4:      {
   5:          var objectParameters = new ObjectParameter[parameters.Length];
   6:   
   7:          var methodParams = methodInfo.GetParameters();
   8:          for (int i = 0; i < parameters.Length; i++)
   9:          {
  10:              var paramName = methodParams[i].Name;
  11:              var paramType = methodParams[i].ParameterType;
  12:              var paramValue = parameters[i];
  13:              if (paramValue == null)
  14:              {
  15:                  objectParameters[i] = new ObjectParameter(paramName, paramType);
  16:              }
  17:              else
  18:              {
  19:                  objectParameters[i] = new ObjectParameter(paramName, parameters[i]);
  20:              }
  21:          }
  22:          return objectParameters;
  23:      }
  24:  }

This method is clearly not as robust as the LINQ to SQL implementation which optionally takes into account parameter attributes if they exist to further customize the mapping, but it gets the job done.  This now allows my (previously 54 lines) SaveContact() method to now look like this:

   1:  public int SaveContact(ref int contactID, string firstName, string lastName, string company, string title, string email)
   2:  {
   3:      var objParams = this.GetObjectParameters((MethodInfo)MethodInfo.GetCurrentMethod(), contactID, firstName, lastName, company, title, email);
   4:      var result = this.ExecuteFunction("SaveContact", objParams);
   5:      contactID = (int)objParams.First(o => o.Name == "contactID").Value;
   6:      return result;
   7:  }

Just by adding this one method to a base class (or current class if you’re using a partial) I was able to reduce my SaveContact() method from 54 lines to 7 lines, and allow the API to be primitives for all parameters including a “ref int” for the PK Identity rather than an ObjectParameter.

I can also simplify this even further if I’m invoking stored procedures that are not assigning output parameters. For example, suppose you have a simple GetContact stored procedure which takes a single integer contactID as the parameter to the stored procedure. The auto-generated EF 4 code looks like this:

   1:  public ObjectResult<Contact> GetContact(Nullable<global::System.Int32> contactID)
   2:  {
   3:      ObjectParameter contactIDParameter;
   4:      if (contactID.HasValue)
   5:      {
   6:          contactIDParameter = new ObjectParameter("ContactID", contactID);
   7:      }
   8:      else
   9:      {
  10:          contactIDParameter = new ObjectParameter("ContactID", typeof(global::System.Int32));
  11:      }
  12:   
  13:      return base.ExecuteFunction<Contact>("GetContact", contactIDParameter);
  14:  }

Notice this is using a different ExecuteFunction() method that is generic and is strongly-typed to the Contact class in this instance.  I can create my own generic ExecuteFunction() method that encapsulates the creation of the ObjectParameter(s) and also put that in my base (or partial) ObjectContext:

   1:  public ObjectResult<T> ExecuteFunction<T>(string functionName, MethodInfo methodInfo, params object[] parameters)
   2:  {
   3:      var objParams = this.GetObjectParameters(methodInfo, parameters);
   4:      return base.ExecuteFunction<T>(functionName, objParams);
   5:  }

What this now allows me to do is to refactor my GetContact() method to now look like this:

   1:  public ObjectResult<Contact> GetContact(Nullable<int> contactID)
   2:  {
   3:      return this.ExecuteFunction<Contact>("GetContact", (MethodInfo)MethodInfo.GetCurrentMethod(), contactID);
   4:  }

Essentially this is now down to 1 meaningful line of code.

A couple of other things to keep in mind – first, the technique of these helper methods can be used regardless of whether you are using EF 4 in “data first”, “model first”, or “code only” scenarios.  Secondly, if you are using edmx files, EF 4 now allows you to customize the generated code that is produced via T4 templates.  So if you don’t like the code that the EF 4 designer is creating for you out of the box, change it!

One final note on EF 4 stored procedure support: apparently it does *not* support stored procedures that have multiple result sets – this also has been available since the first version of LINQ to SQL via IMultipleResults. I’m told this functionality can be added to EF with the EF Extensions library but I have yet to use it myself.

I’m still in the process of exploring all of the new features being added to EF 4. While some of them are quite interesting (e.g., fluent mappings, etc.), many of the other features being added have already been available in other frameworks like LINQ to SQL.  Definitely good to seem them being added to EF 4 now as well.

Posted On Wednesday, December 16, 2009 11:24 PM | Comments (7)

View Steve Michelotti's profile on LinkedIn

profile for Steve Michelotti at Stack Overflow, Q&A for professional and enthusiast programmers




Google My Blog

Tag Cloud