LinqToWikipedia - A custom .NET Linq provider - Part II

This article is part of a two-part series on the LinqToWikipedia provider. The first article covers the basic concepts of Linq as well as the client usage of this particular provider, while the second article covers the inner workings of the LinqToWikipedia provider to give you an understanding of what it takes to create your own IQueryable provider.

NOTE: You should download the latest build from CodePlex so you can follow along with the code samples.

Creating your own IQueryable provider

First off, we have to ask ourselves the question: "Why would we want to create our own IQueryable provider?" It certainly isn't the easiest coding task that you'll undertake, and the internet isn't exactly overflowing with simple end-to-end examples of how to accomplish it. What benefits do we really get out of doing this?

We already covered the advantages of using Linq in my previous article: a consistent querying syntax, whether you write query expressions or lambda expressions, no matter what the underlying data source is. That alone gives us a pretty good first reason to build our own provider. Here are a couple more potential uses:

  • You have several applications in your organization that need to verify addresses. You could write a Linq provider that translates queries into calls to the United States Postal Service APIs to verify physical address locations.
  • Your organization has employee demographic data in various systems such as PeopleSoft, Active Directory, etc. You could write a Linq provider that allows applications to query and return employee data from these various systems in one result set.
  • You maintain websites for real estate agents and want to give potential customers the ability to look at the value of nearby homes in a certain area without leaving the agent's website. You could write a Linq provider that connects to the Zillow.com API and returns a wealth of information for the user to gauge the value of a potential home.

Now that we have discussed some reasons why you may want to create your own IQueryable provider, let's walk through the LinqToWikipedia provider to show you how you can create your own.

NOTE: I am only going to show code highlights of important concepts, but you can always download the full source code from CodePlex to view the code in its entirety. Additionally, to keep the examples simple, I will only focus on the OpenSearch functionality of the provider.

At a high level, there are two interfaces that you will need to implement in order to create your own IQueryable provider:

  • IQueryable<T>
    • We implement this interface so that our results can be enumerated
  • IQueryProvider
    • We implement this interface so that we can examine the query (an expression tree), translate it into an actual request for whatever API we are calling, and then execute that request

Before we get into the actual provider code, we will first start at the client call so we can follow a path through the execution process:

 

WikipediaContext datacontext = new WikipediaContext();

var opensearch = (
    from wikipedia in datacontext.OpenSearch
    where wikipedia.Keyword == "Microsoft"
    select wikipedia).Take(5);


Here we create a new instance of the WikipediaContext class and access its OpenSearch property, which returns an IWikipediaQueryable<WikipediaOpenSearchResult> (explained in the following sections). The where clause filters the results to only include those where Keyword == "Microsoft", and the Take() extension method limits them to the first 5 records. Nothing is sent to Wikipedia at this point; the query only executes when opensearch is enumerated.
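The compiler turns the query expression above into ordinary calls to the Queryable extension methods, and each call wraps the previous expression in a new MethodCallExpression. The following sketch runs the method-syntax equivalent against an in-memory source (a hypothetical Article class plus AsQueryable(), not the LinqToWikipedia types) so you can inspect the shape of the tree a provider eventually receives:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for WikipediaOpenSearchResult, used only in this sketch.
public class Article
{
    public string Keyword { get; set; }
    public string Text { get; set; }
}

public static class QueryShapeDemo
{
    // Method-syntax equivalent of the query-syntax example. The trailing
    // "select wikipedia" is dropped by the compiler because it selects nothing new.
    public static IQueryable<Article> BuildQuery(IQueryable<Article> source)
    {
        return source
            .Where(wikipedia => wikipedia.Keyword == "Microsoft")
            .Take(5);
    }

    // Walks the chain of operator calls from the root of the tree inward,
    // returning the method names outermost first.
    public static List<string> OperatorNamesOutermostFirst(Expression expression)
    {
        var names = new List<string>();
        while (expression is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            expression = call.Arguments[0]; // the source sequence of this operator
        }
        return names;
    }
}
```

For this query the root of the tree is the Take call, whose first argument is the Where call, whose first argument is the original source. That nesting is what the provider's expression visitor has to unwind later.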

We have broken the query down so that you can see where everything more or less fits together as we review the code later.

Let's now look at the WikipediaContext class:

 

public class WikipediaContext
{
    public IWikipediaQueryable<WikipediaOpenSearchResult> OpenSearch
    {
        get { return new WikipediaQueryable<WikipediaOpenSearchResult>(); }
    }
}



It's a simple class with one property, OpenSearch, that returns type IWikipediaQueryable<WikipediaOpenSearchResult>. In actuality, we return the concrete type WikipediaQueryable<WikipediaOpenSearchResult>, which implements the IWikipediaQueryable<T> interface. This is important because IWikipediaQueryable<T> extends IQueryable<T>, which is what allows our provider to return itself as an IQueryable.
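The definition of IWikipediaQueryable<T> isn't listed here. Based on the description above, a minimal sketch, assuming it adds no members of its own (check the CodePlex source for the actual definition), is a marker interface that extends IQueryable<T>:

```csharp
using System.Linq;

// Assumed shape: a provider-specific marker interface. Because it extends
// IQueryable<T>, all of the standard query operators (Where, Take, ...) apply to it.
public interface IWikipediaQueryable<T> : IQueryable<T>
{
}
```

Exposing the interface rather than the concrete class lets WikipediaQueryable<T> stay internal (it is declared internal sealed below) while clients still compose queries against a public type.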

The query will ultimately return an enumerable sequence of WikipediaOpenSearchResult objects; this is the result class that we populate from the Wikipedia API:

 

public class WikipediaOpenSearchResult : IWikipediaOpenSearchResult
{
    public string Text { get; set; }
    public string Description { get; set; }
    public Uri Url { get; set; }
    public Uri ImageUrl { get; set; }
    public string Keyword { get; set; }
}



Because IQueryable<T> inherits from IEnumerable<T>, IQueryable, and IEnumerable, our queryable class is obligated to provide implementations of the following:

  • IEnumerator IEnumerable.GetEnumerator()
  • IEnumerator<T> IEnumerable<T>.GetEnumerator()
  • Type IQueryable.ElementType
  • Expression IQueryable.Expression
  • IQueryProvider IQueryable.Provider


For the purposes of this tutorial we will focus only on the IEnumerator<T> IEnumerable<T>.GetEnumerator() method, since this is where the actual call to our custom query provider takes place.

Here is the WikipediaQueryable class:

 

   1:  internal sealed class WikipediaQueryable<T> : IWikipediaQueryable<T>
   2:  {
   3:      private Expression _expression;
   4:   
   5:      public WikipediaQueryable()
   6:      {
   7:          _expression = Expression.Constant(this);
   8:      }
   9:   
  10:      public WikipediaQueryable(Expression expression)
  11:      {
  12:          _expression = expression;
  13:      }
  14:   
  15:      IEnumerator<T> IEnumerable<T>.GetEnumerator()
  16:      {
  17:          WikipediaQueryProvider provider = new WikipediaQueryProvider();
  18:   
  19:          IEnumerable<T> sequence = provider.Execute<IEnumerable<T>>(_expression);
  20:   
  21:          return sequence.GetEnumerator();
  22:      }
  23:  }



Notice that on lines 17-21 we call our custom query provider, WikipediaQueryProvider, passing the query's expression tree to its Execute method. Once the provider has translated the query and executed it against the API, it returns an IEnumerable<T>, where T is the WikipediaOpenSearchResult type in this case.

Since our custom query provider implements IQueryProvider, we must provide implementations for the following:

  • IQueryable CreateQuery(Expression expression)
  • IQueryable<TElement> CreateQuery<TElement>(Expression expression)
  • object Execute(Expression expression)
  • TResult Execute<TResult>(Expression expression)

For the purposes of this tutorial we will focus only on the object Execute(Expression expression) method, since this is where the HTTP request is sent to the Wikipedia API and the response is processed.

Here is the WikipediaQueryProvider class:

 

   1:  internal sealed class WikipediaQueryProvider : IQueryProvider
   2:  {
   3:      public object Execute(Expression expression)
   4:      {
   5:          return WikipediaOpenSearchResponse.Get(WikipediaOpenSearchRequest.Send(expression));
   6:      }
   7:  }



On line 5, WikipediaOpenSearchRequest.Send sends our formatted URL to the Wikipedia API and returns the response as a string. That string is passed to the WikipediaOpenSearchResponse.Get method, which parses it and returns the results as type IEnumerable<WikipediaOpenSearchResult>.
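The listings above show only GetEnumerator and the non-generic Execute; the remaining members are mostly boilerplate. The following self-contained sketch shows one common way to fill them in. The names SketchQueryable<T> and SketchQueryProvider are hypothetical, and the execute delegate stands in for the Send/Get pipeline. CreateQuery just wraps the larger expression in another queryable, so Where and Take only grow the tree, and nothing is executed until enumeration:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical queryable: stores an expression tree plus the provider that runs it.
public sealed class SketchQueryable<T> : IQueryable<T>
{
    // Root query: the tree is just a constant that refers to this object.
    public SketchQueryable(SketchQueryProvider provider)
    {
        Provider = provider;
        Expression = Expression.Constant(this);
    }

    // Composed query: created by CreateQuery for Where, Take, etc.
    public SketchQueryable(SketchQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }

    // Enumeration is the only point where the provider executes the tree.
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical provider; the delegate stands in for request/response code.
public sealed class SketchQueryProvider : IQueryProvider
{
    private readonly Func<Expression, object> _execute;

    public SketchQueryProvider(Func<Expression, object> execute)
    {
        _execute = execute;
    }

    // Queryable.Where, Queryable.Take, etc. call this; no I/O happens here.
    public IQueryable<TElement> CreateQuery<TElement>(Expression expression) =>
        new SketchQueryable<TElement>(this, expression);

    // Non-generic variant: recover T from the IEnumerable<T> the expression yields.
    public IQueryable CreateQuery(Expression expression)
    {
        Type elementType = new[] { expression.Type }
            .Concat(expression.Type.GetInterfaces())
            .First(t => t.IsGenericType && t.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            .GetGenericArguments()[0];
        return (IQueryable)Activator.CreateInstance(
            typeof(SketchQueryable<>).MakeGenericType(elementType), this, expression);
    }

    public object Execute(Expression expression) => _execute(expression);

    // The generic overload simply casts the result of the non-generic one.
    public TResult Execute<TResult>(Expression expression) => (TResult)Execute(expression);
}
```

Note that this sketch passes the provider into the queryable, whereas WikipediaQueryable creates a new WikipediaQueryProvider inside GetEnumerator; either works as long as CreateQuery and enumeration see the same expression tree.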

We'll cover the WikipediaOpenSearchResponse.Get method later; for now, let's take a look at the WikipediaOpenSearchRequest.Send method.

 

internal static class WikipediaOpenSearchRequest
{
    internal static string Send(Expression expression)
    {
        WikipediaOpenSearchUriBuilder uriBuilder = new WikipediaOpenSearchUriBuilder();

        Uri uri = uriBuilder.BuildUri(expression);

        return HttpRequest.Send(uri);
    }
}



This static method builds a URI from the expression and then passes it to a helper method, HttpRequest.Send, which sends the request using HttpWebRequest. The WikipediaOpenSearchUriBuilder class contains the code that parses the expression so that we can translate it into a usable URI.
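The HttpRequest.Send helper itself isn't listed. A minimal sketch using HttpWebRequest might look like the following; the class and method names match the call above, but the body and the User-Agent string (including its contact URL) are hypothetical. It sets a User-Agent header because, as discussed in the comments on this post, Wikipedia's API returned 403 Forbidden to requests without one, and Wikimedia's User-Agent policy asks clients to identify themselves:

```csharp
using System;
using System.IO;
using System.Net;

internal static class HttpRequest
{
    // Hypothetical identifier; replace with your own application name and contact.
    internal const string UserAgent = "LinqToWikipediaSample/1.0 (https://example.org/contact)";

    // Builds the GET request without sending it, so it can be inspected or tested.
    internal static HttpWebRequest CreateRequest(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.UserAgent = UserAgent;
        return request;
    }

    // Sends the request and returns the response body as a string.
    internal static string Send(Uri uri)
    {
        HttpWebRequest request = CreateRequest(uri);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

On .NET 6 and later, WebRequest.Create is marked obsolete in favor of HttpClient, so this compiles with a warning there; the same header can be set through HttpClient.DefaultRequestHeaders.UserAgent.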

NOTE: Learning to parse lambda expressions is a topic worthy of a book. You will see some basic samples below on parsing expressions; however, the focus of this article is to demonstrate how to create a Linq provider. I would suggest further exploration of the ins and outs of expression parsing to ensure that you have a solid foundation in lambda expressions. Here are a few good starting points:


Here is the WikipediaOpenSearchUriBuilder class.

 

   1:  internal sealed class WikipediaOpenSearchUriBuilder : ExpressionVisitor
   2:  {
   3:      private StringBuilder _urlBuilder;
   4:      private Dictionary<string, string> _queryString;
   5:      private string query = string.Empty;
   6:   
   7:      public WikipediaOpenSearchUriBuilder()
   8:      {
   9:          _queryString = new Dictionary<string, string>();
  10:          _queryString["format"] = "xml";
  11:          _queryString["action"] = "opensearch";
  12:   
  13:          _urlBuilder = new StringBuilder();
  14:   
  15:          _urlBuilder.Append("http://en.wikipedia.org/w/api.php?");
  16:      }
  17:   
  18:      public Uri BuildUri(Expression expression)
  19:      {
  20:          foreach (KeyValuePair<string, string> parameter in _queryString)
  21:          {
  22:              _urlBuilder.Append(parameter.Key);
  23:              _urlBuilder.Append("=");
  24:              _urlBuilder.Append(Uri.EscapeDataString(parameter.Value));
  25:              _urlBuilder.Append("&");
  26:          }
  27:   
  28:          // parse expression tree
  29:          Visit((MethodCallExpression)expression);
  30:   
  31:          return new Uri(_urlBuilder.ToString());
  32:      }
  33:   
  34:      // override ExpressionVisitor method
  35:      protected override Expression VisitMethodCall(MethodCallExpression m)
  36:      {
  37:          if ((m.Method.DeclaringType == typeof(Queryable)) || (m.Method.DeclaringType == typeof(Enumerable)))
  38:          {
  39:              if (m.Method.Name.Equals("Where"))
  40:              {
  41:                  LambdaExpression lambda = (LambdaExpression)StripQuotes(m.Arguments[1]);
  42:   
  43:                  ParseQuery(lambda.Body);
  44:   
  45:                  _urlBuilder.Append("search=" + query);
  46:   
  47:                  return m;
  48:              }
  49:              if (m.Method.Name.Equals("Take"))
  50:              {
  51:                  Visit(m.Arguments[0]);
  52:   
  53:                  ConstantExpression countExpression = (ConstantExpression)StripQuotes(m.Arguments[1]);
  54:   
  55:                  _urlBuilder.Append("&limit=" + ((int)countExpression.Value).ToString(CultureInfo.InvariantCulture));
  56:   
  57:                  return m;
  58:              }
  59:          }
  60:   
  61:          return m;
  62:      }
  63:   
  64:      internal void ParseQuery(Expression e)
  65:      {
  66:          if (e is BinaryExpression)
  67:          {
  68:              BinaryExpression c = e as BinaryExpression;
  69:   
  70:              switch (c.NodeType)
  71:              {
  72:                  case ExpressionType.Equal:
  73:                      GetCondition(c);
  74:                      break;
  75:                  default:
  76:                      throw new NotSupportedException("Only .Equal is supported for this query");
  77:              }
  78:          }
  79:          else
  80:              throw new NotSupportedException("This querytype is not supported.");
  81:      }
  82:   
  83:      internal void GetCondition(BinaryExpression e)
  84:      {
  85:          string val = string.Empty, attrib = string.Empty;
  86:   
  87:          if (e.Left is MemberExpression)
  88:          {
  89:              attrib = ((MemberExpression)e.Left).Member.Name;
  90:              val = Expression.Lambda(e.Right).Compile().DynamicInvoke().ToString();
  91:          }
  92:          else if (e.Right is MemberExpression)
  93:          {
  94:              attrib = ((MemberExpression)e.Right).Member.Name;
  95:              val = Expression.Lambda(e.Left).Compile().DynamicInvoke().ToString();
  96:          }
  97:   
  98:          if (attrib.Equals("Keyword"))
  99:          {
 100:              if (query.Length > 0)
 101:                  throw new NotSupportedException("'WikipediaOpenSearchResult' Query expression can only contain one 'Keyword'");
 102:              else
 103:                  query = val;
 104:          }
 105:          else
 106:          {
  107:              throw new NotSupportedException("'WikipediaOpenSearchResult' Query expression can only contain a 'Keyword' parameter");
 108:          }
 109:      }
 110:  }



This class is doing most of the heavy lifting for our provider so let's break it down:

First, on line 1 we inherit from the abstract class ExpressionVisitor, a helper class that walks an expression tree and calls a separate, overridable Visit method for each kind of node (method calls, constants, binary operators, and so on). We will cover this a bit more later.
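It helps to see the order in which a visitor meets the query operators. Visit starts at the root of the tree, which is the outermost call (Take), so the Where call is only reached by visiting Take's first argument. The sketch below uses System.Linq.Expressions.ExpressionVisitor (public since .NET 4), which may not be the exact base class the provider ships with; OperatorOrderVisitor is a hypothetical name. Visiting the source argument before recording the current operator yields source order, innermost first:

```csharp
using System.Collections.Generic;
using System.Linq.Expressions;

// Records query operator names in source order (innermost first).
public sealed class OperatorOrderVisitor : ExpressionVisitor
{
    public List<string> Operators { get; } = new List<string>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        // Visit the source sequence first, as the Take branch of the URI builder does.
        if (node.Arguments.Count > 0)
            Visit(node.Arguments[0]);

        Operators.Add(node.Method.Name);
        return node;
    }
}
```

For source.Where(...).Take(5) this records Where before Take, which is the same ordering the URI builder relies on to emit "search=" before "&limit=".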

Next, on line 7 in our constructor...

 

_queryString = new Dictionary<string, string>();
_queryString["format"] = "xml";
_queryString["action"] = "opensearch";

We begin building the query string that will be sent to the Wikipedia API, and the constructor also appends the API's base URL to _urlBuilder. Throughout the class we append new parameters to the query string as we parse the expression tree.

Next, on line 18 we define the method that takes over building our URL from the expression tree...

public Uri BuildUri(Expression expression)

and the actual parsing of the expression tree starts on line 29:

Visit((MethodCallExpression)expression);

Visit is defined in the ExpressionVisitor base class. It dispatches on the type of each node, so when it reaches a method call it invokes our overridden method on line 35:

 

protected override Expression VisitMethodCall(MethodCallExpression m)



This method is where we check the calls to the Where() and Take() extension methods so that we can parse their arguments and extract the values that we are looking for.

 

if (m.Method.Name.Equals("Where"))

and

if (m.Method.Name.Equals("Take"))

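Lines 39-48 are where we parse the lambda expression and look for the keyword that was supplied to the Where() extension method. ParseQuery and GetCondition (lines 64-109) accept only a single equality comparison against the Keyword property; anything else throws a NotSupportedException. The keyword is then appended to the query string as the "search" value passed to the Wikipedia API.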

 

_urlBuilder.Append("search=" + query);
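The StripQuotes helper called on lines 41 and 53 isn't listed. Queryable.Where receives its predicate wrapped in a Quote node (a UnaryExpression), so the helper unwraps it to reach the LambdaExpression. The sketch below pairs a common StripQuotes implementation with a keyword extractor modeled on GetCondition; SearchResult and KeywordExtractor are hypothetical names. Unlike the listing, it checks that the member belongs to the lambda parameter, so a comparison written as term == w.Keyword (where term is a captured local, which also appears as a MemberExpression) is evaluated as a value instead of being mistaken for the attribute:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical result type with the property the provider filters on.
public class SearchResult
{
    public string Keyword { get; set; }
}

public static class KeywordExtractor
{
    // Queryable operators wrap lambda arguments in Quote nodes; unwrap them.
    public static Expression StripQuotes(Expression e)
    {
        while (e.NodeType == ExpressionType.Quote)
            e = ((UnaryExpression)e).Operand;
        return e;
    }

    // Extracts the value compared against Keyword in a Where(...) call.
    public static string GetKeyword(MethodCallExpression whereCall)
    {
        var lambda = (LambdaExpression)StripQuotes(whereCall.Arguments[1]);
        if (!(lambda.Body is BinaryExpression binary) || binary.NodeType != ExpressionType.Equal)
            throw new NotSupportedException("Only 'Keyword == value' is supported.");

        // The member side must read a property of the lambda parameter (w.Keyword);
        // the other side may be a constant or a captured variable.
        bool leftIsParameterMember = IsParameterMember(binary.Left);
        Expression memberSide = leftIsParameterMember ? binary.Left : binary.Right;
        Expression valueSide = leftIsParameterMember ? binary.Right : binary.Left;

        if (!IsParameterMember(memberSide) || ((MemberExpression)memberSide).Member.Name != "Keyword")
            throw new NotSupportedException("Only the 'Keyword' property can be queried.");

        // Evaluate the value side, as GetCondition does with Compile/DynamicInvoke.
        object value = Expression.Lambda(valueSide).Compile().DynamicInvoke();
        return value?.ToString();
    }

    private static bool IsParameterMember(Expression e) =>
        e is MemberExpression member && member.Expression is ParameterExpression;
}
```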



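Next, on lines 49-58 we look for the value that was supplied to the Take() extension method. Because Take() wraps the Where() call, line 51 visits Take's source argument first, which processes the Where() clause and appends "search=" before we append "&limit=". The Take() value is then appended to the query string as the "limit" value passed to the Wikipedia API.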

 

_urlBuilder.Append("&limit=" + ((int)countExpression.Value).ToString(CultureInfo.InvariantCulture));
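Note that the listing escapes the fixed parameters with Uri.EscapeDataString but appends the search term as-is, so a term containing "&" or "#" would be split into extra parameters or cut off as a URL fragment. A small sketch of the final assembly step that escapes every value follows. OpenSearchUrl.Build is a hypothetical name, and the sketch uses https since Wikipedia now serves its API over HTTPS:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class OpenSearchUrl
{
    private const string Endpoint = "https://en.wikipedia.org/w/api.php";

    // Builds the OpenSearch request URL; every value, including the search term, is escaped.
    public static Uri Build(string search, int? limit = null)
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("format", "xml"),
            new KeyValuePair<string, string>("action", "opensearch"),
            new KeyValuePair<string, string>("search", search),
        };
        if (limit.HasValue)
            parameters.Add(new KeyValuePair<string, string>(
                "limit", limit.Value.ToString(CultureInfo.InvariantCulture)));

        string query = string.Join("&",
            parameters.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));
        return new Uri(Endpoint + "?" + query);
    }
}
```

For example, OpenSearchUrl.Build("AT&T", 5) yields a query string ending in search=AT%26T&limit=5, so the ampersand stays part of the search term.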



Recap:

So far, we have translated the expression tree into a URL that is ready to be passed to the Wikipedia API. Next, we need to accept the data (XML) returned by the Wikipedia API and populate our WikipediaOpenSearchResult class with it. So let's move on to the WikipediaOpenSearchResponse.Get method.

 

internal static class WikipediaOpenSearchResponse
{
    internal static IEnumerable<WikipediaOpenSearchResult> Get(string xml)
    {
        List<WikipediaOpenSearchResult> resultList = new List<WikipediaOpenSearchResult>();

        var descendants = from i in XDocument.Parse(xml).Descendants() select i;

        foreach (XElement element in descendants)
        {
            if (element.Name.LocalName.Equals("Item"))
            {
                WikipediaOpenSearchResult wsr = new WikipediaOpenSearchResult();

                // Elements() (rather than Nodes()) yields only child elements,
                // so the XElement cast below can't fail on text or comment nodes.
                var items = from x in element.Elements()
                            select x;

                foreach (XElement item in items)
                {
                    switch (item.Name.LocalName)
                    {
                        case "Text":
                            wsr.Text = item.Value;
                            break;
                        case "Description":
                            wsr.Description = item.Value;
                            break;
                        case "Url":
                            wsr.Url = new Uri(item.Value);
                            break;
                        case "Image":
                            wsr.ImageUrl = new Uri(item.Attribute("source").Value);
                            break;
                    }
                }

                resultList.Add(wsr);
            }
        }

        return resultList;
    }
}



This class has one static method that accepts a string of XML and returns a generic List<> of type WikipediaOpenSearchResult as an IEnumerable. We use LINQ to XML to parse the data and extract the elements that we need to populate our WikipediaOpenSearchResult class.
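The same parsing can be expressed as a single LINQ to XML query. The sketch below (a hypothetical OpenSearchXml.Parse; it redeclares the WikipediaOpenSearchResult shape from earlier so it compiles on its own) matches elements by LocalName as the listing does, so it doesn't depend on the response's XML namespace, and it leaves ImageUrl null when an item has no Image element:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Same shape as the WikipediaOpenSearchResult class listed earlier.
public class WikipediaOpenSearchResult
{
    public string Text { get; set; }
    public string Description { get; set; }
    public Uri Url { get; set; }
    public Uri ImageUrl { get; set; }
    public string Keyword { get; set; }
}

public static class OpenSearchXml
{
    // Projects every <Item> element into a result object.
    public static IEnumerable<WikipediaOpenSearchResult> Parse(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants()
            .Where(e => e.Name.LocalName == "Item")
            .Select(item => new WikipediaOpenSearchResult
            {
                Text = Child(item, "Text")?.Value,
                Description = Child(item, "Description")?.Value,
                Url = ToUri(Child(item, "Url")?.Value),
                ImageUrl = ToUri((string)(Child(item, "Image")?.Attribute("source"))),
            })
            .ToList();
    }

    // First child element with the given local name, ignoring its namespace.
    private static XElement Child(XElement parent, string localName) =>
        parent.Elements().FirstOrDefault(e => e.Name.LocalName == localName);

    // Missing or empty values become null instead of throwing.
    private static Uri ToUri(string value) =>
        string.IsNullOrEmpty(value) ? null : new Uri(value);
}
```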

From here the IEnumerable is passed back up the stack through WikipediaQueryProvider to the GetEnumerator method of WikipediaQueryable<WikipediaOpenSearchResult>, which hands the client an enumerator over the WikipediaOpenSearchResult objects.

posted on Friday, January 29, 2010 1:10 PM
Comments
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
Christophe
2/13/2010 1:56 PM
Nice work, keep it up.
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
vanderbilt
2/15/2010 11:01 PM
nice work. really appreciate
but, sadly, wikipedia.org returns a 403 Forbidden error to the HttpWebRequest

the live demo does not work
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
Steve
2/22/2010 5:58 AM
Hello,

I get the same problem. Wikipedia returns 403 access denied.

How can I bypass this problem?
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
Michael Ballhaus
2/22/2010 9:32 AM
The error occurs because the Wikipedia API thinks that you are coming in as a "bot". The fix is one line of code that adds a "UserAgent" header to the web request. You can either see the fix on CodePlex and make the change yourself, or wait until I update the release version with the fix.

Thanks for the feedback!
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
adam
3/31/2012 7:03 AM
how can i get text from a specific url? e.g get first paragraph from http://en.wikipedia.org/wiki/Linq
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
Oer
3/20/2013 6:52 PM
Hello,

Thank you for this work. It has helped me a lot. However, I do have a question: is it possible to obtain all the information from a page with a list of subjects, for instance: http://en.wikipedia.org/wiki/List_of_sports

I have been trying to change the parameters and even the whole WikipediaOpenSearchUriBuilder class but I was not able to get any info from the wikipedia sandbox and therefore, not sure how to do this.

Can anyone point me in the right direction? suggestions?

Thank you.
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
Sheetal
11/27/2014 1:29 AM
It's working very nicely, but the OpenSearch API provides only two lines of text. I need the text of two paragraphs from wiki pages.

please help


Thanks
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
Mohammad Mohit
12/3/2014 4:26 AM
Hi.
How can I use LinqToWikipedia for Farsi (Persian) word search?
For example: سیب = Apple
Check out AbiDic.com

Click on the Wikipedia icon and see the result.
Please help me.
# re: LinqToWikipedia - A custom .NET Linq provider - Part II
Nicole
6/23/2017 11:07 PM
Good work! It does help me a lot! Really appreciate it!
