Szymon Kobalczyk's Blog

A Developer's Notebook


About two weeks ago Daniel Biesiada (who is the ISV DE here in Poland) announced a little programming contest on his blog. The goal was to build a .NET application that would check whether the theory of Six Degrees of Separation applies to two given topics in Wikipedia. In other words, to find a path from the source page to the destination page with no more than six links. At the time I had not much else to do (apart from setting up the website for the C2C Conference, helping out with the European Silverlight Challenge, and preparing for the WPF Beta Exam), so I decided to give it a try.
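To make the problem concrete, here is a minimal sketch of one way to search for such a path: a plain breadth-first search that gives up once it would need more than six links. It is written in Python for brevity and is not the algorithm WikiSpider actually uses; the function name `find_path`, the `get_links` callback, and the toy link table are all made up for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def find_path(
    source: str,
    destination: str,
    get_links: Callable[[str], Iterable[str]],
    max_links: int = 6,
) -> Optional[list[str]]:
    """Return the shortest chain of pages from source to destination that
    follows at most max_links links, or None if there is no such chain."""
    if source == destination:
        return [source]

    # parent[page] is the page from which we first reached it. It doubles
    # as the visited set and lets us rebuild the path at the end.
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([(source, 0)])  # (page, links followed so far)

    while queue:
        page, depth = queue.popleft()
        if depth == max_links:
            continue  # one more link would exceed the limit
        for linked in get_links(page):
            if linked in parent:
                continue  # already reached by a path at least as short
            parent[linked] = page
            if linked == destination:
                # Walk the parent chain back to the source.
                path = [linked]
                prev: Optional[str] = page
                while prev is not None:
                    path.append(prev)
                    prev = parent[prev]
                path.reverse()
                return path
            queue.append((linked, depth + 1))
    return None


# Hypothetical toy link table standing in for Wikipedia.
TOY_LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def links_of(page: str) -> list[str]:
    return TOY_LINKS.get(page, [])


print(find_path("A", "F", links_of))               # ['A', 'B', 'D', 'F']
print(find_path("A", "F", links_of, max_links=2))  # None
```

Because breadth-first search visits pages in order of distance from the source, the first time the destination shows up the path is already the shortest one, and the depth check simply stops the search from wandering past the sixth link.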

Fast forward two weeks and I present to you my WikiSpider:

image

As usual, building this took me much more time than I initially anticipated (including a few sleepless nights). And I still didn't make it before the deadline, so it didn't even count as a contest entry anymore (sigh!). However, this was mainly because my personal goal was to throw in every new piece of .NET 3.5 I could fit, and most of them I had never used before.

Here are some key technologies I managed to put into this:

  • The UI is done in WPF (and this was the only thing here I knew a bit about). However, I borrowed the graph control from Kevin's excellent WPF Bag-o-Tricks.
  • The caching is done using SQL Server Express. Initially I wanted to do this using SQL Compact, but I ran into performance issues and had to switch to full SQL Server in order to run the queries in the profiler. Since this has been fixed (with big help from Pawel Potasinski), I could try SQL Compact again.
  • Of course data access is done using LINQ to SQL. And of course this was the main source of my problems, as it was the first time I had done anything with it, and so far I had only read Scott Gu's tutorials. Still, I'm already in love with it.
  • Speaking of LINQ: initially I was screen scraping the HTML pages to get all the links. But it turns out that Wikipedia has a little-known Query API that lets you get the page content as XML. So the obvious move was to rewrite this part with LINQ to XML.
  • The path-finding algorithm was borrowed from Eric Lippert. The nice thing about it is that it uses lots of C# 3.0 language features, so it is a great resource to learn from. The new C# syntax is so addictive that I already miss it in my other project.
  • Finally, I wanted to publish the app with ClickOnce but ran out of time. So maybe later.
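To illustrate why getting XML back beats screen scraping, here is a small sketch of pulling link titles out of such a response. It is in Python rather than LINQ to XML, and the sample document is an assumption: it is shaped like the output of the current MediaWiki API for `prop=links` with `format=xml`, which may differ from the Query API response the app was written against. The function name `extract_links` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed response shape (modelled on the MediaWiki API, prop=links, format=xml).
SAMPLE_RESPONSE = """
<api>
  <query>
    <pages>
      <page pageid="1" ns="0" title="Example">
        <links>
          <pl ns="0" title="First topic" />
          <pl ns="14" title="Category:Examples" />
          <pl ns="0" title="Second topic" />
        </links>
      </page>
    </pages>
  </query>
</api>
"""


def extract_links(xml_text: str) -> list[str]:
    """Return the titles of all linked pages in the main (article) namespace."""
    root = ET.fromstring(xml_text.strip())
    # Each link is a <pl> element; ns="0" marks ordinary articles, so
    # categories, templates, and talk pages are skipped.
    return [pl.attrib["title"] for pl in root.iter("pl") if pl.get("ns") == "0"]


print(extract_links(SAMPLE_RESPONSE))  # ['First topic', 'Second topic']
```

With structured output there is no HTML to untangle: every link is one element with a `title` attribute, and the namespace number makes it easy to keep only real articles.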

I learned many interesting things and tried out some new stuff that I wanted to check out anyway. I will try to share my discoveries in the next few days, but in the meantime feel free to download and take a look at my code (I know it's not the prettiest piece of code you've seen, but I was in a rush to finish this on time):

Download the source code

Here you can also download the entries from other participants: Lukasz Sowa, Maciej Rutkowski, and Arkadiusz Benedykt. Congratulations to all of you!

Installation

  1. Download the code from the above link and extract it.
  2. The application uses a local SQL Server database for caching, and unfortunately you need to create it yourself (now you know why I wanted to use SQL Compact). Simply launch SSMS and create an empty database called WikiCache.
  3. Run the Create_WikiCacheDB.sql script from the data folder to create the database schema.
  4. By default the app is configured to look for the WikiCache database on the local SQLEXPRESS instance. If you installed it somewhere else, update the connection string in app.config accordingly.
  5. Run build.bat, or open the solution in Visual Studio 2008 and run it from there.

Usage

  1. Enter the name of the Wikipedia page in the address bar at the top and press the Go! button. The entered topic and the pages it links to will be displayed as a graph.
  2. Clicking on any topic selects it (and puts it in the center of the graph).
  3. Right-click on any topic to open the context menu. Select "Open in browser" to... load the page in the browser.
  4. Select "Set as source" or "Set as destination" to put the topic name in the appropriate field on the sidebar.
    [Note: Currently this is the only way to show the sidebar]
  5. You can also enter the source/destination topics manually.
  6. When both are set, click the Start button to begin searching for the path. A few statistics are displayed at the bottom of the sidebar.
  7. During the search you can still use the graph or navigate to other pages (thanks to the BackgroundWorker magic).
  8. When a path is found it is displayed on the sidebar, and you can click on each topic to center it on the graph.
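For the curious, the idea behind step 7 can be sketched in a few lines: the search runs on a worker thread and posts progress and the final result to a queue, so the thread that owns the UI never blocks. This is a Python analogy using `threading` and `queue`, not the .NET BackgroundWorker class itself, and `start_background_search` and `fake_search` are hypothetical names.

```python
import queue
import threading
from typing import Callable


def start_background_search(
    search: Callable[[Callable[[int], None]], list],
    events: queue.Queue,
) -> threading.Thread:
    """Run search on a worker thread, posting progress and the result to events."""

    def worker() -> None:
        # The search calls report(n) whenever it wants to publish statistics.
        result = search(lambda visited: events.put(("progress", visited)))
        events.put(("done", result))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def fake_search(report: Callable[[int], None]) -> list:
    # Hypothetical stand-in for the real path search.
    for visited in (10, 20, 30):
        report(visited)
    return ["Source", "Middle", "Destination"]


events: queue.Queue = queue.Queue()
thread = start_background_search(fake_search, events)

# A real UI would poll the queue from a timer; here we just wait for the worker.
thread.join()
received = []
while not events.empty():
    received.append(events.get())
print(received)
```

The important part is that the worker never touches the UI directly; it only hands messages over, and the UI decides when to pick them up.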

Have fun!

posted on Wednesday, January 30, 2008 10:22 PM

Feedback

# re: Introducing WikiSpider 1/31/2008 11:59 AM Pawel Potasinski
Nice application. It was a pleasure to catch it as one of the first testers :-)

# re: Introducing WikiSpider 8/31/2008 8:06 PM Jay
Hey Szymon, have you by any chance updated the code to work with the new API from Wikipedia? The query API you use seems to be obsolete.

Kind regards

