This post is about some work I did in 2006 on the Hirschberg Global Alignment algorithm. I'm mainly blogging about it because I'm curious whether others in bioinformatics use C# (and .NET or Mono), even though I don't work in the field myself and am far from being a proper bioinformatician. So, if you read this, drop me a line!
What is the Hirschberg Global Alignment algorithm?
Some descriptions can be found at the following sites:
What can the algorithm do?
The algorithm finds an optimal global alignment of two sequences, such as DNA or protein sequences. More importantly, it does so in linear space, O(n), rather than the quadratic O(n²) space of the standard dynamic-programming (Needleman-Wunsch) approach, which matters a lot for long sequences. Here are some example alignments produced by the implementation referred to below:
ATG-
-TGA

ACGCTG-A
-CGCTGAA

TTTTG----GGG
TTTTAAAAGGGG

GTCGGGA-GACC-TTA----GGACGT
AT--G-ATGACCCTTAAAAAG--C-C
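To make the linear-space idea concrete, below is a minimal Python sketch of Hirschberg's divide-and-conquer scheme. It is not the C# code from the CodePlex project. It assumes a hypothetical scoring scheme (match +2, mismatch -1, gap -2), so its output need not match the alignments above. The function names are my own. The key trick is that nw_last_row keeps only two rows of the Needleman-Wunsch matrix, so scoring needs memory proportional to one sequence's length. The recursion splits the first sequence in half and combines forward and reverse scores to decide where to split the second.

```python
# Hypothetical scoring scheme: an assumption, not necessarily the one
# used by the original C# implementation.
MATCH, MISMATCH, GAP = 2, -1, -2


def sub(a: str, b: str) -> int:
    """Score for aligning character a against character b."""
    return MATCH if a == b else MISMATCH


def nw_last_row(x: str, y: str) -> list[int]:
    """Needleman-Wunsch scores of all of x against every prefix of y.
    Only two rows are kept, so memory is O(len(y))."""
    prev = [j * GAP for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * GAP] + [0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = max(prev[j - 1] + sub(x[i - 1], y[j - 1]),  # (mis)match
                          prev[j] + GAP,                          # gap in y
                          curr[j - 1] + GAP)                      # gap in x
        prev = curr
    return prev


def nw_full(x: str, y: str) -> tuple[str, str]:
    """Full-matrix Needleman-Wunsch with traceback. Only called when one
    sequence has length <= 1, so the matrix stays linear in size."""
    n, m = len(x), len(y)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * GAP
    for j in range(1, m + 1):
        s[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          s[i - 1][j] + GAP,
                          s[i][j - 1] + GAP)
    # Walk back from the bottom-right corner, building the alignment in reverse.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + GAP:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay))


def hirschberg(x: str, y: str) -> tuple[str, str]:
    """Optimal global alignment of x and y; '-' marks a gap."""
    if len(x) <= 1 or len(y) <= 1:
        return nw_full(x, y)
    mid = len(x) // 2
    # Forward scores of the first half of x, backward scores of the second half.
    left = nw_last_row(x[:mid], y)
    right = nw_last_row(x[mid:][::-1], y[::-1])
    # Split y at the point k where the two halves together score best.
    k = max(range(len(y) + 1), key=lambda j: left[j] + right[len(y) - j])
    ax1, ay1 = hirschberg(x[:mid], y[:k])
    ax2, ay2 = hirschberg(x[mid:], y[k:])
    return ax1 + ax2, ay1 + ay2
```

For example, hirschberg("ACGCTGA", "CGCTGAA") returns two equal-length strings that together score as well as the full quadratic-space alignment. For readability, the sketch slices strings instead of passing index ranges, so it copies more than a strictly linear-space implementation would. Passing start and end indices into the recursion avoids this. Running time is still O(nm), as with Needleman-Wunsch.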
Why did I do it?
In September 2006, I started an academic course on "Algorithms for Genomes" (A4G) at the IBIVU Amsterdam (http://www.ibi.vu.nl/teaching/a4g/). Part of this excellent course consisted of implementing the Hirschberg Global Alignment algorithm. As I had experience with C#, I chose to implement it in C# .NET. Afterwards, my Unix-oriented bioinformatics teacher laughed at me for choosing C#. I never got over it. :)
To get my implementation graded, I had to make sure it ran on the Mono framework. I used version 1.1.12 at the time and haven't checked since whether the code is still compatible with the latest version of Mono.
Where can you find the code?
http://www.codeplex.com/HirschbergCSharp