Tuesday, October 19, 2010

HOWTO: Remove files from a git repository which should be ignored

In the last year or so I’ve discovered the wonders of Git for managing source code, and we’ve now moved the majority of our live code base into repositories at work.

However, as with most learning curves, you make some mistakes at first. One thing that’s been really bugging me is that I didn’t know about the ".gitignore" file when creating my first few repositories, which means that all the binary files were added too. As soon as I learnt about the ignore file I went back and added it. The problem is that the ignore file only stops untracked files from being picked up; if a file is already being tracked, the ignore file has no effect on it. I’ve tried to find solutions, but most of what I found refers to "filter-branch", which to be honest I don’t fully understand.

So, here’s my:

Super Simple Method of Removing Files/Folders that should be ignored

Example: you want to ignore the "bin" folder in your project.

  • Make sure you have the correct line in your ".gitignore" file, e.g.
    [Bb]in*/
    (This will match any folder starting with the word "Bin" or "bin")
  • Move the file or folder that’s currently being tracked to somewhere it won’t get committed.
    Either move it out of the repository folder completely (we’ll put it back later) or rename it to something that will also be ignored, e.g. renaming "bin" to "bin_" works with the pattern above.
  • Commit the changes. Check that the commit contains a removal for each of the files you should be ignoring (and perhaps the updated ".gitignore" file), and make sure it isn’t re-adding the files from your temporary directory or filename!
  • Now simply rename your files or folders back; they will no longer show up as changes.

Perhaps there’s some kind of smarter way of doing this, but I just find this method really easy to understand! It certainly made my day when I finally got rid of these files!

Posted On Tuesday, October 19, 2010 12:08 PM | Feedback (0) |

Saturday, October 09, 2010

Recursively searching controls in ASP.NET web forms using generics, lambda expressions and extension methods!

Extension methods were introduced with the .NET 3.5 framework as a mechanism for adding methods to existing types without modifying the original assembly. This is how the LINQ methods were implemented, enabling some very powerful predicate-based operations over all existing collection types.

Searching for web controls on a page is one of those tasks that seems to come up for all kinds of reasons when programming with web forms. I was reminded of this problem recently:

I'm personally favoring the MVC framework now; however, while at work the other day one of my colleagues was working through an old web forms project of mine where a variable number of checkbox controls were being rendered in two separate lists on the page. Then, on post-back, he needed to get a list of the checkboxes that were now checked.

Lambda expressions combined with recursive calls are a very powerful way of searching through a page's controls. Originally I just used a simple function defined in the code-behind; however, a much cleaner and more reusable approach is to define the functions on the control class itself.

That's where extension methods come in. The code below shows a nice simple example of a couple of useful control search functions which then appear inside any object inheriting from System.Web.UI.Control.

To define extension methods on an existing class in C# you would do the following:

Firstly, create a public static class with whatever name you like in whatever namespace you like (I've defined mine in DanielBradley.WebControlExtensions.ControlExtensions).

Secondly, define a public static (shared) function where you mark the first parameter with the keyword "this" and give it the type of the class you want to extend.

Finally, define everything else exactly as you normally would, with the correct return type and your other parameters; simply treat the first parameter as the object the function is running on.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Web.UI;
 
namespace DanielBradley.WebControlExtensions
{
    public static class ControlExtensions
    {
        public static Control FirstOrDefault<TSource>(this Control ctrl, Func<TSource, bool> predicate) where TSource : Control
        {
            Type targetType = typeof(TSource);
            foreach (Control c in ctrl.Controls)
            {
                if (c.GetType() == targetType && predicate((TSource)c))
                {
                    return c;
                }
                Control recMatch = c.FirstOrDefault<TSource>(predicate);
                if (recMatch != null)
                {
                    return recMatch;
                }
            }
            return null;
        }
 
        public static IEnumerable<TSource> FindRecursive<TSource>(this Control ctrl, Func<TSource, bool> predicate) where TSource : Control
        {
            if (ctrl == null || ctrl.Controls.Count == 0)
                return new List<TSource>();
 
            return ctrl.Controls.OfType<TSource>().Where(predicate).Union(
                ctrl.Controls.Cast<Control>().SelectMany(c => c.FindRecursive<TSource>(predicate)));
        }
 
        public static IEnumerable<TSource> FindRecursive<TSource>(this Control ctrl, Func<TSource, bool> predicate, int depthLimit) where TSource : Control
        {
            if (ctrl == null || ctrl.Controls.Count == 0)
                return new List<TSource>();
 
            if (depthLimit == 0)
            {
                return ctrl.Controls.OfType<TSource>().Where(predicate);
            }
            else
            {
                return ctrl.Controls.OfType<TSource>().Where(predicate).Union(
                    ctrl.Controls.Cast<Control>().SelectMany(c => c.FindRecursive<TSource>(predicate, depthLimit - 1)));
            }
        }
    }
}
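
As a usage sketch for the checkbox scenario described earlier, the helper below collects the ticked checkboxes beneath any control. It assumes the project references System.Web and already contains the ControlExtensions class above; the names CheckedBoxFinder, GetChecked and GetCheckedDirectChildren are hypothetical and only there for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Web.UI;
using System.Web.UI.WebControls;
using DanielBradley.WebControlExtensions;

// Hypothetical helper, not part of the original post.
public static class CheckedBoxFinder
{
    // Every ticked CheckBox nested at any depth beneath 'root'.
    public static List<CheckBox> GetChecked(Control root)
    {
        return root.FindRecursive<CheckBox>(cb => cb.Checked).ToList();
    }

    // Only ticked CheckBoxes that are direct children of 'root'
    // (a depthLimit of 0 means "do not recurse").
    public static List<CheckBox> GetCheckedDirectChildren(Control root)
    {
        return root.FindRecursive<CheckBox>(cb => cb.Checked, 0).ToList();
    }
}
```

In a code-behind you could call CheckedBoxFinder.GetChecked(Page) on post-back. Note that FindRecursive uses OfType, so it also returns controls derived from the requested type, whereas FirstOrDefault above compares the exact type.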
 

Extension methods in C# are implemented as a specific language feature, however, you can also implement extension methods in VB.NET through use of attributes:

Firstly import System.Runtime.CompilerServices.

Secondly, define your (public) function inside a module, add the Extension attribute, and make the first parameter the type you want to add the function to (the object your function is called on is passed in through this parameter).

Here's the equivalent VB code (actually re-written, not auto-converted!):

Imports System.Runtime.CompilerServices
Imports System.Web.UI
 
Public Module ControlExtensions
 
    <Extension()> _
    Public Function FirstOrDefault(Of TSource As Control)(ByVal ctrl As Control, ByVal predicate As Func(Of TSource, Boolean)) As Control
        Dim targetType = GetType(TSource)
        For Each c As Control In ctrl.Controls
            If c.GetType.Equals(targetType) AndAlso predicate(c) Then
                Return c
            End If
            Dim recMatch = c.FirstOrDefault(predicate)
            If recMatch IsNot Nothing Then
                Return recMatch
            End If
        Next
        Return Nothing
    End Function

    <Extension()> _
    Public Function FindRecursive(Of TSource As Control)(ByVal ctrl As Control, ByVal predicate As Func(Of TSource, Boolean)) As IEnumerable(Of TSource)
        If ctrl Is Nothing OrElse ctrl.Controls.Count = 0 Then Return New List(Of TSource)
 
        Return ctrl.Controls.OfType(Of TSource).Where(predicate).Union( _
        ctrl.Controls.Cast(Of Control).SelectMany(Of TSource)(Function(c) c.FindRecursive(predicate)))
    End Function
 
    <Extension()> _
    Public Function FindRecursive(Of TSource As Control)(ByVal ctrl As Control, ByVal predicate As Func(Of TSource, Boolean), ByVal depthLimit As Integer) As IEnumerable(Of TSource)
        If ctrl Is Nothing OrElse ctrl.Controls.Count = 0 Then Return New List(Of TSource)
 
        If depthLimit = 0 Then
            Return ctrl.Controls.OfType(Of TSource).Where(predicate)
        Else
            Return ctrl.Controls.OfType(Of TSource).Where(predicate).Union( _
            ctrl.Controls.Cast(Of Control).SelectMany(Of TSource)(Function(c) c.FindRecursive(predicate, depthLimit - 1)))
        End If
    End Function
 
End Module

Having written these examples there's loads of ideas of useful stuff that's springing to mind that you could extend from here:

  • Make the predicate optional.
  • Write some generic tree search implementations on the IEnumerable interface.
  • Any other recursive based algorithms that could be useful on sets?

Anyway, there's the post, hope this is useful - might perhaps pad this out a little then move it into a project on github or something if it's potentially useful to people.

Daniel

Posted On Saturday, October 09, 2010 4:48 PM | Feedback (1) |

Monday, September 27, 2010

C# file synchronisation with Amazon S3 buckets tutorial

Use Case

I've just started working with Amazon's S3 buckets to hold a centralised filesystem supporting a distributed workflow system. The tasks in the workflow run on different physical machines in a variety of locations, so we need efficient ways of synchronising just small sub-sections of the local files with a bucket.

The Plan

Amazon's API allows listing objects by a key prefix, i.e. searching for all the files in a particular folder or its sub-folders. This is a great way of synchronising folders that might contain sub-folders; however, we also need to list the same files from the local file system.

The second task is then comparing files. In our system the synchronisation is only performed in one direction at a time (pull or push), and therefore we can calculate which files have been:

  • created (if it doesn't exist on the destination)
  • deleted (if it doesn't exist on the source)
  • modified (if the MD5 of the local file doesn't match the ETag on Amazon)

Implementation

Get the current Amazon file list

I'm using Amazon's own .NET API for this example. The first task is to request all the objects within a particular folder. First we create the S3 client:

AmazonS3Client client = new AmazonS3Client("awsAccessKeyId", "awsSecretAccessKey");

Then we get all the files (S3 objects) under the desired folder using a ListObjectsRequest and getting the keys and their corresponding etags out into a dictionary for later:

ListObjectsResponse folderObjects = client.ListObjects(new ListObjectsRequest() { BucketName = "dbradley-test-bucket", Prefix = "test/folder" });
Dictionary<string, string> remoteObjects = folderObjects.S3Objects.ToDictionary(obj => obj.Key, obj => obj.ETag);

Get the current local file list

Getting the local files into a similar format takes a little more work, as filesystems don't naturally let you recursively get the files and paths for all sub-folders. The approach is therefore to implement a recursive function that digs down into all the sub-directories.

The output of this function needs to be something that's comparable with the previous result from the Amazon bucket: a dictionary mapping the file path to its MD5 hash.

The first step is to be able to generate an "Amazon compatible" checksum of a file. We can use the ComputeHash function of the MD5CryptoServiceProvider class. This can simply be passed a stream and will return the hash as a byte array. To turn this byte array into a hex-encoded string we use the BitConverter.ToString method, then simply strip the dashes and lower the case so that it will match the ETag returned by Amazon.

Note: There's probably a more efficient method of doing the conversion from byte array to hex, but this will do for now!

Therefore the hashing function looks something like:

string hash = BitConverter.ToString(crypto.ComputeHash(fileStream)).Replace("-", string.Empty).ToLower();
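
As a self-contained sketch of the same conversion, the helper below wraps it in a method that accepts any stream. The names Md5Hex and Compute are hypothetical, and it uses MD5.Create() rather than constructing MD5CryptoServiceProvider directly, which is a substitution on my part rather than what the snippet above does.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the original post.
public static class Md5Hex
{
    // Reads the stream to its end and returns the MD5 digest
    // as 32 lower-case hex characters with no separators.
    public static string Compute(Stream stream)
    {
        using (var md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(stream);
            // BitConverter yields "5D-41-40-..."; strip the dashes and lower the case.
            return BitConverter.ToString(digest).Replace("-", string.Empty).ToLowerInvariant();
        }
    }
}
```

For example, passing a MemoryStream containing the UTF-8 bytes of "hello" returns 5d41402abc4b2a76b9719d911017c592.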

The next consideration is the time it takes to calculate these hashes. Even the most efficient MD5 implementations take a significant amount of time, especially with big files. Therefore, rather than returning a dictionary of file paths mapping to the actual MD5 hash string, we will return the paths mapping to a function which, only when run, will return the MD5 hash of the given file. We can define this using an anonymous delegate which takes no input:

delegate
{
    using (var stream = file.OpenRead())
    {
        return BitConverter.ToString(crypto.ComputeHash(stream)).Replace("-", string.Empty).ToLower();
    }
}
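
To make the deferral concrete, here is a minimal sketch of a factory that hands back such a function for a single file. LazyFileHash and For are hypothetical names, and creating a fresh MD5 instance on every call (instead of sharing one crypto object) is my own assumption to keep the function independent of outside state.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of the original post.
public static class LazyFileHash
{
    // Returns a function that opens and hashes the file only when invoked.
    // Nothing is read from disk when For() itself is called.
    public static Func<string> For(FileInfo file)
    {
        return delegate
        {
            using (var md5 = MD5.Create())
            using (var stream = file.OpenRead())
            {
                return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", string.Empty).ToLowerInvariant();
            }
        };
    }
}
```

Each call to the returned function re-reads the file, so if the same hash may be needed more than once it is worth storing the result.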

Going back to the recursive function, we need to make sure that the file keys match those on Amazon. Amazon paths look something like "test/folder/file.txt", so we need to make all of our local paths relative to a specific folder. We will therefore define two root functions for simplicity:

  1. Get all the files within a directory (and assume that the given directory is the root directory in Amazon).
  2. Get all the files within a directory and specify the current directory's path on Amazon.

Each of these functions will then call the internal recursive method. This internal method simply returns the keys and hash functions of each file in its current directory combined with the keys and hash functions of each of its sub-directories.

Bringing it all together.

So, finally, here's the code to get a local directory as a set of Amazon-compatible paths, each mapping to a function that returns an Amazon-compatible MD5 hash.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static Dictionary<string, Func<string>> GetLocalFileKeys(DirectoryInfo directory)
{
    // The given directory is treated as the root of the bucket.
    return GetLocalFileKeys(directory, string.Empty);
}

public static Dictionary<string, Func<string>> GetLocalFileKeys(DirectoryInfo directory, string rootPath)
{
    return GetLocalFileKeys(directory, rootPath, new MD5CryptoServiceProvider()).ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
}

private static IEnumerable<KeyValuePair<string, Func<string>>> GetLocalFileKeys(DirectoryInfo directory, string currentPath, MD5CryptoServiceProvider crypto)
{
    if (directory == null)
        throw new ArgumentNullException("directory", "directory is null.");

    // Build the key prefix once: empty for the root, otherwise exactly one trailing slash.
    // This avoids keys such as "/file.txt" or "sub//file.txt".
    string prefix = string.IsNullOrEmpty(currentPath) ? string.Empty : currentPath.TrimEnd('/') + "/";

    return directory.EnumerateFiles().Select
        (
            file => new KeyValuePair<string, Func<string>>
            (
                prefix + file.Name,
                // Deferred: the file is only read and hashed when this function is invoked.
                delegate
                {
                    using (var stream = file.OpenRead())
                    {
                        return BitConverter.ToString(crypto.ComputeHash(stream)).Replace("-", string.Empty).ToLower();
                    }
                }
            )
        )
        // Concat rather than Union: the keys are already unique, so no de-duplication is needed.
        .Concat
        (
            directory.EnumerateDirectories().SelectMany
            (
                childDir => GetLocalFileKeys(childDir, prefix + childDir.Name, crypto)
            )
        );
}

One observation of the internal function is that it is using IEnumerable of KeyValuePair rather than an actual dictionary. This is due to dictionaries not being able to add collections of new pairs at once (as we need to do this when calling the function recursively so that the results are presented in a flat collection).
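
With both listings available, the comparison described in The Plan can be sketched for a push (local to bucket) as below. SyncPlan, SyncPlanner and PlanPush are hypothetical names, not part of the code above. The sketch also assumes the ETag may arrive wrapped in double quotes, so it trims them before comparing, and it only makes sense for objects whose ETag is a plain MD5 digest (objects uploaded in multiple parts do not have one).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type, not part of the original post.
public class SyncPlan
{
    public List<string> Created = new List<string>();
    public List<string> Modified = new List<string>();
    public List<string> Deleted = new List<string>();
}

// Hypothetical helper, not part of the original post.
public static class SyncPlanner
{
    // Works out what a push would have to do.
    // 'local' maps keys to deferred hash functions, 'remote' maps keys to ETags.
    public static SyncPlan PlanPush(
        IDictionary<string, Func<string>> local,
        IDictionary<string, string> remote)
    {
        var plan = new SyncPlan();
        foreach (var pair in local)
        {
            string etag;
            if (!remote.TryGetValue(pair.Key, out etag))
            {
                // Missing on the destination: no need to hash the file at all.
                plan.Created.Add(pair.Key);
            }
            else if (!string.Equals(pair.Value(), etag.Trim('"'), StringComparison.OrdinalIgnoreCase))
            {
                // Present on both sides but the content differs.
                plan.Modified.Add(pair.Key);
            }
        }
        // Present on the destination only: removed from the source.
        plan.Deleted.AddRange(remote.Keys.Where(key => !local.ContainsKey(key)));
        return plan;
    }
}
```

Because the hashes are functions, only files that exist on both sides are ever read from disk; new files are classified from their key alone.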

Posted On Monday, September 27, 2010 9:43 AM | Feedback (1) |

Saturday, July 10, 2010

Redesigning an Information System – Part 2 – Requirements

In my first post I started by discussing the motivations for re-designing a large information system from scratch. In this post we’re going to get a little more into the practical steps you can take to ensure you’re implementing a system that will actually meet the requirements of the business.

The plus-side of re-implementing an existing system is that all the current requirements are already defined by the existing code, the down-side being that the code can be completely incomprehensible and there may be features or tools buried deep down in it that only one person uses!

To go beyond replicating the existing system you have to get involved in the day-to-day use of the system. I personally had a fair amount of experience with the last system I was working on; however, there was still a plethora of features that I had no idea about until I talked to the people using the system.

Some of the operations that people perform may not be implemented in code at all; the existing system may well involve managers with large spreadsheets to track performance and assign tasks to their teams. It may also involve people creating workarounds for the current system, such as printing or writing out information they use on a regular basis because they can’t get at it at the necessary times. To a software engineer these practices seem slow and counter-intuitive; however, there’s probably a very good practical reason why they developed, which will most likely point back to failings in the existing system, and these are very important pointers to take note of.

Be agile

When trying to extract requirements the most important aim is best summed up by the last of the four values in the agile manifesto:

Responding to change over following a plan

However thoroughly you research what you’re trying to build you’ll never fully capture everyone’s needs first time, so be open to changing major parts of your system even late on in the project if required.

Posted On Saturday, July 10, 2010 5:09 PM | Feedback (0) |

Thursday, May 27, 2010

Redesigning an Information System - Part 1

Over the next few weeks or months I'd like to run a small series of articles sharing my experiences from the largest of the projects I've worked on, exploring some of the real-world problems I've come across and how we went about solving them. I'm afraid I can't give too many specifics on the project right now as it's not yet complete, so you'll have to forgive me for being a little abstract in places!

To start with I'm going to run through a little of the background of the problem and the motivations to re-design from scratch. Then I'll work through the approaches taken to understanding the requirements, designing, implementing, testing and migrating to the new system.

Motivations for Re-designing a Large Information System

The system has been in place for a number of years and was originally designed to do a significantly different job from the one it's now being used for. This is mainly due to the product maturing as well as client requirements changing.

As with most information systems this one can be defined in four main areas of functionality:

  1. Input – adding information to the system
  2. Storage – persisting information in an efficient, searchable structure
  3. Output – delivering the information to the client
  4. Control – management of the process

There can be a variety of reasons to re-design an existing system; a few of our own turned out to be factors such as:

  • Overall system reliability
  • System response time
  • Failure isolation and recovery
  • Maintainability of code and information
  • General extensibility to solve future problems
  • Separation of business and product concerns
  • New or improved features

The factor that started the thought process was the desire to improve the way in which information was entered into the system. However, this alone was not the entire reason for deciding to redesign.

Business Drivers

Typically, software engineers would always prefer to do a project from scratch themselves. It generally means you don't have to deal with problems created by predecessors and you can create your own absolutely perfect solution. However, the reality of working within a business is that the bottom line comes down to return on investment. For a medium-sized business such as mine, a project must be able to deliver actual value within a reasonable timeframe before any work is started. As a result, any long-term project will generally take a lot of effort and consideration to be approved by those in charge, so it can be better to break the project down into more manageable chunks which allow more frequent deliverables and deliver value within a shorter timeframe.

As the only thing of concern was the methods for inputting information, this is where we started with requirements gathering and design. However, knowing that there might be more to the problem, and not limiting your design decisions before the requirements are understood, is key to finding the best solutions.

Posted On Thursday, May 27, 2010 4:38 PM | Feedback (0) |

Wednesday, May 19, 2010

The Start of a Blog

So, here's my new blog up and running, who am I and what am I planning to write here?

First off - here's a little about me:
I'm a recent graduate from university (it's coming up to a year since I finished), where I studied Software Engineering on a four-year course in which the third year was an industrial placement. During the industrial placement I went to work for a company called Adfero in a "Technical Consultant" role as well as a junior "Information Systems Developer". Once I completed my placement I went back to complete my final year but also continued in my developer role two or three days a week with the company.

Working part time while at uni always seems like a great idea until you get half way through the year. For me the problem was not so much a lack of time, but rather a lack of interest in the course content, having had a chance to work on real projects in a live environment. Most people who graduated a little while ago find the same thing: looking back, uni work seems much more trivial from a problem-solving point of view. I found that to be true, and I found that the key to uni work is actually your ability to prove, through how you talk about something, that you comprehensively understand the basics.

After completing uni I then returned full time to Adfero purely in the developer role which is where I've now been for almost a year and have now also taken on the title of "Information Systems Architect" where I'm working on some of the more high level design problems within the products.

What I'm wanting to share on this blog is some of the interesting things I've learnt myself over the last year, the things they don't teach you in uni and pretty much anything else I find interesting! My personal favorite areas are text indexing, search and particularly good software engineering design - good design combined with good code makes the first step towards a well-written, maintainable piece of software.

Hopefully I'll also be able to share a few of the products I've worked on, the mistakes I've made and the software problems I've inherited from previous developers and had to heavily re-factor.

Posted On Wednesday, May 19, 2010 7:34 PM | Feedback (1) |
