Tuesday, October 19, 2010
In the last year or so I’ve discovered the wonders of Git for managing source code,
and we’ve now moved the majority of our live code base into repositories at work.
However, as with most learning curves, you make some mistakes at first. One thing
that’s been really bugging me is that I didn’t know about the ".gitignore" file
when creating my first few repositories, which means that all the binary files were
added too. As soon as I learnt about the ignore file I went back and added it.
The problem is that the ignore file only stops untracked files from being picked up;
if a file is already being tracked, the ignore file has no effect on it. I’ve tried
to find solutions, but most of what I found refers to "filter-branch", which
to be honest I don’t fully understand.
So, here’s my:
Super Simple Method of Removing Files/Folders that should be ignored
Example: you want to ignore the "bin" folder in your project.
- Make sure you have the correct line in your ".gitignore" file, e.g.
[Bb]in*/
(This will match any folder starting with the word "Bin" or "bin")
- Move the file or folder that’s currently being tracked to somewhere it won’t
get committed.
Either move it out of the folder completely (we’ll put it back later) or just rename
it to something else that will get ignored, e.g. renaming "bin" to "bin_" would work
for me.
- Commit the changes (check that the commit contains a removal for each of the files
you should be ignoring, and perhaps the updated ".gitignore" file, but make sure it
isn’t re-adding the files from your temporary directory or filename!)
- Now simply move or rename your files or folders back; they will no longer show up
as changes to commit.
Perhaps there’s some kind of smarter way of doing this, but I just find this method
really easy to understand! It certainly made my day when I finally got rid of these
files!
Saturday, October 09, 2010
Extension methods were introduced with the .NET 3.5 framework as a mechanism for adding
methods to existing types without modifying the original assembly. This is how the LINQ
methods were implemented, enabling some very powerful predicate-based operations
to be performed over all existing collection types.
Searching for web controls on a page is one of those tasks that seems to come up
for all kinds of reasons while programming with Web Forms. I was reminded of this
problem recently:
I'm personally favoring the MVC framework now; however, at work the other day one of
my colleagues was working through an old Web Forms project of mine where a variable
number of checkbox controls were being rendered in two separate lists on the page.
Then, on post-back, he needed to get a list of the checkboxes that were now checked.
Lambda expressions combined with recursive calls are a very powerful way of searching
through a page's controls. Originally I just used a simple function defined in the
code-behind; however, a much cleaner and more reusable approach is to define the
functions on the Control class itself.
That's where extension methods come in. The code below shows a nice simple example
of a couple of useful control search functions which then appear inside any object
inheriting from System.Web.UI.Control.
To define extension methods on an existing class in C# you would do the following:
Firstly, create a public static class with whatever name you like in whatever namespace
you like (I've defined mine in DanielBradley.WebControlExtensions.ControlExtensions).
Secondly, define a public static (shared) function where you prefix the first parameter
with the keyword "this" and give it the type of the class you want to extend.
Finally, define everything else exactly as you would normally, with the correct
return type and your other parameters, and simply treat the first parameter as
the current object the function is running on.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Web.UI;

    namespace DanielBradley.WebControlExtensions
    {
        public static class ControlExtensions
        {
            // Depth-first search: returns the first descendant control whose type is
            // exactly TSource and which satisfies the predicate, or null if none does.
            public static Control FirstOrDefault<TSource>(this Control ctrl, Func<TSource, bool> predicate)
                where TSource : Control
            {
                Type targetType = typeof(TSource);
                foreach (Control c in ctrl.Controls)
                {
                    if (c.GetType() == targetType && predicate((TSource)c))
                    {
                        return c;
                    }
                    Control recMatch = c.FirstOrDefault<TSource>(predicate);
                    if (recMatch != null)
                    {
                        return recMatch;
                    }
                }
                return null;
            }

            // Returns every descendant control of type TSource, at any depth,
            // that satisfies the predicate.
            public static IEnumerable<TSource> FindRecursive<TSource>(this Control ctrl, Func<TSource, bool> predicate)
                where TSource : Control
            {
                if (ctrl == null || ctrl.Controls.Count == 0)
                    return new List<TSource>();

                return ctrl.Controls.OfType<TSource>().Where(predicate).Union(
                    ctrl.Controls.Cast<Control>().SelectMany(c => c.FindRecursive<TSource>(predicate)));
            }

            // As above, but stops descending after depthLimit levels
            // (a depthLimit of 0 searches the direct children only).
            public static IEnumerable<TSource> FindRecursive<TSource>(this Control ctrl, Func<TSource, bool> predicate, int depthLimit)
                where TSource : Control
            {
                if (ctrl == null || ctrl.Controls.Count == 0)
                    return new List<TSource>();

                if (depthLimit == 0)
                {
                    return ctrl.Controls.OfType<TSource>().Where(predicate);
                }
                else
                {
                    return ctrl.Controls.OfType<TSource>().Where(predicate).Union(
                        ctrl.Controls.Cast<Control>().SelectMany(c => c.FindRecursive<TSource>(predicate, depthLimit - 1)));
                }
            }
        }
    }
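The same three steps, applied to a much smaller type, might look like the sketch below. It is only an illustration and not part of the control library: the StringExtensions class and its Truncate method are hypothetical names picked for the example.

```csharp
using System;

// Step 1: a public static class (the name is arbitrary; this one is hypothetical).
public static class StringExtensions
{
    // Step 2: a public static method whose first parameter is marked with "this"
    // and typed as the class being extended (string here).
    // Step 3: everything else is an ordinary method; "text" is the object the
    // method was called on.
    public static string Truncate(this string text, int maxLength)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (maxLength < 0) throw new ArgumentOutOfRangeException("maxLength");

        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }
}
```

Once the class is in scope, the method reads as if it were defined on string itself: "extension".Truncate(3) returns "ext", and it is still callable the long way round as StringExtensions.Truncate("extension", 3).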
Extension methods in C# are implemented as a specific language feature; however,
you can also implement extension methods in VB.NET through the use of attributes:
Firstly, import System.Runtime.CompilerServices.
Secondly, define your (public) function inside a module, add the Extension attribute
and define the first parameter as the type you want to add the function to (this is
the parameter that receives the object your function is running on).
Here's the equivalent VB code (actually re-written, not auto-converted :)
    Imports System
    Imports System.Collections.Generic
    Imports System.Linq
    Imports System.Runtime.CompilerServices
    Imports System.Web.UI

    Public Module ControlExtensions

        <Extension()> _
        Public Function FirstOrDefault(Of TSource As Control)(ByVal ctrl As Control, _
                ByVal predicate As Func(Of TSource, Boolean)) As Control
            Dim targetType = GetType(TSource)
            For Each c As Control In ctrl.Controls
                If c.GetType.Equals(targetType) AndAlso predicate(c) Then
                    Return c
                End If
                Dim recMatch = c.FirstOrDefault(predicate)
                If recMatch IsNot Nothing Then
                    Return recMatch
                End If
            Next
            Return Nothing
        End Function

        <Extension()> _
        Public Function FindRecursive(Of TSource As Control)(ByVal ctrl As Control, _
                ByVal predicate As Func(Of TSource, Boolean)) As IEnumerable(Of TSource)
            If ctrl Is Nothing OrElse ctrl.Controls.Count = 0 Then Return New List(Of TSource)
            Return ctrl.Controls.OfType(Of TSource).Where(predicate).Union( _
                ctrl.Controls.Cast(Of Control).SelectMany(Of TSource)(Function(c) c.FindRecursive(predicate)))
        End Function

        <Extension()> _
        Public Function FindRecursive(Of TSource As Control)(ByVal ctrl As Control, _
                ByVal predicate As Func(Of TSource, Boolean), _
                ByVal depthLimit As Integer) As IEnumerable(Of TSource)
            If ctrl Is Nothing OrElse ctrl.Controls.Count = 0 Then Return New List(Of TSource)
            If depthLimit = 0 Then
                Return ctrl.Controls.OfType(Of TSource).Where(predicate)
            Else
                Return ctrl.Controls.OfType(Of TSource).Where(predicate).Union( _
                    ctrl.Controls.Cast(Of Control).SelectMany(Of TSource)(Function(c) c.FindRecursive(predicate, depthLimit - 1)))
            End If
        End Function

    End Module
Having written these examples, loads of ideas are springing to mind for useful
things you could build from here:
- Make the predicate optional.
- Write some generic tree search implementations on the IEnumerable interface.
- Any other recursion-based algorithms that could be useful on sets?
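As a rough sketch of the second idea, the recursion can be pulled out into a generic helper that knows nothing about web controls. The TreeExtensions class and the Descendants method are hypothetical names, and because the method extends every type T it is probably best kept in its own namespace so it only appears where you import it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TreeExtensions
{
    // Lazily walks the tree below "root" depth-first (pre-order). The caller
    // supplies getChildren, which returns the direct children of a node.
    public static IEnumerable<T> Descendants<T>(this T root, Func<T, IEnumerable<T>> getChildren)
    {
        if (getChildren == null) throw new ArgumentNullException("getChildren");
        return DescendantsIterator(root, getChildren);
    }

    private static IEnumerable<T> DescendantsIterator<T>(T node, Func<T, IEnumerable<T>> getChildren)
    {
        // Treat a null child collection as "no children".
        foreach (T child in getChildren(node) ?? Enumerable.Empty<T>())
        {
            yield return child;
            foreach (T descendant in DescendantsIterator(child, getChildren))
            {
                yield return descendant;
            }
        }
    }
}
```

With something like this in place, the checkbox search could be written as page.Descendants(c => c.Controls.Cast<Control>()).OfType<CheckBox>().Where(cb => cb.Checked), leaving the filtering to the standard LINQ operators.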
Anyway, there's the post, hope this is useful - might perhaps pad this out a
little then move it into a project on github or something if it's potentially
useful to people.
Daniel
Monday, September 27, 2010
Use Case
I've just started working with Amazon's S3 buckets to hold a centralised filesystem
supporting a distributed workflow system. The tasks in the workflow run on different
physical machines in a variety of locations, so we need efficient ways of synchronising
just small sub-sections of local files with a bucket.
The Plan
Amazon's API allows listing objects by a key prefix, i.e. searching for all the
files in a particular folder or its sub-folders. This is a great way of synchronising
folders that might contain sub-folders; however, we also need to list the same
files from the local file system.
The second task is then comparing files. In our system the synchronisation is only
performed in one direction at a time (pull or push) and therefore we can calculate
which files have been:
- created (if it doesn't exist on the destination)
- deleted (if it doesn't exist on the source)
- modified (if the MD5 of the local file doesn't match the ETag on Amazon)
Implementation
Get the current Amazon file list
I'm using Amazon's own .NET API for this example. The first task is to request all
the objects within a particular folder. First we create the S3 client:
    AmazonS3Client client = new AmazonS3Client("awsAccessKeyId", "awsSecretAccessKey");
Then we get all the files (S3 objects) under the desired folder using a ListObjectsRequest
and pull the keys and their corresponding ETags out into a dictionary for later:
    ListObjectsResponse folderObjects = client.ListObjects(
        new ListObjectsRequest() { BucketName = "dbradley-test-bucket", Prefix = "test/folder" });
    Dictionary<string, string> remoteObjects =
        folderObjects.S3Objects.ToDictionary(obj => obj.Key, obj => obj.ETag);
Get the current local file list
To get the local files in a similar format takes a little more work, as filesystems
don't naturally let you recursively get the files and paths for all sub-folders.
The approach is therefore going to be to implement a recursive function that digs
down into all the sub-directories.
The output of this function needs to be something that's comparable with the previous
result from the Amazon bucket: a dictionary mapping the file path to its MD5 hash.
The first step is to be able to generate an "Amazon compatible" checksum
of a file. We can use the ComputeHash function of the MD5CryptoServiceProvider
class. This can simply be passed a stream and will return the hash as a byte array.
To turn this byte array into a hex-encoded string we use the BitConverter
ToString method, then simply strip the dashes and lower the case so that it will
match the ETag returned by Amazon.
Note: There's probably a more efficient method of doing the conversion from byte
array to hex, but this will do for now!
Therefore the hashing function looks something like:
    string hash = BitConverter.ToString(crypto.ComputeHash(fileStream)).Replace("-", string.Empty).ToLower();
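For reference, here is one self-contained way the hashing step could be packaged up. It is a sketch under a couple of assumptions: the Md5Hex class name is made up for the example, it uses MD5.Create() in place of MD5CryptoServiceProvider, and it builds the hex string with a StringBuilder instead of the BitConverter approach. Also bear in mind that, as far as I know, the ETag only equals the plain MD5 of the content for objects uploaded in a single part; objects uploaded via multipart upload get a differently formed ETag.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class Md5Hex
{
    // Reads the stream to its end and returns the MD5 hash as a
    // 32-character lower-case hex string.
    public static string Compute(Stream input)
    {
        if (input == null) throw new ArgumentNullException("input");

        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(input);
            var hex = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                hex.Append(b.ToString("x2")); // two lower-case hex digits per byte
            }
            return hex.ToString();
        }
    }
}
```

For a file this would be called as Md5Hex.Compute(file.OpenRead()) inside a using block, so the stream is closed once the hash has been calculated.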
The next consideration is the time it takes to calculate these hashes. Even the
most efficient MD5 implementations carry a significant cost, especially with big
files. Therefore, rather than returning a dictionary of file paths mapping to the
actual MD5 hash string, we will return the paths mapping to a function which, only
when run, will return the MD5 hash of the given file.
We can define this using an anonymous delegate which doesn't take any input:
    delegate
    {
        using (var stream = file.OpenRead())
        {
            return BitConverter.ToString(crypto.ComputeHash(stream)).Replace("-", string.Empty).ToLower();
        }
    }
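A possible variation on this, sketched below, wraps the delegate in a Lazy<string> so that the file is hashed at most once even if the function is invoked several times. This is an assumption on my part and not what the code in this post does; the DeferredHash class and ForFile method are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class DeferredHash
{
    // Returns a function that hashes the file the first time it is invoked
    // and hands back the cached result on every later call.
    public static Func<string> ForFile(FileInfo file)
    {
        if (file == null) throw new ArgumentNullException("file");

        var lazy = new Lazy<string>(() =>
        {
            using (MD5 md5 = MD5.Create())
            using (FileStream stream = file.OpenRead())
            {
                return BitConverter.ToString(md5.ComputeHash(stream))
                    .Replace("-", string.Empty)
                    .ToLowerInvariant();
            }
        });

        // Nothing is read from disk until the returned function is called.
        return () => lazy.Value;
    }
}
```

The trade-off is that the cached value goes stale if the file changes after the first call, which is fine for a single synchronisation pass but worth remembering if the dictionary is kept around.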
Going back to the recursive function, we need to make sure that the file keys match
those on Amazon. Amazon paths look something like "test/folder/file.txt"
and therefore we need to make all of our local paths relative to a specific folder.
For simplicity we will define two root functions:
- Get all the files within a directory (and assume that the given directory is the
root directory in Amazon).
- Get all the files within a directory and specify the current directory's path
on Amazon.
Each of these functions will then call the internal recursive method. This internal
method simply returns the keys and hash functions of each file in its
current directory combined with the keys and hash functions of each of its
sub-directories.
Bringing it all together.
So, finally, here's the code to get a local directory as a set of Amazon-compatible
paths, each mapping to a function that returns an Amazon-compatible MD5 hash.
    public static Dictionary<string, Func<string>> GetLocalFileKeys(DirectoryInfo directory)
    {
        return GetLocalFileKeys(directory, string.Empty, new MD5CryptoServiceProvider())
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    }

    public static Dictionary<string, Func<string>> GetLocalFileKeys(DirectoryInfo directory, string rootPath)
    {
        // Normalise the prefix so that it is either empty or ends with exactly one "/".
        string prefix = string.IsNullOrEmpty(rootPath) ? string.Empty : rootPath.TrimEnd('/') + "/";
        return GetLocalFileKeys(directory, prefix, new MD5CryptoServiceProvider())
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    }

    private static IEnumerable<KeyValuePair<string, Func<string>>> GetLocalFileKeys(
        DirectoryInfo directory, string currentPath, MD5CryptoServiceProvider crypto)
    {
        if (directory == null)
            throw new ArgumentNullException("directory", "directory is null.");

        // currentPath is either empty or ends with "/", so keys are built by plain
        // concatenation (this avoids keys such as "/file.txt" or "sub//file.txt").
        return directory.EnumerateFiles().Select
        (
            file => new KeyValuePair<string, Func<string>>
            (
                currentPath + file.Name,
                delegate
                {
                    using (var stream = file.OpenRead())
                    {
                        return BitConverter.ToString(crypto.ComputeHash(stream)).Replace("-", string.Empty).ToLower();
                    }
                }
            )
        )
        .Union
        (
            directory.EnumerateDirectories().SelectMany
            (
                childDir => GetLocalFileKeys(childDir, currentPath + childDir.Name + "/", crypto)
            )
        );
    }
One observation of the internal function is that it is using IEnumerable of KeyValuePair
rather than an actual dictionary. This is due to dictionaries not being able to
add collections of new pairs at once (as we need to do this when calling the function
recursively so that the results are presented in a flat collection).
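The comparison step described in the plan isn't shown above, so here is a minimal sketch of how the two dictionaries might be compared for a push (local files are the source, the bucket is the destination). The SyncPlan and SyncPlanner names are hypothetical, and the code assumes the ETag may arrive wrapped in double quotes, which it strips before comparing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for the result of a comparison.
public sealed class SyncPlan
{
    public readonly List<string> Created = new List<string>();
    public readonly List<string> Deleted = new List<string>();
    public readonly List<string> Modified = new List<string>();
}

public static class SyncPlanner
{
    // local:  key -> function that computes the file's MD5 hex string on demand
    // remote: key -> ETag as returned by the bucket listing
    public static SyncPlan PlanPush(
        IDictionary<string, Func<string>> local,
        IDictionary<string, string> remote)
    {
        var plan = new SyncPlan();

        foreach (KeyValuePair<string, Func<string>> pair in local)
        {
            string etag;
            if (!remote.TryGetValue(pair.Key, out etag))
            {
                // Missing on the destination: no need to hash the file at all.
                plan.Created.Add(pair.Key);
            }
            else if (!string.Equals(pair.Value(), etag.Trim('"'), StringComparison.OrdinalIgnoreCase))
            {
                // Present on both sides but the content differs.
                plan.Modified.Add(pair.Key);
            }
        }

        // Present on the destination but no longer on the source.
        plan.Deleted.AddRange(remote.Keys.Where(key => !local.ContainsKey(key)));
        return plan;
    }
}
```

This is where returning functions instead of hashes pays off: a file is only hashed when its key exists on both sides. For a pull the same comparison applies with the roles of source and destination swapped.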
Saturday, July 10, 2010
In my first post I started by discussing the motivations for re-designing a large information system from scratch. In this post we’re going to get a little more into the practical steps you can take to ensure you’re implementing a system that will actually meet the requirements of the business.
The plus side of re-implementing an existing system is that all the current requirements are already defined by the existing code; the downside is that the code can be completely incomprehensible and there may be features or tools buried deep down in it that only one person uses!
To go beyond replicating the existing system you have to get involved in the day-to-day use of the system. I personally had a fair amount of experience with the last system I was working on; however, there was still a plethora of features that I had no idea about until I talked to the people using the system.
Some of the operations that people perform may not be implemented in code at all; the existing system may well involve managers with large spreadsheets to track performance and assign tasks to their teams. It may also involve people creating workarounds for the current system, such as printing or writing out information they use on a regular basis because they can’t get at the information they need at the necessary times. Although to a software engineer these practices seem slow and counter-intuitive, there’s probably a very good practical reason why they have developed, which will most likely point back to failings in the existing system, and these are very important pointers to take note of.
Be agile
When trying to extract requirements the most important aim is best summed up by the last line of the agile manifesto:
Responding to change over following a plan
However thoroughly you research what you’re trying to build you’ll never fully capture everyone’s needs first time, so be open to changing major parts of your system even late on in the project if required.
Thursday, May 27, 2010
Over the next few weeks or months I'd like to run a small series of articles sharing my experiences from the largest of the projects I've worked on, exploring some of the real-world problems I've come across and how we went about solving them. I'm afraid I can't give too many specifics on the project right now as it's not yet complete, so you'll have to forgive me for being a little abstract in places!
To start with I'm going to run through a little of the background of the problem and the motivations to re-design from scratch. Then I'll work through the approaches taken to understanding the requirements, designing, implementing, testing and migrating to the new system.
Motivations for Re-designing a Large Information System
The system is one that's been in place for a number of years and was originally designed to do a significantly different job from the one it's now being used for. This is mainly due to the product maturing as well as client requirements changing.
As with most information systems this one can be defined in four main areas of functionality:
- Input – adding information to the system
- Storage – persisting information in an efficient, searchable structure
- Output – delivering the information to the client
- Control – management of the process
There can be a variety of reasons to re-design an existing system; a few of our own turned out to be factors such as:
- Overall system reliability
- System response time
- Failure isolation and recovery
- Maintainability of code and information
- General extensibility to solve future problems
- Separation of business and product concerns
- New or improved features
The factor that started the thought process was the desire to improve the way in which information was entered into the system. However, this alone was not the entire reason for deciding to redesign.
Business Drivers
Typically, software engineers would prefer to do a project from scratch themselves. It generally means you don't have to deal with problems created by predecessors and you can create your own absolutely perfect solution. However, the reality of working within a business is that the bottom line comes down to return on investment. For a medium-sized business such as mine there must be actual value that can be delivered within a reasonable timeframe before any work is started. As a result, any long-term project will generally take a lot of effort and consideration to be approved by those in charge, and therefore it might be better to break the project down into more manageable chunks which allow more frequent deliverables and deliver value within a shorter timeframe.
As the only thing of concern was the methods for inputting information, this is where we started with requirements gathering and design. However, knowing that there might be more to the problem, and not limiting your design decisions before the requirements are understood, is key to finding the best solutions.
Wednesday, May 19, 2010
So, here's my new blog up and running, who am I and what am I planning to write here?
First off - here's a little about me:
I'm a recent graduate from university (it's coming up to a year since I finished), where I studied Software Engineering on a four-year course in which the third year was an industrial placement. During the industrial placement I went to work for a company called Adfero in a "Technical Consultant" role as well as a junior "Information Systems Developer". Once I completed my placement I went back to complete my final year but also continued in my developer role 2-3 days a week with the company.
Working part time while at uni always seems like a great idea until you get halfway through the year. For me the problem was not so much a lack of time, but rather a lack of interest in the course content once I'd had a chance to work on real projects in a live environment. Most people who graduated a little while ago also find this: when looking back at uni work, it seems much more trivial from a problem-solving point of view. I found that to be true, and I found the key to uni work was actually your ability to prove, through how you talk about something, that you comprehensively understand the basics.
After completing uni I then returned full time to Adfero purely in the developer role which is where I've now been for almost a year and have now also taken on the title of "Information Systems Architect" where I'm working on some of the more high level design problems within the products.
What I'm wanting to share on this blog is some of the interesting things I've learnt myself over the last year, the things they don't teach you in uni and pretty much anything else I find interesting! My personal favorite areas are text indexing, search and particularly good software engineering design - good design combined with good code makes the first step towards a well-written, maintainable piece of software.
Hopefully I'll also be able to share a few of the products I've worked on, the mistakes I've made and the software problems I've inherited from previous developers and had to heavily re-factor.