The Architect's Napkin

Software Architecture on the Back of a Napkin
Aspect Oriented Programming made easy with Event-Based Components – Part I

AOP is still pretty much a pain when you live by "traditional" object orientation. You need fancy tools or some advanced code slinging. With Event-Based Components, though, introducing aspects is a piece of cake. Actual code is freed from tackling special concerns; rather, concerns become a matter of architecture. But see for yourself. The scenario for today is file processing.

I want an application which indexes .TXT files.

The program should crawl a directory hierarchy, extract words from files, and write an index file of this format:

word
  filename
  filename
  filename
  …


word
  filename
  …

All words should get indexed except stop words (which could be recognized according to some rules or a predefined list).

The application can be a console app to which the path to the root folder is given, and the name of the index file to create. Usage could look like this:

indexer c:\documents c:\documents\index.inx

Initial design without AOP

Let me start with a simple design without any frills. Just a straightforward synchronous and sequential application. It consists of just a single feature, i.e. a cause-effect-relationship linking a trigger to a result. Here the trigger is starting the application, and the result is the index file. So a single feature process will do:

[Image: high-level design of the feature process]

That's a high level view of the program. Note how the data flows out from Compile files: it's not only that the activity spits out the names of the .TXT files it finds in the directory tree, it also passes on the name of the index file to create. I could have split this information from the input tuple before the activity, but I chose not to, to keep the level of abstraction homogeneous.

OK, now for some more detail. I sure would not want to start implementation with just this design in my hands.

[Image: the feature process refined into more detailed activities]

Extract words first reads all words from each file, then it filters out the stop words and passes on just relevant words to be indexed.

Build index receives all remaining words and compiles them into an index which then is passed on to be written to a file:

[Image: the Build index part of the process]

Note how Compile words produces two outputs: some index statistics as well as the index itself. And Write index to file does not produce an output at all, but just a side effect: the index file. But this is not done before the index is ready and the index filename has arrived. The Join activity ensures that processing only flows into Write index to file if both data items are present.
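Such a Join can itself be just a small EBC activity. Here is a minimal sketch of the idea (the class and member names are mine, not taken from the repository): it buffers each input and fires its output only once both have arrived.

```csharp
using System;

// Hypothetical Join activity in EBC style: buffer both inputs and fire
// the output event as soon as each of them has arrived at least once.
public class Join<T1, T2>
{
    private T1 input1;
    private T2 input2;
    private bool hasInput1, hasInput2;

    public void In_Input1(T1 value)
    {
        input1 = value;
        hasInput1 = true;
        TryFire();
    }

    public void In_Input2(T2 value)
    {
        input2 = value;
        hasInput2 = true;
        TryFire();
    }

    private void TryFire()
    {
        if (hasInput1 && hasInput2 && Out_Joined != null)
            Out_Joined(Tuple.Create(input1, input2));
    }

    public event Action<Tuple<T1, T2>> Out_Joined;
}
```

In the design above, the index would flow into one input and the index filename into the other; Write index to file only ever sees the joined tuple.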

The above design seems to be fine – or is it? I thought so until I started implementing it. Then I realized that Read words actually mixes two concerns: domain logic and resource access. It gets passed a filename, reads the file contents, and (!) splits them into words. That's not clean code according to the Separation of Concerns principle. So I refined the design on the fly by making Read words a composite activity aggregating two atomic activities:

[Image: Read words refined into a composite activity aggregating two atomic activities]

Now, the beauty of EBCs lies in the fact that this kind of refinement is possible at any time. Nothing changes for the surroundings of an activity; the activity's sole interface does not need to be changed.
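To make that concrete, here is a sketch of what the Read words composite might look like. The exact signatures in the repository may differ; the point is that the atomic activities do the work, while the composite contains no logic of its own, only wiring.

```csharp
using System;
using System.IO;
using System.Linq;

// Atomic activity: pure resource access - read all lines of a file.
public class Read_text_lines
{
    public void In_Process(string filename)
    {
        Out_Lines(Tuple.Create(filename, File.ReadAllLines(filename)));
    }
    public event Action<Tuple<string, string[]>> Out_Lines;
}

// Atomic activity: pure domain logic - split lines into words.
public class Split_lines_into_words
{
    public void In_Process(Tuple<string, string[]> input)
    {
        var words = input.Item2.SelectMany(line => line.Split(' ')).ToArray();
        Out_Words(Tuple.Create(input.Item1, words));
    }
    public event Action<Tuple<string, string[]>> Out_Words;
}

// Composite activity: only wires its parts together.
public class Read_words
{
    private readonly Read_text_lines read = new Read_text_lines();
    private readonly Split_lines_into_words split = new Split_lines_into_words();

    public Read_words()
    {
        read.Out_Lines += split.In_Process;
        split.Out_Words += output => { if (Out_Words != null) Out_Words(output); };
    }

    public void In_Process(string filename) { read.In_Process(filename); }
    public event Action<Tuple<string, string[]>> Out_Words;
}
```

From the outside, Read_words still has a single input and a single output – exactly the interface it had as an atomic activity.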

After this change, the Visual Studio solution for this design looks like this:

[Image: the Visual Studio solution structure]

Let me highlight a couple of interesting points from top to bottom:

  • I put the IndexStats class into a project folder called DataModel since, well, it constitutes the data model of the application. Or at least a part of it. The other parts are just standard data types, so they don't show up in the project. I like to separate the two main dimensions – the data model and the process model – by putting them at least in their own folders.
  • Then I lumped together all composite activities of the process model in one folder. They represent the concern of "wiring up activities". If a designer tool were available, they would probably be generated. Since they have a special structure and their code is so simple, I usually isolate them from the real workhorses, the atomic activities.
  • See how I located the Read words activity beneath Extract words? I like to do that to mirror the design, where Read words is nested within Extract words.
    Why didn't I do so with all activities? Because atomic activities usually will not be defined in the same project as composite activities. Just out of convenience, and due to the small size of this scenario, I co-located them.
  • Which brings me to the other project folders, each of which is supposed to stand for a component. In later articles I'll talk more about what I mean by components; for now, view them as aggregations of classes belonging closer together than others. As you can see, there are two components for resource access and two for domain logic. I deem index building very different from writing an index to a file, which in turn is very different from extracting words from a text, etc.

Have a look at the complete code in the repository. The current version is located in the SyncTextfileIndexer branch. It's not a fancy implementation, but it does a reasonable job. Word extraction, for example, is done by splitting a line at spaces; and stop words are simply all words with <= 3 characters. For what I want to write about in this posting, though, that's not important. The application just needs to work at all. Data has to flow through the feature process correctly. Which brings me to a question so far neglected: the data flow protocol.
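In code, those two rules amount to almost nothing; a sketch of the idea (not the repository's literal code):

```csharp
using System.Linq;

// Word extraction as in the sample implementation (sketch): split a line
// at spaces, then drop stop words - here simply all words with three
// characters or less.
public static class WordExtraction
{
    public static string[] RelevantWords(string line)
    {
        return line.Split(' ')
                   .Where(word => word.Length > 3)
                   .ToArray();
    }
}
```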

Data flow protocol vs IEnumerable<>

Think about it: Build index in the end writes the index to a file. But when is that supposed to happen? After the last file has been processed by Compile words. Correct. But how does Compile words know that it is called for the last time for the directory tree to process? The interface of Compile words looks like this:

public class Compile_words
{

    public void In_Process(Tuple<string, string[]> input)
    {
        …
    }


    public event Action<IndexStats> Out_Statistics;
    public event Action<Index> Out_IndexCompiled;
}

How can In_Process() determine if the input tuple is the final one, so that the built index should be passed on to the next processing stage via Out_IndexCompiled?

This is only possible if the flow of data to Compile words follows a protocol. It need not be a sophisticated one, but there needs to be a protocol. In this case it looks like this:

{ Tuple<string, string[]> } null

Compile words adds words to the index until it encounters a null as input. That's the "end of data" signal, and Compile words emits the index. (Yeah, I know, using a null value is not cool, but I want to keep the solution simple and not introduce additional data flows or option values.)
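To illustrate, here's a sketch of how In_Process() might handle that signal. The Index type is simplified to a dictionary from word to filenames here; the repository's real implementation differs.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the protocol handling in Compile_words: accumulate words per
// file until the null "end of data" signal arrives, then emit the index.
public class Compile_words
{
    private readonly Dictionary<string, List<string>> index =
        new Dictionary<string, List<string>>();

    public void In_Process(Tuple<string, string[]> input)
    {
        if (input == null)
        {
            // End of data: pass the finished index on.
            if (Out_IndexCompiled != null) Out_IndexCompiled(index);
            return;
        }

        foreach (var word in input.Item2)
        {
            if (!index.ContainsKey(word))
                index[word] = new List<string>();
            index[word].Add(input.Item1);  // remember which file it came from
        }
    }

    public event Action<Dictionary<string, List<string>>> Out_IndexCompiled;
}
```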

But this moves the question only one processing stage back: How does the activity before Compile words know when to send the null value? Well, Filter words knows that from a null input it receives itself. The same is true for Split lines into words and Read text lines etc. up to Crawl directory tree.

For this protocol to work, the first activity in the feature process has to signal when it's finished. So Crawl directory tree sends a null after the last text file found to signify "end of data".
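A sketch of the head of the chain (Directory.GetFiles stands in for whatever the repository's crawl actually does, and I leave out passing the index filename along):

```csharp
using System;
using System.IO;

// Sketch of the protocol's start: emit each .txt filename found in the
// directory tree, then a final null as the "end of data" signal.
public class Crawl_directory_tree
{
    public void In_Process(string rootPath)
    {
        foreach (var filename in
                 Directory.GetFiles(rootPath, "*.txt", SearchOption.AllDirectories))
            Out_Filename(filename);

        Out_Filename(null);  // end of data
    }

    public event Action<string> Out_Filename;
}
```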

As you can see in the code, that's working pretty well. Check out this example:

public class Filter_words
{
    public void In_Process(Tuple<string, string[]> input)
    {
        if (input == null)
            this.Out_FilteredWords(null);
        else
        {
            var filteredWords = input.Item2.Where(word => word.Length > 3);
            this.Out_FilteredWords(new Tuple<string, string[]>(input.Item1, filteredWords.ToArray()));
        }
    }


    public event Action<Tuple<string, string[]>> Out_FilteredWords;
}

Sure, I could have done it differently. I could have passed on batches: Crawl directory tree could compile all text filenames into a list and send them all together to the next processing stage. Read text lines then could read all lines from all files and pass them on as a whole. And so on. This would even be pretty easy using IEnumerable<>. Check out how different that looks in code. I set up a branch for this approach in the repository. Here's Filter words again:

public class Filter_words
{
    public void In_Process(IEnumerable<Tuple<string, string[]>> input)
    {
        var filteredWordsInFiles = input.Select(wordsInFile =>
        {
            var filteredWords = wordsInFile.Item2.Where(word => word.Length > 3);
            return new Tuple<string, string[]>(wordsInFile.Item1,
                                               filteredWords.ToArray());
        });
        this.Out_FilteredWords(filteredWordsInFiles);
    }


    public event Action<IEnumerable<Tuple<string, string[]>>> Out_FilteredWords;
}

From the point of view of the design drawings there isn't much of a difference. The activities are still connected in the same way. The data is flowing like before.

Codewise, though, you might find one way or the other easier to understand. I'm kind of undecided. Maybe I like the IEnumerable variant a little better – but I'm not sure how it will scale… Because, you know, this posting is about Aspect Oriented Programming, and one of the aspects I want to introduce is asynchronous processing.

But we'll see… I guess I'll move both approaches forward. That way we don't need to speculate but get hard evidence.

 

Since this posting has already become quite long, let me close for today. The baseline for showing how easy AOP is with EBCs is set. We have a working application. Next time I'll show you how to introduce aspects like logging, exception handling, and validation. And after that we'll move on into the realm of asynchronicity.

Posted on Wednesday, August 4, 2010 9:51 AM | Filed Under [ Event-Based Components ]

Feedback

# re: Aspect Oriented Programming made easy with Event-Based Components – Part I

Hi Ralf, great article series. I wanted to write a comment, but it grew into a whole blog post:
http://blog.tanascius.com/2010/08/designing-with-ebc-event-based.html
There I introduce a new component (multiplexer/demultiplexer) to handle the "Data flow protocol vs IEnumerable<>" issue.
8/11/2010 10:01 PM | Christoph Dörfler
# re: Aspect Oriented Programming made easy with Event-Based Components – Part I

I like your mux/demux activity. It resembles the scatter/gather I implemented in Part III - but it combines both activities, which makes it a little easier to implement and insert into a design.

At first sight I felt uneasy with the data flowing out from and then back into the mux/demux - but I guess that's ok.
8/11/2010 10:31 PM | Ralf Westphal
# re: Aspect Oriented Programming made easy with Event-Based Components – Part I

Do you use any specific tools to develop according to AOP?

I've heard something about PostSharp...
11/27/2010 1:12 PM | Manuel