The Incremental Architect´s Napkin - #7 - Nest flows to scale functional design

You can design the functionality of any Entry Point using just 1D and 2D data flows. Each processing step in such flows contains logic¹ to accomplish a smaller or larger part of the overall process.

To benefit most from Flow Design, the size of each such step should be small, though.

Now think of this scenario: You have a program with some 100,000 lines of code (LOC). It can be triggered through 25 Entry Points. If each started a flow of maybe 5 processing steps, functional units would contain around 800 LOC on average. In reality some would probably be just 50 LOC or 100 LOC - which would require others to contain 1,500 LOC or even more.

Yes, I mean it: Think of the whole functionality of your software being expressed as flows and implemented in functional units conforming to the Principle of Mutual Oblivion (PoMO). There´s no limit to that - even if you can´t imagine that right now yet ;-)

What should be limited, however, is the length of the implementations of the functional units. 1,500 LOC, 800 LOC, even 400 LOC is too much to easily understand. Logic of more than maybe 50 LOC or a screenful of code is hard to comprehend. Sometimes even fewer LOC are difficult to grok.

Remember the #1 rule of coding: Keep your functions small. Period. (Ok, I made up this rule just now ;-) Still I find it very reasonable.)

The #1 rule of Flow Design then could be: Don´t limit the number of processing steps. Use as many as are required to keep the implementation in line with the #1 rule of coding.²

Flow processing steps turning into functions of some 50 LOC would be great. For 100,000 LOC in the above scenario that would mean 2000 functional units spread across 25 Entry Point flows, though. With each flow consisting of 80 processing steps. On average.

That sounds unwieldy, too, doesn´t it? Even if a flow is a visual representation of functionality it´s probably hard to understand beyond maybe 10 processing steps.

The solution to this dilemma - keep function size low and at the same time keep flow length short - lies in nesting. You should be able to define flows consisting of flows. And you are.

I call such flows three dimensional (3D), since they add another direction in which to extend them. 1D flows extend sequentially, "from left to right". 2D flows extend in parallel by branching into multiple paths. 3D flows extend "vertically".

image

In 3D flows a 1D/2D flow is contained in a higher level processing step. These steps integrate lower level functional units into a whole which they represent. In the previous figure the top level functional unit w integrates s and t. One could say s and t form the w process.

s in turn integrates a, d, and f on the bottom level. And t wires-up b, c, and e to form a flow.

a through f are non-integrating functional units at the bottom level of this flow hierarchy.

Showing such nesting relationships by actually nesting notational elements within each other does not scale.

image

This might be the most authentic depiction of nested flows, but it´s hard to draw for more than three levels and a couple of functional units per level.

A better choice is to draw nested flows as a "tree" of functional units:

image

In this figure you see all levels of the process as well as how each integration wires-up another nested flow. Take the triangles as a rough depiction of the pinch gesture on your smartphone which you use to zoom in on a map for example. It´s the same here: each level down the diagram becomes more detailed.

Most of the time, though, you don´t need to draw deeply nested 3D flows. Usually you start with a top level flow on a napkin or flip chart and then drill down one level. If deeper nesting is needed, you take a new napkin or flip chart and continue there.

Here´s an example from a recent workshop. Never mind the German labels on the processing steps:

image

It´s a functional design on three levels also including the class design. But that´s a topic for another time.

What I´d like you to note here is the sketchy character of the design. It´s done quickly without much ado about layout and orderliness. It´s a "living document", a work in progress during a design session of a team. It´s not a post-implementation depiction (documentation), but a pre-implementation sketch. As such it´s not supposed to have much meaning by itself outside the group of people who came up with the Flow Design.

But it can be taken to explain the design to another person. In that case the diagram serves as a map to point at and follow along with a finger while explaining what´s happening in each processing step on each level.

And of course it´s a memory aid. Not just talking about a (functional) design but actually keeping track of it visually helps to remember the overall software structure. A picture is worth a thousand words.

Back to LOC counting: With nested flows 80 functional units per Entry Point should not sound unwieldy anymore. Let´s put 5 functional units into a sub-flow for integration by its own functional unit on a higher level. That would lead to 16 such integrating processing steps. They would need another 3 functional units for integration on yet another higher level. So what we end up with is 1 + 3 + 16 + 80 = 100 functional units in total for some 4,000 LOC of logic code. That does not sound bad, I´d say. Admittedly it´s an overhead of 25% on functions - but it´s only maybe around 5% more LOC within functions. As you´ll see, integration code is simple. A small price to pay for the benefit of small functions throughout the code base.

Integration vs operation

You might think nested flows are nothing more than the functional decomposition of the past: functions calling functions calling functions... But they´re not.

Yes, it´s "functions all the way down". Those functions are not created equal, though. They fundamentally differ in what their responsibilities are:

  • Integrating functional units just do that: they integrate. They do not contain any logic.
  • Non-integrating functional units just contain logic. They never integrate any other functional units. Those are Operations.

I call this the Integration Operation Segregation Principle (IOSP). It´s the Single Level of Abstraction (SLA) principle taken to the extreme. Here´s a flow hierarchy reduced to its dependencies:

image

There can be any number of integration levels, but only one level of Operations. Operations are the leaves of the dependency tree. Only they contain logic. All nodes above them do not contain logic.
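To make the distinction concrete, here is a minimal C# sketch (the names are made up for illustration, not taken from any real project):

// Integration: wires up other functional units, contains no logic itself.
static string Report_total(int[] prices)
{
    var total = Sum(prices);
    return Render(total);
}

// Operations: contain only logic (expressions, control statements, API calls)
// and never call another self-made function.
static int Sum(int[] prices)
{
    var total = 0;
    foreach (var p in prices) total += p;
    return total;
}

static string Render(int total)
{
    return string.Format("Total: {0}", total);
}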

That´s what makes decomposition in Flow Design so different from earlier functional decomposition. That plus Flow Design being about data flow instead of control flow.

Or let me say it more bluntly: I strongly believe that "dirty code" is a result of not containing logic in a systematic manner like this. Instead in your code base logic is smeared all over the de facto existing functional hierarchies across all sorts of classes.

This subtly but fundamentally violates the SRP. It entangles the responsibility of whatever the logic is supposed to do (behavior) with the responsibility to integrate functional units into a whole (structure). "Pieces of" logic should not be functionally dependent on other "pieces of" logic. That´s what the PoMO is about. That´s what Object Orientation originally was about: messaging.

To fulfill functional or quality requirements, logic itself does not need any separation into functions. That means as soon as functions are introduced into code, functional dependencies can be built which entail a new responsibility: Integration.

The beauty of Operations

In the beginning there was only logic. There were expressions, control statements, and some form of hardware access. And all this logic produced some required behavior.

Then the logic grew. It grew so large that it became hard to understand on a single level of abstraction.

Also, patterns started to appear in the growing logic. So the question arose: why should pattern code be repeated multiple times?

Thus were invented subroutines (functions, procedures). They helped to make programming more productive. Patterns stashed into subroutines could be re-used quickly all over the code base. And they helped to make code easier to understand, because by calling a subroutine details could be folded away.

Before:

var x = a + ...;
var y = x * ...;
var z = y / ...;

After:

var x = a + ...;
var y = f(x);
var z = y / ...;

The change looks innocent. However it´s profound. It´s the birth of functional dependencies.

The logic transforming a etc. into z is not fully in place anymore but dependent on some function f(). There is more than one reason to change it:

  1. When the calculation of x or z changes.
  2. Or when something in the subroutine changes in a way that affects dependent logic, e.g. the subroutine suddenly does not check for certain special cases anymore.

Even though the logic and the subroutine belong closely together they are not the same. They are two functional units, each with a single responsibility. Except that this is not true for the dependent functional unit, which now has two responsibilities:

  1. Create some behavior through logic (Operation)
  2. Orchestrate calls to other functions (Integration)

To avoid this conflation the IOSP suggests bundling up logic in functions which do not call each other.
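Applied to the little example above, that might look like the following sketch (the expressions are placeholders, since the original ones are elided):

// Integration: only wires up the three Operations, no logic of its own.
static int Calculate(int a)
{
    var x = Prepare(a);
    var y = f(x);
    var z = Finish(y);
    return z;
}

// Operations: logic only, no calls to other self-made functions.
static int Prepare(int a) { return a + 1; }  // placeholder expression
static int f(int x)       { return x * 2; }  // placeholder expression
static int Finish(int y)  { return y / 3; }  // placeholder expression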

Subroutines are a great tool to make code easier to understand and quicker to produce. But let´s use them in a way so they don´t lead to a violation of the fundamental SRP.

Bundle logic up in functions which do not depend on each other. No self-made function should call any other self-made function.

  • That makes Operation functions easy to test. There are no functional dependencies that need to be mocked.
  • That will naturally lead to small and thus easy to understand functions. The reason: How many lines of logic can you write before you feel the urge to stash something away in a subroutine? My guess is: after some 100 or 200 LOC max. But what if no functional dependencies are allowed? You´ll finish the subroutine and create another one.

That´s the beauty of Operations: they are naturally short and easy to test. And it´s easy to check whether a given function is an Operation.

The beauty of Integrations

Once you start mixing logic and functional dependencies, code becomes hard to understand. It consists of different levels of abstraction. It might start with a couple of lines of logic, then something happens in another function, then logic again, then the next functional dependency - and on top of that it´s all spread across several levels of nested control statements.

Let´s be honest: It´s madness. Madness we´re very, very used to, though. Which does not make it less mad.

We´re burdening ourselves with cognitive dissonance. We´re bending our minds to follow such an arbitrary distribution of logic. Why is some of it readily visible, why is some of it hidden? We´re building mental stacks following the train of control. We´re reversing our habitual reading direction: instead of reading from top to bottom and from left to right, we pride ourselves on having learned to read from right to left, from inner levels of nesting to outer, and from bottom to top. What a feat!

But this feat, I´d say, we should always subtitle with "Don´t try this at home!" It´s a feat to be performed on stage, but not in the hurry of everyday work.

So let´s stop it!

Let´s try to write code consisting just of function calls. And I mean function calls in sequence, not nested function calls.

Don´t write

a(b(c(x)));

instead write

var y = c(x);
var z = b(y);
a(z);

Let´s try to tell a readily comprehensible story with our code. Here´s the story of converting CSV data into a table:

Developer A: First the data needs to be analyzed. Then the data gets formatted.

Developer B: What do you mean by "analyzing the data"?

Developer A: That´s simple. "Analysis" consists of parsing the CSV text and then finding out what the maximum length of the values in each column is.

Developer B: I see. Before you can rearrange the data, you need to break the whole chunk of CSV text up. But then... how exactly does the rearrangement work, the formatting?

Developer A: That´s straightforward. The records are formatted into an ASCII table - including the header. Also a separator line is built. And finally the separator is inserted into the ASCII table.

That´s the overall transformation process explained. There´s no logic detail in it, just sequences of what´s happening. It´s a map, not the terrain.

And like any story it can be told on different levels of abstraction.

High(est) level of abstraction:

Developer A: CSV data is transformed into an ASCII table.

Medium level of abstraction:

Developer A: First the data is analyzed, then it´s formatted.

Low level of abstraction:

Developer A: First the data is parsed, then the maximum length of the values in each column is determined, then the records are formatted into an ASCII table - including the header. At the same time a separator line is built. And finally the separator is inserted into the ASCII table.

Finally, the bottom level of abstraction - or no abstraction at all - would be to list each step of logic. That wouldn´t be an abstract process anymore, but a raw algorithm.

At the bottom there is maximum detail, but it´s also the hardest to understand. So we should avoid dwelling down there as much as possible.

Without logic details we´re talking about Integration. Its beauty is the abstraction. Look at the code for the above story about CSV data transformation:

image

Each function is focused on Integration. Each function consists of an easy to understand sequence of function calls. Each function is small.
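Since the code is only shown as a screenshot above, here is a rough C# sketch of what such Integration functions might look like; the signatures and data types are assumptions, and the Operations (Parse() etc.) are left out:

// Each function just calls other functions in sequence - no logic of its own.
static string Format(string csv)
{
    var (records, colWidths) = Analyze(csv);
    return Format_as_ASCII_table(records, colWidths);
}

static (string[][], int[]) Analyze(string csv)
{
    var records = Parse(csv);
    var colWidths = Determine_col_widths(records);
    return (records, colWidths);
}

static string Format_as_ASCII_table(string[][] records, int[] colWidths)
{
    var rows = Format_records(records, colWidths);
    var separator = Format_separator(colWidths);
    return Build_table(rows, separator);
}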

Compare this to a pure Operation:

image

Now, which solution would you like to maintain?

Yes, Integration functions depend on others. But it´s not a functional dependency. Integration functions don´t contain logic, they don´t add "processing power" to the solution which could be functionally dependent. Their purpose is orthogonal to what logic does.

Integration functions are very naturally short since their building blocks (function calls) are small and it´s so cheap to create more of them if one becomes hard to understand.

Testing

Testing Operations is easy. They are not functionally dependent by definition. So there is no mocking needed. Just pass in some input and check the output.

Sometimes you have to set up state or make a resource available, but the scope you´re testing is still small. That´s because Operations cannot grow large. Once you start following the PoMO and IOSP you´ll see how the demand for a mock framework will diminish.
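A test for one of the CSV Operations could be as simple as this sketch (xUnit syntax; the signature of Determine_col_widths() is an assumption):

[Fact]
public void Determine_col_widths_picks_the_longest_value_per_column()
{
    var records = new[]
    {
        new[] { "Name", "Age" },   // header
        new[] { "Peter", "26" }    // data record
    };

    var widths = Determine_col_widths(records);

    // "Peter" is longer than "Name", "Age" is longer than "26"
    Assert.Equal(new[] { 5, 3 }, widths);
}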

Testing Integrations is hard. They consist of all those function calls. A testing nightmare, right?

But in reality it´s not. Because you hardly ever test Integration functions. They are so simple you check them by review, not by automated tests.

As long as all Operations are tested - which is easy - and the sequence of calls of Operations in an Integration is correct - which can be checked visually - the Integration must be correct too.

But still... even if all Operations are correct and the Integration functions represent your Flow Design correctly, the behavior of the whole can be unexpected. That´s because flows are just hypotheses. You think a certain flow hierarchy with correct logic at the bottom will solve a problem. But you can be wrong.

So it´s of course necessary to test at least one Integration: the root of a 3D flow.

Interestingly, that´s what TDD is about. TDD always starts with a root function and drives out logic details by adding tests. But TDD leaves it to your refactoring skills to produce a clean code structure.

Flow Design starts the other way round. It begins with a functional design of a solution - which is then translated into clean code. IOSP and PoMO guarantee that.

And you can test the resulting code at any level you like. Automated tests for the root Integration are a must. But during implementation of the Operation functions I also write tests for them, even if they´re private functions - and I throw those away at the end. I call them "scaffolding tests". For more on this approach see my book "Informed TDD".

Stratified Design

You´re familiar with layered design: presentation layer, business logic layer, data access layer etc. Such layered design, though, is different from 3D flows.

In a layered design there is no concept of abstraction. A presentation layer is not on a higher or lower level of abstraction compared to the business logic layer or the data access layer. Only the combination of all layers forms a whole.

That´s different for abstractions. On each level of abstraction the building blocks form the whole. A layered design thus describes a solution on just one level of abstraction.

Contrast this with the 3D Flow Design for the CSV data transformation. The whole solution is described on the highest level of abstraction by Format(). One functional unit to solve it all.

On the next lower level of abstraction the whole solution is described by Analyze() + Format_as_ASCII_table().

On the next lower level of abstraction the whole solution is described by Parse() + Determine_col_widths() + Format_records() + Format_separator() + Build_table().

Below that it´s the level of logic. No abstraction anymore, only raw detail.

What do you call those levels of abstraction? They are not layers. But just "level" would be too general.

To me they look like what Abelson/Sussman called "stratum" when they talked about "stratified design".

Each stratum solves the whole problem - but in increasing detail the deeper you dig into the flow hierarchy. Each stratum consists of a Domain Specific Language (DSL) on a certain level of abstraction - and always above the logic statements of a particular programming language.

Fortunately these DSLs don´t need to be built using special tools. Their syntax is so simple that just about any programming language (with functions as first class data structures) will do. The meta syntax/semantics for all such DSLs is defined by IOSP and PoMO. They are always data flow languages with just domain specific processing steps.

Here´s another scenario:

An application displays CSV data files as ASCII tables in a page-wise manner. When it´s started it asks for a file name and then shows the first page.

Here´s a 3D Flow Design for this (see the accompanying Git repository for an implementation). See how the solution to the former problem now is part of the larger solution?

image

Vertically it´s strata put on top of each other. The deeper you go the more detail is revealed.

At the same time, though, there are the elements of a layered design. They stretch horizontally.

Colors denote responsibilities:

  • Integrations are white,
  • presentation layer Operations are green (Ask for filename, Display table),
  • data access layer Operations are orange (Load data),
  • business logic Operations are light blue (all else).

In stratified Flow Design, though, functional units of different layers do not depend on each other. Thus layering loses its meaningfulness. It´s an obsolete concept. What remains, of course, is the application of the SRP. User interaction is different from file access or table formatting. Hence there need to be distinct functional units for these aspects/responsibilities.

In closing

The quest for readable code and small functions can come to an end. Both can be achieved by following two simple principles: the Principle of Mutual Oblivion (PoMO) and the Integration Operation Segregation Principle (IOSP).

That´s true for greenfield code where you might start with a Flow Design. But it´s also true for brownfield code. Without a design, look at a function and see whether it´s an Operation or an Integration. Mostly you´ll find it´s a hybrid. That means you should refactor it according to PoMO and IOSP. Clean up by making it an Integration and pushing down any logic into lower level functions. Then repeat the process for all the functions it integrates.

I suggest you try this with a code kata. Do the bowling game kata or roman numerals or whatever. Use TDD first if you like. But in the end apply PoMO and IOSP rigorously.

In the beginning you´ll be tempted to keep just a few control statements in Integration functions. Don´t! Push them down into Operations. Yes, this will mean you´ll get functional units with several outputs. But that´s ok. You know how to translate them into code using continuations or events.

Even if the resulting Integration looks a bit awkward, do it. You´ll get used to it. Like you got used to reversing your reading direction for nested function calls. But this time you´re getting used to a clean way of writing code ;-) That´s like getting sober. Finally.

Organizing code according to PoMO and IOSP is the only way to scale readability and understandability. We need abstractions, but we need them to be of a certain form. They need to be clean. That´s what IOSP does by introducing two fundamental domain independent responsibilities: Integration and Operation.

The beauty of this is that you can check for conformance to the SRP without even understanding the domain. Integration and Operation are structural responsibilities - like containing data is. You can review the code of any of your colleagues to help them clean it up.


  1. Remember my definition of logic: it´s expressions, control statements and API-calls (which often stands for hardware access of some kind).

  2. I know, you´ve tried hard for years to keep the number of lines in your functions low. Nevertheless there are these monster functions of 5,000 LOC in your code base (and you´ve heard about 100,000 LOC classes in other projects). Despite all good intentions it just happens. At least that´s the code reality in many projects I´ve seen. But fear not! You´re about to learn how to keep all your functions small. Guaranteed. I promise. Just follow two principles. One you already know: the Principle of Mutual Oblivion.

Why I love Leanpub for getting my books to readers

There is some discussion going on about if/when using Leanpub is the right choice for a budding (or even established) author. Some contributions you may want to read include:

Much has already been said. So why add another article to the discussion? Because I feel there´s something missing: some kind of systematic view of self-publishing.

Without some more structure, my guess is that authors still looking for their way might get even more confused than they were before. Or is it just me who finds the self-publishing landscape quite confusing sometimes?

So here´s my take on the topic. Let me break down the self-publishing process into a couple of steps:

Write

Publishing starts with writing. It´s always the author who does the writing. But with self-publishing the author needs and wants to do more than that.

Writing fiction is pretty much just about plain text sprinkled with some chapter headings or occasional italics. Writing most non-fiction books probably does not need more than that either. Maybe an image here and there, maybe some text in a box, maybe a table. Still, all those artifacts just flow from top to bottom on a page.

Sure, there are some topics or didactical requirements which crave more. But my guess is, most authors are unlike Jurgen Appelo. Most don´t want to get that deep into book layout.

And even if more is needed, then the question is, when is it needed? As Peter Armstrong points out, Leanpub is about “book start-up”. It´s about exploration of a topic meeting a market. How much artful design is needed for that?

But writing is not just about producing text with some layout. Nowadays it´s also about file formats. We´re talking about eBooks, right? So how do you get from a text in some text editor program to PDF, mobi, epub - which seem to be the major eBook file formats?

How to do the export from e.g. Microsoft Word? How to ensure images are of the right size/resolution? How to also get the PDF print ready?

Sure, this is all possible with a number of tools. If you´re striving for perfection and if you´re looking for maximum freedom and control already in this phase of the publishing process… well, then take your time and hunt down your personal “best of breed” mix of tools.

But if you´re like me and just want “good enough” layout plus quick setup of the whole thing… then you´ll love Leanpub.

I want writing a book and getting it to potential readers to be as easy as printing a letter. That´s what Leanpub delivers for me. From the idea “oh, let´s make this a book” to a print ready PDF it´s a matter of minutes.

  1. Go to Leanpub.com in your browser (5sec)
  2. Sign in and create a new book project (1min including some meta-data)
  3. Accept the Dropbox invitation from Leanpub (1min)
  4. Put your manuscript in your Leanpub project´s folder in your Dropbox (1min)
  5. Publish the project as an eBook (PDF, mobi, epub) (5sec)
  6. Export the project as a print ready PDF (5sec)

Of course this does not include the writing part ;-) And it does not include a comprehensive book description or a snazzy cover image. But that´s stuff you need to do anyway. It´s not specific to Leanpub.

What I want to make clear is how little overhead Leanpub requires. From manuscript to published eBook files as well as print ready PDF it´s just a couple of mouse clicks. You don´t need to select any tools, you don´t need to wire-up your tool chain. It´s as easy as putting files in a Dropbox folder and hitting a button. Call it One-Click Publishing if you like.

I´m not saying Leanpub is unique in this. For example Liberio (http://www.liber.io) seems to offer a similar service. But currently I´m familiar with Leanpub and like it very much. It has allowed me to start book projects whenever I felt like it. Some I have finished, others are still in progress.

Also I helped other authors publish their books using Leanpub. They gave me their MS Word manuscripts and I converted them to Markdown. Each book took me less than a day from start to publication.

Which brings me to the only hurdle set up by Leanpub: Markdown. Markdown is not as powerful as MS Word. And even with Markdown editors like Mou or MarkdownPad it´s not the same as writing a manuscript in MS Word.

Switching from the jack-of-all-trades-on-every-desktop MS Word to a Markdown editor takes some getting used to. I cannot deny that. And what you can do in terms of layout is somewhat limited. But as argued above: What do you really, really need for your book anyway? Don´t overestimate that. Don´t try to be perfect, especially not during your first couple of iterations.

So I´d say: Markdown might still not be that widely known. But it´s really easy to learn. Markdown editors are here to help. It´s a good enough choice for many, many books.

As for the scenarios Jurgen Appelo depicted where Leanpub falls short (e.g. bundle only draft chapters into an eBook) I´d say: That´s not the problem of a publishing platform like Leanpub. It´s a matter of the editing tool. Neither Leanpub nor MS Word can do that. And that´s ok.

Collaborate

Self-publishing is supposed to be about collaboration. Collaboration between author and readers. No more “waterfall publishing” but “agile publishing”. Writing and publishing can and should go through a number of iterations.

This promises to get content out to readers earlier. And it allows for learning by the author through feedback from early readers.

Leanpub definitely supports this kind of agile or even lean approach. Nomen est omen. Setting up a book is trivial. Publishing the next iteration is trivial. Iterate as quickly as you like. Publish a new version of your book twice a day or twice a month or just once. With each iteration adapt to the market reactions. If you like. And if there are any ;-)

Leanpub offers a way for readers to give feedback. However that´s one of the features still lacking in quality, I´d say. Feedback cannot be given alongside the manuscript. Compared to the feedback system employed for Mercurial: The Definitive Guide it´s simplistic and not really state of the art.

But then… How much collaboration do you really, really need, want, expect? My experience: The willingness of the audience to provide detailed feedback is very limited. People want to read, not to co-author.

So at least to me the collaboration features are not that important. At least not with regard to public collaboration. Private collaboration among co-authors or a few hand picked alpha/beta readers is different. But how much support do I need from Leanpub for that? For my taste, it´s close to none. If I want to I can move manuscript development to GitHub and get all their features plus Leanpub´s ease of publishing.[1]

Distribute

Once you´ve written your book and honed it based on the feedback you got, you sure want to distribute it. Widely that is.

To criticize Leanpub for not being the most widely known eBook platform is missing the point, I´d say. Although Leanpub offers easy distribution through book project landing pages, it´s not their primary purpose. (Which ultimately might limit their revenue, though.)

To compare amazon or smashwords to Leanpub is a bit like comparing apples and pears.

I use Leanpub as a platform for distribution I do myself. When I send a link to one of my books to someone I use a Leanpub link. When I point out my books in a blog post or a tweet or a newsletter I include a link to Leanpub. I do that because then readers buy from the platform which locks them in the least. Leanpub does not enforce any DRM on my books.

Also, when readers buy from Leanpub I get to see their email addresses (at least if they choose to share them upon purchase). And I can reach them directly and immediately whenever I update my book.

For greater reach I use amazon (or other online stores in Germany like Thalia). And again Leanpub makes it easy for me to publish. The mobi and epub files generated by Leanpub can right away be uploaded to Kindle Direct Publishing (KDP) or XinXii.

Once you´ve published a version of your book on Leanpub it´s a no-brainer to publish it with amazon. It maybe takes another hour. What else do you want in terms of reach?

Maybe a traditional publisher gives you more. But then you´ve already decided to go down the self-publishing road, haven´t you? You want freedom and control. That´s what you get with Leanpub. Out of the box. Without a lengthy search for tools. Plus reach - via established online channels. Check out services like KDP, bookrix, XinXii to push your book to the masses.

But don´t be disappointed if you don´t land a bestseller right away. Your book still is one of millions out there. You need marketing of some kind or the other. But that´s a different issue altogether. Neither Leanpub nor amazon nor smashwords nor Lulu will do anything special for your book.

Print

Although self-publishing is easiest and quickest for eBooks, you might want to turn your manuscript into a printed book for one reason or another. It makes for a more tangible gift, it might suit some old-fashioned readers more, or whatever.

With Leanpub that´s easy. Just export your manuscript as a print ready PDF and off you go. Upload it to Createspace for example. Or Lulu. Or epubli. I´m using Createspace because that way it´s easiest for me to get the eBook and the print book next to each other in the amazon catalog.

Plus Createspace so far has been cheaper for me to get my own print copies from as an author. The quality is ok. The price including shipping from the US is ok. And for your readers who order via amazon it´s fastest. They´ll stock copies for you. Next day delivery should be no problem.

Monetize

Finally, just in case you still want to earn money with your book, Leanpub makes that easy too. Much easier than amazon. From book idea to “online shop” it´s again a matter of minutes.

90% royalties are nice. Giving your readers the opportunity to pay what they like (in a price range you define) or letting them pay even more than you want is nice, too.

However I don´t find that important. 70% on amazon are ok for me too. I´m not writing books because I expect to get rich by writing. Getting some money out of it is nice. Not more. That´s why my books are priced very low. All under 10$ so far.

I use my books as a marketing tool or as text books for my trainings. It´s almost like blogging. I earn my money through other channels. And I think that´s the future for most authors. It´s like with music. The golden days for bands seem to be over. They don´t accumulate riches by selling records, but by selling tickets or whatever. eBooks like music files are easy to copy. DRM on them (e.g. Kindle books) is not here to stay. That´s my guess. So better face it right now before hitting a wall in a couple of years with a royalties based business model.

Summary

Self-publishing has become very easy compared to 10 years ago. Still, though, you have to find your way through the maze from manuscript to worldwide readers.

I prefer the easy road. I want my texts to hit eyeballs. For that turning a manuscript file into an eBook file has to be as simple as can be. That´s the case with Leanpub. No frills. But also no hassle. That´s what Leanpub delivers.

High frequency iterations are good for moving a project forward. No big manuscript up-front. Write a little, publish a little. That´s the modern way for the author. Readers can jump on whenever they like. But you as the author produce tangible results in the open. What a motivation to continue! That´s what Leanpub delivers.

For distribution I rely on the biggest online bookstore there is: amazon. That´s what Leanpub helps me to do.

And finally the money. That´s not really that important to me. But thanks Leanpub for some 90% royalties. And also thanks amazon for 70%. More than a decade ago when I wrote books for traditional publishers I got 12%. What a difference!

Today I´m much faster. I´m more flexible. I earn more. I can change the way things work any day. For now I´m very content with Leanpub. We´ll see what future publishing platforms look like. Choose your own. But don´t turn that into a science of its own. Starting is more important than the optimal tool chain. Stay nimble.

So much for my take on a somewhat systematic approach to answering the question “to Leanpub or not to Leanpub?” View publishing as a process consisting of phases or stages. Optimize for the whole, optimize for what´s most important to you. Maybe that´s layout. Maybe that´s speed, ease of use, royalties, reach, collaboration.


  1. I´d like to see Leanpub support Bitbucket private repositories. Bitbucket provides them for free which might be attractive for authors not having published a bestseller yet ;-)

The Incremental Architect´s Napkin - #6 - Branch flows for alternative processing

Confronted with an Entry Point into your software don´t start coding right away. Instead think about how the functionality should be structured. Into which processing steps can you partition the scope? What should be done first, what comes next, what then, what finally? Devise a flow of data.

Think of it as an assembly line. Some raw input plus possibly some additional material is transformed into shiny output data (or some side effect fireworks).

Here is the Flow Design of the de-duplication example again:

image

That´s a simple sequential flow. Control flows along with the data. And it´s a one dimensional (1D) flow. There is just one path from start to end through the graph of processing nodes.

Such flows are common. For many functions they are sufficient to describe the steps to accomplish what´s required. And as you saw in the previous chapter they are easy to translate into code:

static void Main(string[] args)
{
    var input = Accept_string_list(args);
    var output = Deduplicate(input);
    Present_deduplicated_string_list(output);
}

Streams causing alternative flows

So much for the happy day. But what if an error occurs? Input could be missing or malformed. Surely you would not want the program to just crash with a cryptic error message.

If "graceful failure" becomes a requirement, how could it added to the current design? I suggest a preliminary processing step for validation:

image

It´s still a sequential 1D flow - but now the processing steps after validation are optional, so to speak. See the stream coming out of validation? The asterisk means maybe (args) will flow out, maybe not. It depends on whether the command line arguments were validated correctly.

For simplicity´s sake let´s assume validation just checks if the program gets called with exactly one command line argument. If not, an error message should be printed to standard output.

This could now easily be implemented:

image

And the effect would be stunning when running the program with invalid command line parameters:

image

Look at the Entry Point closely. First notice: the flow is still readily visible. Even though three of the four steps now run as a continuation. Don´t let yourself be deceived by this. Just read the program text from top to bottom.

Technically, though, it´s not that simple. The last three steps are not just written after the first one. They are nested. They get injected. That´s what makes conditional execution possible without a (visible) control statement.
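Since the implementation above is only shown as a screenshot, here is a sketch of what this translation might look like in C# (the error message text and the parameter name are assumptions):

static void Main(string[] args)
{
    Validate_command_line(args,
        validArgs => {                    // continuation: only runs for valid input
            var input = Accept_string_list(validArgs);
            var output = Deduplicate(input);
            Present_deduplicated_string_list(output);
        });
}

static void Validate_command_line(string[] args, Action<string[]> onValid)
{
    if (args.Length != 1)
    {
        Console.WriteLine("Please pass exactly one command line argument!"); // hypothetical message
        return;                           // nothing flows downstream
    }
    onValid(args);
}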

Now there are two alternative execution paths through the flow:

  1. Validate_command_line()
  2. Validate_command_line(), Accept_string_list(), Deduplicate(), Present_deduplicated_string_list()

The alternatives are signified by the stream flowing from the validation.

And of course there is a control statement deciding between the alternatives. But it´s not part of the Flow Design. It´s an implementation detail of Validate_command_line(). The data flow remains free of logic even though there are alternative paths through it.

Take the indentation of the continuation as a hint for the alternative. This might look a bit strange at first, but you´ll get used to it. Or if you like find some other formatting for continuations. Just be sure to keep an eye on consistency and readability - within the limits of a textual flow representation.

Branches for explicit alternative flows

Validating the command line in this way works - but it´s not a clean solution. It´s not clean, because the SRP is violated. The validation has more than a single responsibility. It has at least two: it checks for validity (expression plus control statement) and it notifies the user (API-calls).

That´s not good. The responsibilities should be separated. One functional unit for the actual check, another for user notification.

This, though, cannot be accomplished with a 1D flow. There need to be explicit branches: one for the happy day, another one for the rainy day.

image

You see, functional units can have more than one output. In fact any number of output ports is ok. As for the translation, it should be obvious that in case of more than one output a translation into return is not possible. If more than one output port is present, all of them should be translated into function pointers. I don´t recommend mixing return with function pointers.

This is what the translation looks like:

image

Both functions - Validate_command_line() and Present_error_message() - now have a single responsibility. And the flow in Main() is still pretty clear - at least once you have gotten used to "thinking in functions".
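In case the screenshot is not reproduced for you, a sketch of this translation in C# could look like the following (the port/parameter names are assumptions):

static void Main(string[] args)
{
    Validate_command_line(args,
        onValid: validArgs => {
            var input = Accept_string_list(validArgs);
            var output = Deduplicate(input);
            Present_deduplicated_string_list(output);
        },
        onInvalid: () => Present_error_message());
}

static void Validate_command_line(string[] args,
                                  Action<string[]> onValid,
                                  Action onInvalid)
{
    // single responsibility: just check - no user notification here
    if (args.Length == 1)
        onValid(args);
    else
        onInvalid();
}

static void Present_error_message()
{
    // single responsibility: just notify the user
    Console.WriteLine("Please pass exactly one command line argument!"); // hypothetical text
}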

The two paths through the flow now are:

  1. Validate_command_line(), Present_error_message()
  2. Validate_command_line(), Accept_string_list(), Deduplicate(), Present_deduplicated_string_list()

If you have a hard time figuring this out from the code, give yourself some time. Realize how you have to re-program your brain. It´s so used to seeing nested calls that it´s now confused. There is nesting - but the nested code is not called first? Yes. That´s a result of some Functional Programming here. The flow translation uses functions (lambda expressions) as first class citizens.

In case you have wondered so far what all the fuss about lambdas and closures in C# (or Java) was about... Now you see what it´s useful for: to easily translate Flow Designs into code.

Yes, this looks a bit clumsy. But that´s due to C# (or Java or C++ or JavaScript) being object oriented languages. And it´s due to a textual notation. Expressing alternatives in text is always a difficult thing. In a visual notation alternatives often are put side by side. That´s not possible with current text based IDEs. So don´t blame the unusual code layout only on Flow Design.

Finally: Let me assure you that it´s possible to get used to reading this kind of code fluently. Hundreds of developers I´ve trained over the past years have accomplished this feat. So can you.

Back to the problem:

Please note how the two continuations of Validate_command_line() do not hint at what´s going to happen next downstream. Their names refer to the purpose of the function, not its environment. That´s what makes the function adhere to the PoMO.

Both names make it obvious which output port of Validate_command_line() is used when. That´s not so obvious in the design. When you look at the Validate command line "bubble" with its two outputs you can´t see which one belongs to which alternative.

For such a small flow that´s not really a problem. But think of more than two outputs or not mutually exclusive alternatives. So, if you like, annotate the Flow Design with port names. I do it like this:

image

The same you can do for input ports, if there should be more than one. Put a name next to the port prefixed with a dot. That way the name looks like a property of the functional unit.

Also notice how both outputs are streams. That´s to signify the optionality of data flowing. It´s a conceptual thing and not technically necessary.

You can translate streams to function pointers, but in C# at least you also could choose yield return with an iterator return type (IEnumerable). Or if output data is not streamed you can still translate the output port to a function pointer.
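For a functional unit with a single stream output the yield return option could look like this sketch (a simplified variant that ignores the error port; the signature is an assumption):

// The asterisk output (args)* becomes an iterator: maybe one item flows out, maybe none.
static IEnumerable<string[]> Validate_command_line(string[] args)
{
    if (args.Length == 1)
        yield return args;   // valid: args flow downstream
    // invalid: nothing flows at all
}

// Downstream steps simply iterate over whatever flows out.
static void Main(string[] args)
{
    foreach (var validArgs in Validate_command_line(args))
    {
        var input = Accept_string_list(validArgs);
        var output = Deduplicate(input);
        Present_deduplicated_string_list(output);
    }
}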

Still, though, I guess designs are easier to understand if you put in the asterisk. Don´t think of streams as a big deal. It´s just as if functions could have an optional return value. (Which would be different from returning an option value like in F#.)

Why is there no error message flowing out of the validation? That´s just a design choice. In this case I decided against it, since there is only one error case. In other situations an error text could flow from several different validation steps to a single error reporting functional unit. Or just an error case identifier (enum). Or even an exception; instead of throwing it right away the validation could leave the decision what to do to some other functional unit.

Flow Design as language creation

As you see, the lack of control statements in Flow Design does not mean single flow of data. Flows can have many branches - although from a certain point on this becomes unwieldy. Spaghetti flows are a real danger like spaghetti code.

That´s also the reason why I would like to caution you against introducing circles into your flow graphs. Keep them free of loops. Only very rarely should there be a need to let data flow back to an upstream functional unit.

Likewise don´t try to simulate control flow. Branching being possible does not mean you should name your processing steps "if" or "while". This would lower the level of abstraction of your design. It would defy its purpose.

Flow Design is about creating a Domain Specific Language (DSL) en passant. It´s supposed to be declarative. It´s supposed to be on a higher level of abstraction than your programming language. Take the Flow Design notation as a universal syntax to declaratively describe solutions in arbitrary domains.

How such flows are executed should be of no/little concern to you. It´s like writing a Christmas gift list. Your daughter wants a pony, your son a real racing car? They don´t care how Santa Claus manages to fulfill their wishes.

Likewise at design time trust there will be a way to implement each processing step. Later. And the more fine grained they are the easier it will be. But until then assume they are already present and functioning. Any functional unit you like. On any level of abstraction. It´s like wielding a magic wand, e.g. "Let there be a functional unit for command line parameter validation!"

There might be one or many control statements needed to implement a functional unit. But let that not leak into your design; don´t anticipate so much. Instead label your functional units with a domain specific phrase. One that describes what is happening, not how. That makes for a declarative DSL consisting of many words and phrases that are descriptive - and even re-usable.

Generalization: 2-dimensional flows

The result of Flow Design then is a flow with possibly many alternatives. A flow that branches like a river does. I call that a 2-dimensional flow because it´s not just one sequence of processing steps (1D), but many, in parallel (2D).

image

2D flows are data flows like 1D flows. There is nothing new to them in terms of parallel processing. Whether two processing steps are wired-up after one another or as alternatives does not require them to be implemented using multiple threads. It´s possible to do that. Flow Design makes that easier because its data flow is oblivious to control flow.

So don´t rush to learn about Actor frameworks or async/await in C# just because you want to apply Flow Design to your problems. Such technologies are orthogonal to Flow Design. For a start just rely on ordinary functions to implement processing steps. That does not diminish the usefulness of functional design.

What does 2-dimensionality mean? It means, data can flow along alternative paths through the network of nodes. Here are the paths for the above 2D flow:

image

Which does not mean it´s one or the other. Data can flow along many paths at the same time. Conceptually at least, but also (almost) truly at runtime if you choose to employ multi-threading of some sort. It need not be "either this path or that", it can be "this path as well as that".

But don´t let that confuse you right now. Without a tangible problem demanding that kind of sophisticated flow design these are pretty abstract musings. In practice this is largely no problem. Most flows are pretty straightforward.

Just keep in mind: this is data flow, not control flow. That means it´s unidirectional data exchange between independent functional units which don´t know anything about each other. They just happen to offer certain behavior which expresses itself as producing certain output or some side effect upon certain input.

Similar flows flowing back together

Data flows cannot only be split into branches, they can also flow back into each other or be joined.

Think of the famous Fizz Buzz kata: Numbers in a range, e.g. 1..100, are to be output in a special way. If a number can be divided by 3, "Fizz" should be written; if it´s divisible by 5, "Buzz" should be written; and if it can be divided by 3 and 5, "FizzBuzz" should be written. Any other number is output as is.

Usually this kata is used to practice TDD. But of course it can also be tackled with Flow Design, although its scope is very narrow and the solution thus might feel a little clumsy. Basically it´s a small algorithmic problem. So Flow Design is almost overkill.

On the other hand it´s perfect to illustrate branching and flowing back.

The task is to implement Fizz Buzz as a function like this: void FizzBuzz(int first, int last). For a given range of numbers the translations should be printed to standard output.

What´s to be done? What are the building blocks for this functionality? Here´s the result of my brainstorming:

  • Print numbers or their translations.
  • Translate number
    • First classify number
    • Then convert it
  • Number generation
  • Check range. If it´s an invalid range, throw an exception.

Notice how fine grained these processing steps are. Before I start coding I´m always eager to determine the different responsibilities. That´s one of the tasks of any design: separate aspects, responsibilities, concerns.

Printing numbers certainly is different from all else. It´s about calling an API, it´s communication with the environment, whereas the other processing steps belong to the Fizz Buzz domain.

Validation also is different from translation, isn´t it? Translation rules could change. That should not affect the validation function.

Also, classification rules could change. That should not affect the functions for converting a certain class of numbers. As well as the other way around.

"Seeing responsibilities" is one of the "arts" of software development. It can be trained, but except for some hard and fast rules in the end it remains a quite creative act. Be prepared to revise your decisions. Also be prepared for dissent in your team. But with regular reflection you´ll master this art.

Here now my Flow Design for the above bullet points:

image

Let me point out a couple of things:

  • Note that multiple values can flow as data at once, e.g. (first, last). Those are tuples. Passing them in as input is easy: they map to a list of formal function parameters. But how to generate them as output? There are various options depending on the programming language you use.
  • Streams are used again to signify optional output. For each number data on only one port will flow out of classification.
  • The streams flowing into the translation steps produce an output stream. That´s the right thing to do here. In other scenarios, though, an input stream could result in just one output value. Think of aggregation.

Like I said, for the problem at hand this might be a bit of overkill. A quite elaborate flow for such simple functionality. On the other hand that´s perfect: The problem domain is easy to understand so we can focus on the features of Flow Design and their translation into code.

Here you see how it´s possible to have many output ports on a processing step and how many branches can flow back into one.

The visual notation makes that very easy. But how does it look in code? Will it still be readily understandable?

Let´s start with some of the processing steps:

image

Each of the steps is very small, very focused, very easy to understand. I think that´s a good thing. Functions should be small, shouldn´t they? Some say no more than 10 LOC, others say 40 LOC or "a screenful of code". In any case Flow Design very naturally leads to small functions. Don´t wait for refactoring to downsize your functions. Do it right from the beginning. You save yourself quite some refactoring trouble.

My favorite function is Classify_number(), you know. Because it´s so different from the usual Fizz Buzz implementations. Here it truly has a single responsibility: It´s the place where numbers are analyzed. It´s where the Fizz Buzz rule is located which says numbers must not all be treated the same.

Fizz Buzz originally is a drinking game. Whoever fails at "counting" correctly has to drink some more - which makes it even harder to "count". The main mental effort goes into checking whether a number needs translation. It´s about math - which is not easy for everyone even when sober ;-) And exactly this checking is represented by Classify_number(). No number generation, no translation, just checking.

That´s also the reason why I did not bother to apply the Single Level of Abstraction (SLA) principle. I did not refactor the conditions out into their own functions but left them in there, even with a small duplication. Still the function can be tested very, very easily.
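Since the code is only shown as a screenshot, here is a sketch of what Classify_number() might look like in C# (the port/parameter names are assumptions):

// One input port, four output ports. For each number exactly one port fires.
// Note the small duplication in the conditions mentioned above.
static void Classify_number(int number,
                            Action onFizzBuzz,
                            Action onFizz,
                            Action onBuzz,
                            Action<int> onNoTranslation)
{
    if (number % 3 == 0 && number % 5 == 0)
        onFizzBuzz();
    else if (number % 3 == 0)
        onFizz();
    else if (number % 5 == 0)
        onBuzz();
    else
        onNoTranslation(number);
}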

And now for the main function of the solution where the process is assembled from the functional units:

image

This might look a bit strange to you. But try to see through this. Try to see how systematic this translation is. And in the end you´ll see how the data flows even in the code. From that you can then re-generate the diagram. The code is the design. And when you change the code according to the PoMO it will stay in sync with the design because it is just a "serialization" of a flow.

If you look closely, though, you might spot a seeming deviation from the design. Print() is repeated in every branch instead of calling the function just once. But in fact it´s not a deviation but a detail of the way several streams need to be joined back together into one. See it not as several calls of a function, but as a single point. It´s just 1 name, 1 function and thus represents the 1 point circled in the Flow Design.
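To give you an idea even without the screenshot, the integrating FizzBuzz() function might roughly look like this sketch (Check_range() and Generate_numbers() are hypothetical names for the range check and number generation steps; the conversion steps are collapsed into string literals):

static void FizzBuzz(int first, int last)
{
    Check_range(first, last);                 // throws on an invalid range
    Generate_numbers(first, last,
        number => Classify_number(number,
            onFizzBuzz:      () => Print("FizzBuzz"),
            onFizz:          () => Print("Fizz"),
            onBuzz:          () => Print("Buzz"),
            onNoTranslation: n  => Print(n.ToString())));
    // Print() appears in every branch, but it´s still just the one
    // functional unit the branches flow back into.
}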

Joining dissimilar flows

Here´s another scenario where branching helps - but how those branches flow back together is different.

The task is to write a function that formats CSV data. Its signature looks like this: string FormatCsv(string csv).

The input data are CSV records, e.g.

Name;Age;City
Peter;26;Hamburg
Paul;45;London
Mary;38;Copenhagen

And the output is supposed to look like this:

Name |Age|City
-----+---+----------
Peter|26 |Hamburg
Paul |45 |London
Mary |38 |Copenhagen

The function generates an ASCII table from the raw data. The header is separated from the data records. And the columns are spaced to accommodate the longest value in either header or data records.

What are the aspects, the features of this functionality?

  • Determine column width
  • Parse input
  • Format header
  • Format data records - which should work like formatting the header
  • Format separator - which looks quite different from formatted data
  • Build the whole table from the formatted data

The order of these processing steps is simple. And as it turns out, some processing can be done in parallel:

image

Once the column widths have been determined, formatting the data and formatting the separator are independent of each other. That´s why I branched the flow and put the Format... processing steps in parallel.

Notice the asterisk in (csvRecord*) or (colWidth*). It denotes a list of values in an abstract manner. Whether you implement the list as an array or some list type or IEnumerable in .NET is of no concern to the design. Compare this to the asterisk outside the brackets denoting a stream of single values: (int*) stands for a list (when data flows it contains multiple values), (int)* stands for a stream (data flows multiple times containing a single value).

Formatting the separator just takes in the column width values. But formatting the records also takes in the records. Notice the "|" before the data description. It means "the following is what really flows into the next functional unit". It´s used in cases where the upstream unit outputs different data than the downstream unit requires as input.

Determine col width outputs (colWidth*), but Format records requires (csvRecord, colWidth). That´s expressed by (colWidth) | (csvRecord,colWidth*) on the arrow pointing from Determine... to Format records.

This means, a flow defines a context. Within this context data can be "re-used". In this case the csvRecord* coming out of Parse is used again for formatting. (In code this is easy to achieve if a flow is put together in a single function. Then data can be assigned to local variables.)

Most importantly, though, this Flow Design sports a join. The join is a special functional unit. It takes n input flows and produces 1 output flow. The data of the output is a tuple combining data from all inputs.

The join waits for data to arrive on all inputs. And it outputs a tuple whenever an input changes. In this case, though, once output was generated, the join clears its inputs. So for the next output new input has to arrive on both input ports. That´s called an auto-reset join.¹

Sounds complicated? Maybe. But in the end it´s really simple. As you see in the implementation, a join - even though it is a functional unit of its own in the flow - does not require an extra function:

image

A simple function call with n parameters will do most often to bring together several branches - at least as long as you don´t resort to real parallel processing in those branches.
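A sketch of FormatCsv() illustrates this; the join dissolves into an ordinary call with two arguments (the signatures are assumptions, the real code lives in the linked repository):

static string FormatCsv(string csv)
{
    var records = Parse(csv);
    var colWidths = Determine_col_widths(records);

    // the two parallel branches...
    var formattedRecords = Format_records(records, colWidths);
    var separator = Format_separator(colWidths);

    // ...are joined by simply passing both results into the downstream step
    return Build_table(formattedRecords, separator);
}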

That´s why sometimes I simplify the join like this:

image

That way it no longer looks so "massive". It´s more a part of the downstream processing step.

For the remaining code of the CSV formatter see the implementation in the accompanying GitHub repository.

In closing

I hope I was able to instill some faith in you that Flow Design is rich enough to model solutions to real problems. Even though it´s not a full blown programming language it allows you to express "processes" of all sorts to deliver on all the functional and many of the quality requirements of your customers.

1D and 2D flows are declarative expressions of "how things work" once control enters a software through an Entry Point.

Mutually oblivious functional units are all you need to avoid many of the pitfalls of programming usually leading to dirty code.

But wait! There´s more! ;-) You sure want to know how to scale those flows to build arbitrarily large processes.


  1. You might think, if there is an auto-reset join, there could be a manual-reset join, too. And you´re right. So far, though, I´ve found that to be of rare use. That´s why I´m not going into detail on that here.

The Incremental Architect’s Napkin - #5 - Design functions for extensibility and readability

The functionality of programs is entered via Entry Points. So what we´re talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points.

Designing software thus consists of at least three phases:

  1. Analyzing the requirements to find the Entry Points and their signatures
  2. Designing the functionality to be executed when those Entry Points get triggered
  3. Implementing the functionality according to the design aka coding

I presume you´re familiar with phase 1 in some way. And I guess you´re proficient in implementing functionality in some programming language.

But in my experience developers in general are not experienced in going through an explicit phase 2. “Designing functionality? What´s that supposed to mean?” you might already have thought.

Here´s my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what´s supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution that is, because those functions only exist in your head (or on paper) during this phase. But you may have guessed that, because it´s “design” not “coding”.

And here is, what functional design is not: It´s not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It´s equally basic. It´s what code needs to do in order to deliver some functionality or quality.

Logic is what does what needs to be done by software. Transformations are either done through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it´s just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do).

But calling your own function is not logic. It´s not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions.

Functions are not relevant to functionality. Strange, isn´t it?

What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher understandability, easier to keep code consistent).

That´s no small feat, however. The value of evolvable code can hardly be overestimated. That´s why to me functional design is so important. It´s at the core of software development.

To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it.

Functional design and logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need to back out through refactoring. Functional design on the other hand is bloodless without actual code. It´s just a theory with no experiments to prove it.

But how to do functional design?

An example of functional design

Let´s assume a program to de-duplicate strings. The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e.

There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line

C:\>deduplicate "a, b, a, c, d, b, e, c, a"
a, b, c, d, e

…or by clicking on a GUI button.

image

This leads to the Entry Point function getting called. It´s the program´s main function in case of the batch version or a button click event handler in the GUI version. That´s the physical Entry Point so to speak. It´s inevitable.

What then happens is a three step process:

  1. Transform the input data from the user into a request.
  2. Call the request handler.
  3. Transform the output of the request handler into a tangible result for the user.

Or to phrase it a bit more generally:

  1. Accept input.
  2. Transform input into output.
  3. Present output.

This does not mean any of these steps requires a lot of effort. Maybe it´s just one line of code to accomplish it. Nevertheless it´s a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP).

Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself.

But it´s still on a meta-level. The application to the domain at hand is easy, though:

  1. Accept string list from command line
  2. De-duplicate
  3. Present de-duplicated strings on standard output

And this concrete list of processing steps can easily be transformed into code:

static void Main(string[] args)
{
    var input = Accept_string_list(args);
    var output = Deduplicate(input);
    Present_deduplicated_string_list(output);
}

Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point.

But maybe, just maybe, you´re not so sure how to go about the de-duplication for example. Then just implement what´s easy right now, e.g.

private static string Accept_string_list(string[] args)
{
    return args[0];
}

private static void 
        Present_deduplicated_string_list(
            string[] output)
{
    var line = string.Join(", ", output);
    Console.WriteLine(line);
}

Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call.

And then repeat the functional design for the remaining processing step. What´s left is the domain logic: de-duplicating a list of strings. How should that be done?

Without any logic at our disposal during functional design you´re left with just functions. So which functions could make up the de-duplication? Here´s a suggestion:

  • De-duplicate
      • Parse the input string into a true list of strings.
      • Register each string in a dictionary/map/set. That way duplicates get cast away.
      • Transform the data structure into a list of unique strings.

Processing step 2 obviously was the core of the solution. That´s where real creativity was needed. That´s the core of the domain. But now after this refinement the implementation of each step is easy again:

private static string[] Parse_string_list(string input)
{
    return input.Split(',')
                .Select(s => s.Trim())
                .ToArray();
}

private static Dictionary<string,object> 
        Compile_unique_strings(string[] strings)
{
    return strings.Aggregate(
            new Dictionary<string, object>(),
            (agg, s) => { 
                agg[s] = null;
                return agg;
            });
}

private static string[] Serialize_unique_strings(
               Dictionary<string,object> dict)
{
    return dict.Keys.ToArray();
}

With these three additional functions Main() now looks like this:

static void Main(string[] args)
{
    var input = Accept_string_list(args);

    var strings = Parse_string_list(input);
    var dict = Compile_unique_strings(strings);
    var output = Serialize_unique_strings(dict);

    Present_deduplicated_string_list(output);
}

I think that´s very understandable code: just read it from top to bottom and you know how the solution to the problem works. It´s a mirror image of the initial design:

  1. Accept string list from command line
  2. Parse the input string into a true list of strings.
  3. Register each string in a dictionary/map/set. That way duplicates get cast away.
  4. Transform the data structure into a list of unique strings.
  5. Present de-duplicated strings on standard output

You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later.

And as a bonus: all the functions making up the process are small - which means easy to understand, too.

So much for an initial concrete example. Now it´s time for some theory. Because there is method to this madness ;-) The above has only scratched the surface.

Introducing Flow Design

Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm.

An algorithm consists of logic, a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1]

In its simplest form a process can be written as a bullet point list of steps, e.g.

  • Get data from user
  • Output result to user
  • Transform data
  • Parse data
  • Map result for output

Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming.

Next comes ordering the steps. What should happen first, what next etc.?

  1. Get data from user
  2. Parse data
  3. Transform data
  4. Map result for output
  5. Output result to user

That´s great for a start into functional design. It´s better than starting to code right away on a given function using TDD.

Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a function is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion?

My recommendation: For any given function you´re supposed to implement first do a functional design. Then, once you´re confident you know the processing steps - which are pretty small - refine and code them using TDD. You´ll see that´s much, much easier - and leads to cleaner code right away. For more information on this approach, which I call “Informed TDD”, read my book of the same title.

Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I´d say. It´s more according to the KISS (Keep It Simple, Stupid) principle than returning constants or other trivial stuff TDD development often is started with.

So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example such steps can easily be translated into functions. Moving from design to coding thus is simple.

However, such a list does not scale. Processing is not always simple enough to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their “dimensionality”: text just has one dimension, it´s sequential. Alternatives and parallelism are hard to encode in text.

In addition the functional design using numbered lists lacks data. It´s not visible what the input, output, and state of the processing steps are.

That´s why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient.

Visualizing processes

The building block of the functional design notation is a functional unit. I mostly draw it like this:

image

Something is done, it´s clear what goes in, it´s clear what comes out, and it´s clear what the processing step requires in terms of state or hardware.

Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That´s why I call this approach to functional design Flow Design.

It´s about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details.

That´s what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give overview.

But what about all the details? As Robert C. Martin rightly said: “Programming is about detail”.

Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want.

Functional design does not eliminate all the nitty gritty details. It just postpones tackling them. To me that´s also an example of the SRP. Functional design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later coding has the responsibility to implement the solution down to the last detail (i.e. statement, API-call).

TDD unfortunately mixes both responsibilities. It´s just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that´s one reason why TDD has failed to deliver on its promise for many developers.

Using functional units as building blocks of functional design, processes can be depicted very easily. Here´s the initial process for the example problem:

image

For each processing step draw a functional unit and label it. Choose a verb or an “action phrase” as a label, not a noun. Functional design is about activities, not state or structure.

Then make the output of an upstream step the input of a downstream step. Finally think about the data that should flow between the functional units.

Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified.

Empty brackets mean “no data is flowing”, but nevertheless a signal is sent.

A name like “list” or “strings” in brackets describes the data content. Use lower case labels for that purpose.

A name starting with an upper case letter like “String” or “Customer” on the other hand signifies a data type.

If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]).

But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent.

Flows wired-up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output.

A functional unit without an output is possible. It´s like a black hole sucking up input without producing any output. Instead it produces side effects.

A functional unit without an input, though, does not make much sense. When should it start to work? What´s the trigger? That´s why in the above process even the first processing step has an input.

If you like, view such 1D-flows as pipelines. Data is flowing through them from left to right. But as you can see, it´s not always the same data. It gets transformed along its passage: (args) becomes a (list) which is turned into (strings).
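
In code these annotations simply become the parameter and return types of the translated functions. For the pipeline above the mapping could be written down like this (the interface is merely a vehicle to show the signatures side by side):

// (args) --Accept--> (list) --Parse--> (strings)
interface IDeduplicationPipeline
{
    string   Accept_string_list(string[] args);  // (args) in, (list) out
    string[] Parse_string_list(string list);     // (list) in, (strings) out
}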

The Principle of Mutual Oblivion

A very characteristic trait of flows put together from functional units is: no functional unit knows another one. They are all completely independent of each other.

Functional units don´t know where their input is coming from (or even when it´s gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving.

Also they don´t know where their output is going. They just produce it in their own time independent of other functional units. That means at least conceptually all functional units work in parallel.

Functional units don´t know their “deployment context”. They know nothing about the overall flow they are placed in. They are just consuming input from some upstream, and producing output for some downstream.

That makes functional units very easy to test. At least as long as they don´t depend on state or resources.

I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as an overall context/purpose. They are just parts of a whole focused on a single responsibility.

How the whole is built, how a larger goal is achieved, is of no concern to the single functional units.

By building software in such a manner, functional design interestingly follows nature. Nature´s building blocks for organisms also follow the PoMO. The cells forming your body do not know each other.

Take a nerve cell “controlling” a muscle cell for example:[2]

image

The nerve cell does not know anything about muscle cells, let alone the specific muscle cell it is “attached to”. Likewise the muscle cell does not know anything about nerve cells, let alone a specific nerve cell “attached to” it. Saying “the nerve cell is controlling the muscle cell” thus only makes sense when viewing both from the outside. “Control” is a concept of the whole, not of its parts. Control is created by wiring-up parts in a certain way.

Both cells are mutually oblivious. Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it´s coming from neither cell cares about.

Millions of years of evolution have led to this kind of division of labor. And millions of years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism.

How and why this happened in nature is a mystery. For our software, though, it´s clear: functional and quality requirements need to be fulfilled. So we as developers have to become “intelligent designers” of “software cells” which we put together to form a “software organism” which responds in satisfying ways to triggers from its environment.

My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler “machines”?

So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units.

That´s what Flow Design is about. In that way it´s even universal, I´d say. Its notation can also be applied to biology:

image

Never mind labeling the functional units with nouns. That´s ok in Flow Design. You´ll do that occasionally for functional units on a higher level of abstraction or when their purpose is close to hardware.

Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 “software cells” (functional units). Both, though, follow the same basic principle.

Translating functional units into code

Moving from functional design to code is no rocket science. In fact it´s straightforward. There are two simple rules:

  • Translate an input port to a function.
  • Translate an output port either to a return statement in that function or to a function pointer visible to that function.

image

The simplest translation of a functional unit is a function. That´s what you saw in the above example. Functions are mutually oblivious. That´s why Functional Programming likes them so much. It makes them composable. Which is the reason nature works according to the PoMO.
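
As a minimal illustration of the two rules (the functional unit and its name are invented here): the input port becomes the function itself, and since every input produces exactly one output, the output port can simply become the return value.

// Functional unit "Split into words" with input port (text) and output port (word*).
// Input port  -> the function.
// Output port -> the return value, because each input produces exactly one output.
static string[] Split_into_words(string text) =>
    text.Split(' ');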

Let´s be clear about one thing: There is no dependency injection in nature. For all of an organism´s complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks.

Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case, where a processing step should not always produce an output. Maybe the purpose is to filter input.

image

Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed.

Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called “stop words”.

In the above picture the optionality of the output is signified by the asterisk outside the brackets. It means: Any number of (word) data items can flow from the functional unit for each input data item. It might be none or one or even more. This I call a stream of data.

Such behavior cannot be translated into a function where output is generated with return, because such a function always has to return exactly one value.

So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]

void filter_stop_words(
       string word,
       Action<string> onNoStopWord) {
  if (...check if not a stop word...)
    onNoStopWord(word);
}
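
For illustration, the elided check could be filled in like this, assuming the stop word rules mentioned above (acronyms and words of no more than two letters):

static void filter_stop_words(
       string word,
       Action<string> onNoStopWord) {
  var isTooShort = word.Length <= 2;

  var isAcronym = word.Length > 1;
  foreach (var c in word)
    if (!char.IsUpper(c)) { isAcronym = false; break; }

  if (!isAcronym && !isTooShort)
    onNoStopWord(word);   // only "real" words flow on
}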

If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you´re right. Conceptually, though, it´s not an injection. Because the subroutine is not functionally dependent on the continuation.

Firstly continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow.

Secondly the name of the formal parameter is chosen in a way as to not assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only.

Translating output ports into function pointers helps keep functional units mutually oblivious in cases where output is optional or produced asynchronously.

Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it´s called an event. In C# that´s even an explicit feature.

class Filter {
  public void filter_stop_words(
                string word) {
    if (...check if not a stop word...)
      onNoStopWord(word);
  }

  public event Action<string> onNoStopWord;
}

When to use a continuation and when to use an event depends on how a functional unit is used in flows and how it´s packed together with others into classes. You´ll see examples further down the Flow Design road.
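
Wiring-up then is the job of integrating code only. A small sketch (SpellChecker and Check_spelling are invented for illustration; Filter is the class above with its check filled in): the integration subscribes the downstream input to the upstream event, and neither part knows the other.

class SpellChecker {
  public void Check_spelling(string word) {
    // ...look the word up in a dictionary, report if unknown...
    Console.WriteLine("checking: " + word);
  }
}

class Integration {
  public static void Wire_up_and_run() {
    var filter = new Filter();
    var checker = new SpellChecker();

    // Only the integration knows both parts; the flow is created here.
    filter.onNoStopWord += checker.Check_spelling;

    filter.filter_stop_words("NASA");   // swallowed, assuming acronyms count as stop words
    filter.filter_stop_words("words");  // flows on to the spell checker
  }
}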

Another example of 1D functional design

Let´s see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design.

The function signature given is:

string WordWrap(string text, int maxLineLength) 
{...}

That´s not an Entry Point since we don´t see an application with an environment and users. Nevertheless it´s a function which is supposed to provide a certain functionality.

The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified.

If a word is longer than the maximum line length it can be split into multiple parts, each fitting into a line.

Flow Design

Let´s start by brainstorming the process to accomplish the feat of reformatting the text. What´s needed?

  • Words need to be assembled into lines
  • Words need to be extracted from the input text
  • The resulting lines need to be assembled into the output text
  • Words too long to fit in a line need to be split

Does that sound about right? I guess so. And it shows a kind of priority. Long words are a special case. So maybe there is a hint for an incremental design here. First let´s tackle “average words” (words not longer than a line).

Here´s the Flow Design for this increment:

image

The first three bullet points have been turned into functional units, with explicit data added.

As the signature requires, a text is transformed into another text. See the input of the first functional unit and the output of the last functional unit.

In between no text flows, but words and lines. That´s good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are.

But note the asterisk! It´s not outside the brackets but inside. That means it´s not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced.

The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It´s an implementation detail.

Does any processing step require further refinement? I don´t think so. They all look pretty “atomic” to me. And if not… I can always backtrack and refine a process step using functional design later once I´ve gained more insight into a sub-problem.

Implementation

The implementation is straightforward as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility.

image

And the process flow becomes just a sequence of function calls:

image

Easy to understand. It clearly states how word wrapping works - on a high level of abstraction.
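
The images show the author´s actual code. As a rough sketch, assuming the functional units are named after the design (Extract words, Reformat, plus an assembling step I call Assemble_text here), the integration and its steps might look like this:

static string WordWrap(string text, int maxLineLength)
{
    var words = Extract_words(text);             // (word*)
    var lines = Reformat(words, maxLineLength);  // (line*)
    return Assemble_text(lines);
}

static string[] Extract_words(string text) =>
    text.Split(' ');

static string[] Reformat(string[] words, int maxLineLength)
{
    var lines = new List<string>();
    var line = "";
    foreach (var word in words)
    {
        var candidate = line == "" ? word : line + " " + word;
        if (candidate.Length <= maxLineLength || line == "")
            line = candidate;
        else
        {
            lines.Add(line);
            line = word;
        }
    }
    if (line != "") lines.Add(line);
    return lines.ToArray();
}

static string Assemble_text(string[] lines) =>
    string.Join("\n", lines);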

And it´s easy to evolve as you´ll see.

Flow Design - Increment 2

So far only texts consisting of “average words” are wrapped correctly. Words not fitting in a line will result in lines too long.

Wrapping long words is a feature of the requested functionality. Whether it´s there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it´s time to add it to deliver the full scope.

Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It´s easy to extend it - instead of changing well tested code. How´s that possible?

Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead where extensions might need to be made in the future.

I just “phase in” the remaining processing step:

image

Since neither Extract words nor Reformat knows of its environment, neither needs to be touched due to the “detour”. The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step.

Implementation - Increment 2

A trivial implementation to check whether this works does not yet do anything to split long words. The input is just passed on:

image
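
Following the same naming as in the earlier sketch, such a pass-through might look like this (Split_long_words is an assumed name; its real implementation would later split words longer than maxLineLength):

// Trivial stand-in: for now, long words are just passed on unchanged.
static string[] Split_long_words(string[] words, int maxLineLength) =>
    words;

static string WordWrap(string text, int maxLineLength)
{
    var words = Extract_words(text);
    var split = Split_long_words(words, maxLineLength);  // the new step, "phased in"
    var lines = Reformat(split, maxLineLength);
    return Assemble_text(lines);
}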

Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be built in, quickly sees how long words are dealt with.

Compare this to Robert C. Martin´s solution:[4]

image

How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach.

Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin´s. At least it seems so. Because his solution does not cover all the “word wrap situations” the Flow Design solution handles. Some lines would need to be added to be on par, I guess.

But even then… Is a difference in LOC that important as long as it´s in the same ball park? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it´s also clarity in design.

But don´t take my word for it. Try Flow Design on larger problems and compare for yourself. What´s the easier, more straightforward way to clean code? And keep in mind: You ain´t seen all yet ;-) There´s more to Flow Design than described in this chapter.

In closing

I hope I was able to give you an impression of functional design that makes you hungry for more. To me it´s an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all too quickly.

Some thought should be invested first. Where there is a clear Entry Point visible, its functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that´s necessary read my blog article here.

For now let me point out to you - if you haven´t already noticed - that Flow Design is a general purpose declarative language. It´s “programming by intention” (Shalloway et al.).

Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem into smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members.

Flow Design not only increases evolvability, but also helps you become more productive. All team members can participate in functional design. This goes beyond collective code ownership. We´re talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities.

 

PS: If you like what you read, consider getting my ebook “The Incremental Architect´s Napkin”. It´s where I compile all the articles in this series for easier reading.


  1. I like the strictness of Functional Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also to me it seems the real world is full of state and side effects. So why give them such a bad image? That´s why functional design takes a more pragmatic approach. State and side effects are ok for processing steps - but be sure to follow the SRP. Don´t put too much of it into a single processing step.

  2. Image taken from www.physioweb.org

  3. My code samples are written in C#. C# sports typed function pointers called delegates. Action<T> is such a function pointer type matching functions with the signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java now in version 8. I trust you find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you´re using a Functional Programming language it´s of course a no brainer.

  4. Taken from his blog post “The Craftsman 62, The Dark Path”.

Abstracting functionality

What is more important than data? Functionality. Yes, I strongly believe we should switch to a functionality over data mindset in programming. Or actually switch back to it.

Focus on functionality

Functionality once was at the core of software development. Back when algorithms were the first thing you heard about in CS classes. Sure, data structures, too, were important - but always from the point of view of algorithms. (Niklaus Wirth gave one of his books the title “Algorithms + Data Structures” instead of “Data Structures + Algorithms” for a reason.)

The reason for the focus on functionality? Firstly, because software was and is about doing stuff. Secondly, because sufficient performance was hard to achieve; memory efficiency came only third.

But then hardware became more powerful. That gave rise to a new mindset: object orientation. And with it functionality was devalued. Data took over its place as the most important aspect. Now discussions revolved around structures motivated by data relationships. (John Beidler gave his book the title “Data Structures and Algorithms: An Object Oriented Approach” instead of the other way around for a reason.)

Sure, this data could be embellished with functionality. But nevertheless functionality was second.

image

When you look at (domain) object models what you mostly find is (domain) data object models. The common object oriented approach is: data aka structure over functionality. This is true even for the most modern modeling approaches like Domain Driven Design. Look at the literature and what you find is recommendations on how to get data structures right: aggregates, entities, value objects.

I´m not saying this is what object orientation was invented for. But I´m saying that´s what I happen to see across many teams now some 25 years after object orientation became mainstream through C++, Delphi, and Java.

But why should we switch back? Because software development cannot become truly agile with a data focus. The reason for that lies in what customers need first: functionality, behavior, operations.

To be clear, that´s not why software is built. The purpose of software is to be more efficient than the alternative. Money mainly is spent to get a certain level of quality (e.g. performance, scalability, security etc.). But without functionality being present, there is nothing to work on the quality of.

What customers want is functionality of a certain quality. ASAP. And tomorrow new functionality needs to be added, existing functionality needs to be changed, and quality needs to be increased.

No customer ever wanted data or structures.

Of course data should be processed. Data is there, data gets generated, transformed, stored. But how the data is structured for this to happen efficiently is of no concern to the customer.

Ask a customer (or user) whether she likes the data structured this way or that way. She´ll say, “I don´t care.” But ask a customer (or user) whether he likes the functionality and its quality this way or that way. He´ll say, “I like it” (or “I don´t like it”).

Build software incrementally

From this very natural focus of customers and users on functionality and its quality follows we should develop software incrementally. That´s what Agility is about.

Deliver small increments quickly and often to get frequent feedback. That way less waste is produced, and learning can take place much easier (on the side of the customer as well as on the side of developers).

An increment is some added functionality or quality of functionality.[1]

So as it turns out, Agility is about functionality over whatever. But software developers’ thinking is still stuck in the object oriented mindset of whatever over functionality. Bummer. I guess that (at least partly) explains why Agility always hits a glass ceiling in projects. It´s a clash of mindsets, of cultures.

Driving software development by demanding small increases in functionality runs against thinking about software as growing (data) structures sprinkled with functionality. (Excuse me, if this sounds a bit broad-brush. But you get my point.)

The need for abstraction

In the end there need to be data structures. Of course. Small and large ones. The phrase functionality over data does not deny that. It´s not functionality instead of data or something. It´s just over, i.e. functionality should be thought of first. It´s a tad more important. It´s what the customer wants.

That´s why we need a way to design functionality. Small and large. We need to be able to think about functionality before implementing it. We need to be able to reason about it among team members. We need to be able to communicate our mental models of functionality not just by speaking about them, but also on paper. Otherwise reasoning about it does not scale.

image

We learned to think about functionality in the small using flow charts, Nassi-Shneiderman diagrams, pseudo code, or UML sequence diagrams.

That´s nice and well. But it does not scale. You can use these tools to describe manageable algorithms. But it does not work for the functionality triggered by pressing the “1-Click Order” button on an amazon product page for example.

There are several reasons for that, I´d say.

Firstly, the level of abstraction over code is negligible. It´s essentially non-existent. Drawing a flow chart or writing pseudo code or writing actual code is very, very much alike. All these tools are about control flow like code is.[2]

In addition all tools are computationally complete. They are about logic which is expressions and especially control statements. Whatever you code in Java you can fully (!) describe using a flow chart.

And then there is no data. They are about control flow and leave out the data altogether. Thus data mostly is assumed to be global. That´s shooting yourself in the foot, as I hope you agree.

Even if it´s functionality over data that does not mean “don´t think about data”. Right to the contrary! Functionality only makes sense with regard to data. So data needs to be in the picture right from the start - but it must not dominate the thinking. The above tools fail on this.

Bottom line: So far we´re unable to reason in a scalable and abstract manner about functionality.

That´s why programmers are so driven to start coding once they are presented with a problem. Programming languages are the only tool they´ve learned to use to reason about functional solutions.

image

Or, well, there might be exceptions. Mathematical notation and SQL may have come to your mind already. Indeed they are tools on a higher level of abstraction than flow charts etc. That´s because they are declarative and not computationally complete. They leave out details - in order to deliver higher efficiency in devising overall solutions.

We can easily reason about functionality using mathematics and SQL. That´s great. Except for that they are domain specific languages. They are not general purpose. (And they don´t scale either, I´d say.) Bummer.

So to be more precise we need a scalable general purpose tool on a higher than code level of abstraction not neglecting data.

Enter: Flow Design.

Abstracting functionality using data flows

I believe the solution to the problem of abstracting functionality lies in switching from control flow to data flow.

Data flow very naturally is not about logic details anymore. There are no expressions and no control statements anymore. There are not even statements anymore. Data flow is declarative by nature.

image

With data flow we get rid of all the limiting traits of former approaches to modeling functionality.

In addition, nomen est omen, data flows include data in the functionality picture.

With data flows, data is visibly flowing from processing step to processing step. Control is not flowing. Control is wherever it´s needed to process data coming in.

That´s a crucial difference and needs some rewiring in your head to be fully appreciated.[2]

Since data flows are declarative they are not the right tool to describe algorithms, though, I´d say. With them you don´t design functionality on a low level. During design data flow processing steps are black boxes. They get fleshed out during coding.

Data flow design thus is more coarse grained than flow chart design. It starts on a higher level of abstraction - but then is not limited. By nesting data flows indefinitely you can design functionality of any size, without losing sight of your data.

image

Data flows scale very well during design. They can be used on any level of granularity. And they can easily be depicted. Communicating designs using data flows is easy and scales well, too.

The result of functional design using data flows is not algorithms (too low level), but processes. Think of data flows as descriptions of industrial production lines. Data as material runs through a number of processing steps to be analyzed, enhanced, transformed.

At the top level of a data flow design there might be just one processing step, e.g. “execute 1-click order”. But below that are arbitrary levels of flows with smaller and smaller steps.

That´s not layering as in “layered architecture”, though. Rather it´s a stratified design à la Abelson/Sussman.

Refining data flows is not your grandpa´s functional decomposition. That was rooted in control flows. Refining data flows does not suffer from the limits of functional decomposition against which object orientation was supposed to be an antidote.

Summary

I´ve been working exclusively with data flows for functional design for the past 4 years. It has changed my life as a programmer. What once was difficult is now easy. And, no, I´m not using Clojure or F#. And I´m not an async/parallel execution buff.

Designing the functionality of increments using data flows works great with teams. It produces design documentation which can easily be translated into code - in which then the smallest data flow processing steps have to be fleshed out - which is comparatively easy.

Using a systematic translation approach code can mirror the data flow design. That way later on the design can easily be reproduced from the code if need be.

And finally, data flow designs play well with object orientation. They are a great starting point for class design. But that´s a story for another day.

To me data flow design simply is one of the missing links of systematic lightweight software design.


  1. There are also other artifacts software development can produce to get feedback, e.g. process descriptions, test cases. But customers can be delighted more easily with code based increments in functionality.

  2. No, I´m not talking about the endless possibilities this opens for parallel processing. Data flows are useful independently of multi-core processors and Actor-based designs. That´s my whole point here. Data flows are good for reasoning and evolvability. So forget about any special frameworks you might need to reap benefits from data flows. None are necessary. Translating data flow designs even into plain old Java is possible.

The Incremental Architect´s Napkin - #4 - Make increments tangible

The driver of software development is increments, small increments, tiny increments. With an increment being a slice of the overall requirement scope thin enough to implement and get feedback from a product owner within 2 days max. Such an increment might concern Functionality or Quality.[1]

To make such high frequency delivery of increments possible, the transition from talking to coding needs to be as easy as possible. A user story or some other documentation of what´s supposed to get implemented by tomorrow evening at the latest is one side of the coin. The other is where to put the logic in all of the code base.

To implement an increment, only logic statements are needed. Functionality as well as Quality is just about expressions and control flow statements. Think of Assembler code without the CALL/RET instructions. That´s all that is needed. Forget about functions, forget about classes. To make a user happy none of that is really needed. It´s just about the right expressions and conditional execution paths plus some memory allocation. The automatic function inlining of compilers makes it clear how unimportant functions are for delivering value to users at runtime.

But why then are there functions? Because they were invented for optimization purposes. We need them for better Evolvability and Production Efficiency. Nothing more, nothing less. No software has become faster, more secure, more scalable, more functional because we gathered logic under the roof of a function or two or a thousand.

Functions make logic easier to understand. Functions make us faster in producing logic. Functions make it easier to keep logic consistent. Functions help to conserve memory.

That said, functions are important. They are even the pivotal element of software development. We can´t code without them - whether you write a function yourself or not. Because there´s always at least one function in play: the Entry Point of a program.

In Ruby the simplest program looks like this:

puts "Hello, world!"

In C# more is necessary:

class Program {
    public static void Main () {
        System.Console.Write("Hello, world!");
    }
}

C# makes the Entry Point function explicit, not so Ruby. But still it´s there. So you can think of logic always running in some function.

Which brings me back to increments: In order to make the transition from talking to code as easy as possible, it has to be crystal clear into which function you should put the logic. Product owners might be content once there is a sticky note with a user story on the Scrum or Kanban board. But developers need an idea of what that sticky note means in terms of functions. Because with a function in hand, with a signature to run tests against, they have something to focus on.

All´s well once there is a function behind whose signature logic can be piled up. Then testing frameworks can be used to check if the logic is correct. Then practices like TDD can help to drive the implementation.

That´s why most code katas define exactly what the API of a solution should look like. It´s a function, maybe two or three, not more.

A requirement like “Write a function f which takes this as parameters and produces such and such output by doing x” makes a developer comfortable. Yes, there are all kinds of details to think about, like which algorithm or technology to use, or what kind of state and side effects to consider. Even a single function not only must deliver on Functionality, but also on Quality and Evolvability.

Nevertheless, once it´s clear which function to put logic in, you have a tangible starting point.

So, yes, what I´m suggesting is to find a single function to put all the logic in that´s necessary to deliver on the requirements of an increment. Or to put it the other way around: Slice requirements in a way that each increment´s logic can be located under the roof of a single function.

Entry points

Of course, the logic of a software will always be spread across many, many functions. But there´s always an Entry Point. That´s the most important function for each increment, because that´s the root to put integration or even acceptance tests on.

A batch program like the above hello-world application only has a single Entry Point. All logic is reached from there, regardless of how deep it´s nested in classes.

But a program with a user interface like this has at least two Entry Points:

image

One is the main function called upon startup. The other is the button click event handler for “Show my score”.

But maybe there are even more, like another Entry Point being a handler for the event fired when one of the choices gets selected; because then some logic could check if the button should be enabled because all questions got answered. Or another Entry Point for the logic to be executed when the program is closed; because then the choices made should be persisted.

You see, an Entry Point to me is a function which gets triggered by the user of a software. With batch programs that´s the main function. With GUI programs on the desktop that´s event handlers. With web programs that´s handlers for URL routes.
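
In C# terms the different kinds of Entry Points might look like this (class and handler names are placeholders):

class Program
{
    // Batch Entry Point: triggered by the "program started" event.
    static void Main(string[] args)
    {
        // ...accept input, process, present output...
    }
}

class QuizDialog   // in a GUI program this would be a form/window class
{
    // GUI Entry Point: an event handler wired to the "Show my score" button.
    public void btnShowScore_Click(object sender, EventArgs e)
    {
        // ...check the answers, compute and display the score...
    }
}

// In a web program the Entry Points would be the handlers bound to URL routes.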

And my basic suggestion to help you with slicing requirements for Spinning is: Slice them in a way so that each increment is related to only one Entry Point function.[2]

Entry Points are the “outer functions” of a program. That´s where the environment triggers behavior. That´s where hardware meets software. Entry points always get called because something happened to hardware state, e.g. a key was pressed, a mouse button clicked, the system timer ticked, data arrived over a wire.[3]

image

Viewed from the outside, software is just a collection of Entry Point functions made accessible via buttons to press, menu items to click, gestures, URLs to open, keys to enter.

image

Collections of batch processors

I´d thus say, we haven´t moved forward since the early days of software development. We´re still writing batch programs. Forget about “event-driven programming” with its fancy GUI applications. Software is just a collection of batch processors. Earlier it was just one per program, today it´s hundreds we bundle up into applications.

Each batch processor is represented by an Entry Point as its root that works on a number of resources from which it reads data to process and to which it writes results.

image

These resources can be the keyboard or main memory or a hard disk or a communication line or a display.

Together many batch processors - large and small - form applications the user perceives as a single whole:

image

Software development that way becomes quite simple: just implement one batch processor after another. Well, at least in principle ;-)

Features

Each batch processor entered through an Entry Point delivers value to the user. It´s an increment. Sometimes its logic is trivial, sometimes it´s very complex. Regardless, each Entry Point represents an increment. An Entry Point implemented thus is a step forward in terms of Agility.

At the same time it´s a tangible unit for developers. Therefore, identifying the more or less numerous batch processors in a software system is a rewarding task for product owners and developers alike. That´s where user stories meet code.

image

In this example the user story translates to the Entry Point triggered by clicking the login button on a dialog like this:

image

The batch then retrieves what has been entered via keyboard, loads data from a user store, and finally outputs some kind of response on the screen, e.g. by displaying an error message or showing the next dialog.

This is all very simple, but you see, there is not just one thing happening, but several.

  1. Get input (email address, password)
  2. Load user for email address
     2.1 If user not found report error
  3. Check password
     3.1 Hash password
     3.2 Compare hash to hash stored in user
  4. Show next dialog

Viewed from 10,000 feet it´s all done by the Entry Point function. And of course that´s technically possible. It´s just a bunch of logic and calling a couple of API functions.

However, I suggest to take these steps as distinct aspects of the overall requirement described by the user story. Such aspects of requirements I call Features.

Features too are increments. Each provides some (small) value of its own to the user. Each can be checked individually by a product owner.

Instead of implementing all the logic behind the Login() entry point at once you can move forward increment by increment, e.g.

  • First implement the dialog, let the user enter any credentials, and log him/her in without any checks. Features 1 and 4.
  • Then hard code a single user and check the email address. Features 2 and 2.1.
  • Then check password without hashing it (or use a very simple hash like the length of the password). Features 3 and 3.2.
  • Replace hard coded user with a persistent user directory, but a very simple one, e.g. a CSV file. Refinement of feature 2.
  • Calculate the real hash for the password. Feature 3.1.
  • Switch to the final user directory technology.

Each feature provides an opportunity to deliver results in a short amount of time and get feedback. If you´re in doubt whether you can implement the whole entry point function until tomorrow night, then just go for a couple of features or even just one.

That´s also why I think you should strive for wrapping feature logic into a function of its own. It´s a matter of Evolvability and Production Efficiency. A function per feature makes the code more readable, since the language of requirements analysis and design is carried over into implementation. It makes it easier to apply changes to features because it´s clear where their logic is located. And finally, of course, it lets you re-use features in different contexts (read: increments).
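
For the login example this could look roughly like the following sketch (all names are invented; the stubs merely stand in for real feature logic):

class Credentials { public string EmailAddress; public string Password; }
class User        { public string PasswordHash; }

class LoginDialog
{
    // Entry Point: the login button's click handler.
    public void btnLogin_Click(object sender, EventArgs e)
    {
        var credentials = Get_input();                           // feature 1
        var user = Load_user(credentials.EmailAddress);          // feature 2
        if (user == null) { Report_user_not_found(); return; }   // feature 2.1

        if (Check_password(credentials.Password, user))          // feature 3
            Show_next_dialog();                                  // feature 4
        // else: e.g. display an error message
    }

    // Feature 3 composed of its sub-features:
    bool Check_password(string password, User user)
    {
        var hash = Hash_password(password);                      // feature 3.1
        return hash == user.PasswordHash;                        // feature 3.2
    }

    // Stubs standing in for the remaining feature functions:
    Credentials Get_input() => new Credentials();
    User Load_user(string emailAddress) => null;        // e.g. read from a CSV user directory
    string Hash_password(string password) => password;  // trivial "hash" for a first increment
    void Report_user_not_found() { Console.WriteLine("Unknown user."); }
    void Show_next_dialog() { Console.WriteLine("Welcome!"); }
}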

Feature functions make it easier for you to think of features as Spinning increments, to implement them independently, to let the product owner check them for acceptance individually.

Increments consist of features, entry point functions consist of feature functions. So you can view software as a hierarchy of requirements from broad to thin which map to a hierarchy of functions - with entry points at the top.

image 

I like this image of software as a self-similar structure on many levels of abstraction where requirements and code match each other. That to me is true agile design: the core tenet of Agility to move forward in increments is carried over into implementation. Increments on paper are retained in code. This way developers can easily relate to product owners. Elusive and fuzzy requirements, on the other hand, are not tangible.

Software production is moving forward through requirements one increment at a time, and one function at a time.

In closing

Product owners and developers are different - but they need to work together towards a shared goal: working software. So their notions of software need to be made compatible, they need to be connected.

The increments of the product owner - user stories and features - need to be mapped straightforwardly to something which is relevant to developers. To me that´s functions. Yes, functions, not classes nor components nor micro services.

We´re talking about behavior, actions, activities, processes. Their natural representation is a function. Something has to be done. Logic has to be executed. That´s the purpose of functions.

Later, classes and other containers are needed to stay on top of a growing amount of logic. But to connect developers and product owners functions are the appropriate glue. Functions which represent increments.


  1. Can such a small increment always be found to deliver by tomorrow evening? I boldly say yes. Yes, it´s always possible. But maybe you have to start thinking differently. Maybe the product owner needs to start thinking differently. Completion is not the goal anymore. Neither is checking the delivery of an increment through the user interface of a software. Product owners need to become comfortable using test beds for certain features. If it´s hard to slice requirements thin enough for Spinning the reason is too little knowledge of something. Maybe you don´t yet understand the problem domain well enough? Maybe you don´t yet feel comfortable with some tool or technology? Then it´s time to acknowledge this fact. Be honest about your not knowing. And instead of trying to deliver as a craftsman, officially become a researcher. Research and check back with the product owner every day - until your understanding has grown to a level where you are able to define the next Spinning increment.

  2. Sometimes even thin requirement slices will cover several Entry Points, like “Add validation of email addresses to all relevant dialogs.” Validation then will be put into a dozen functions. Still, though, it´s important to determine which Entry Points exactly get affected. That´s much easier if you strive for keeping the number of Entry Points per increment to 1.

  3. If you like, call Entry Point functions event handlers, because that´s what they are. They all handle events of some kind, whether that´s palpable in your code or not. A public void btnSave_Click(object sender, EventArgs e) {…} might look like an event handler to you, but public static void Main() {…} is one also - for the event “program started”.

The Incremental Architect´s Napkin – #3 – Make Evolvability inevitable

The easier something is to measure, the more likely it will be produced. Deviations between what is and what should be can be readily detected. That´s what automated acceptance tests are for. That´s what sprint reviews in Scrum are for.

It´s no wonder our software looks the way it does. It has all the traits whose conformance with requirements can easily be measured. And it´s lacking traits which cannot easily be measured.

Evolvability (or Changeability) is such a trait. Whether an operation is correct, whether an operation is fast enough - that can be checked very easily. But whether Evolvability is high or low cannot be checked by taking a measure or two.

Evolvability might correlate with certain traits, e.g. number of lines of code (LOC) per function or Cyclomatic Complexity or test coverage. But there is no threshold value signalling “evolvability too low”; also Evolvability is hardly tangible for the customer.

Nevertheless Evolvability is of great importance - at least in the long run. You can get away without much of it for a short time. Eventually, though, it´s needed like any other requirement. Or even more. Because without Evolvability no other requirement can be implemented. Evolvability is the foundation on which all else is built.

Such fundamental importance is in stark contrast with its immeasurability. To compensate for this, Evolvability must be put at the very center of software development. It must become the hub around which everything else revolves.

Since we cannot measure Evolvability, though, we cannot simply watch it more closely. Instead we need to establish practices to keep it high (enough) at all times.

Chefs have known that for a long time. That´s why everybody in a restaurant kitchen constantly sees to cleanliness. Hygiene is important, as is having clean tools at standardized locations. Only then can the health of the patrons be guaranteed and production efficiency kept constantly high.

Still a kitchen´s level of cleanliness is easier to measure than software Evolvability. That´s why important practices like reviews, pair programming, or TDD are not enough, I guess.

What we need to keep Evolvability in focus and high is… to continually evolve. Change must not be something to avoid but something to embrace. To me that means the whole change cycle from requirement analysis to delivery needs to be gone through more often.

Scrum´s sprints of 4, 2, even 1 week are too long. Kanban´s flow of user stories across the board is too unreliable; it takes as long as it takes.

Instead we should fix the cycle time at 2 days max. I call that Spinning. No increment must take longer than from this morning until tomorrow evening to finish. Then it should be acceptance checked by the customer (or his/her representative, e.g. a Product Owner).

For me there are several reasons for such a fixed and short cycle time for each increment:

Clear expectations

Absolute estimates (“This will take X days to complete.”) are near impossible in software development, as explained previously. Too much unplanned research and engineering work lurks in every feature. And then there are the pervasive interruptions of work by peers and management.

However, the smaller the scope the better our absolute estimates become. That´s because we understand better what really are the requirements and what the solution should look like. But maybe more importantly the shorter the timespan the more we can control how we use our time.

So much can happen over the course of a week and longer timespans. But if push comes to shove I can block out all distractions and interruptions for a day or possibly two.

That´s why I believe we can give rough absolute estimates on 3 levels:

  • Noon
  • Tonight
  • Tomorrow

Think of a meeting with a Product Owner at 8:30 in the morning. If she asks you how long it will take to implement a user story or bug fix, you can say, “It´ll be fixed by noon.”, or you can say, “I can manage to implement it by tonight before I leave.”, or you can say, “You´ll get it by tomorrow night at the latest.”

Yes, I believe all else would be naive. If you´re not confident to get something done by tomorrow night (some 34h from now) you just cannot reliably commit to any timeframe. That means you should not promise anything, you should not even start working on the issue.

So when estimating use these four categories: Noon, Tonight, Tomorrow, NoClue - with NoClue meaning the requirement needs to be broken down further so each aspect can be assigned to one of the first three categories.

If you like absolute estimates, here you go.

But don´t do deep estimates. Don´t estimate dozens of issues; don´t think ahead (“Issue A is a Tonight, then B will be a Tomorrow, after that it´s C as a Noon, finally D is a Tonight - that´s what I´ll do this week.”). Just estimate so Work-in-Progress (WIP) is 1 for everybody - plus a small number of buffer issues.

To be blunt: Yes, this makes promises impossible as to what a team will deliver in terms of scope at a certain date in the future.

But it will give a Product Owner a clear picture of what to pull for acceptance feedback tonight and tomorrow.

Trust through reliability

Our trade is lacking trust. Customers don´t trust software companies/departments much. Managers don´t trust developers much. I find that perfectly understandable in the light of what we´re trying to accomplish: delivering software in the face of uncertainty using the methods of material goods production.

Customers as well as managers still expect software development to be close to production of houses or cars. But that´s a fundamental misunderstanding.

Software development is development. It´s basically research. As software developers we´re constantly executing experiments to find out what really provides value to users. We don´t know what they need; we just have mediated hypotheses.

That´s why we cannot reliably deliver on preposterous demands. So trust is out of the window in no time.

If we switch to delivering in short cycles, though, we can regain trust. Because estimates - explicit or implicit - up to 32 hours at most can be satisfied.

I´d say: reliability over scope. It´s more important to reliably deliver what was promised than to cover a lot of requirement area. So when in doubt promise less - but deliver without delay.

Deliver on scope (Functionality and Quality); but also deliver on Evolvability, i.e. on inner quality according to accepted principles. Always.

Trust will be the reward. Less complexity of communication will follow. More goodwill buffer will follow.

So don´t wait for some Kanban board to show you that flow can be improved by scheduling smaller stories. You don´t need to learn that the hard way. Just start with small batches in the three sizes described above.

Fast feedback

What has been finished can be checked for acceptance. Why wait for a sprint of several weeks to end? Why let the mental model of the issue and its solution dissipate?

If you get final feedback after one or two weeks, you hardly remember what you did and why you did it. Reasoning becomes hard. But more importantly you probably are not in the mood anymore to go back to something you deemed done a long time ago. It´s boring, it´s frustrating to open up that mental box again.

Learning is harder the longer it takes from event to feedback. Effort can be wasted between event (finishing an issue) and feedback, because other work might go in the wrong direction based on false premises.

Checking finished issues for acceptance is the most important task of a Product Owner. It´s even more important than planning new issues. Because as long as work started is not released (accepted) it´s potential waste. So before starting new work better make sure work already done has value.

By putting the emphasis on acceptance rather than planning true pull is established. As long as planning and starting work is more important, it´s a push process.

Accept a Noon issue on the same day before leaving. Accept a Tonight issue before leaving today or first thing tomorrow morning. Accept a Tomorrow issue tomorrow night before leaving or early the day after tomorrow.

After acceptance the developer(s) can start working on the next issue.

Flexibility

As if reliability/trust and fast feedback for less waste weren´t enough economic incentive, there is flexibility.

After each issue the Product Owner can change course. If on Monday morning feature slices A, B, C, D, E were important and A, B, C were scheduled for acceptance by Monday evening and Tuesday evening, the Product Owner can change her mind at any time.

Maybe after A got accepted she asks for continuation with D. But maybe, just maybe, she has gotten a completely different idea by then. Maybe she wants work to continue on F. And after B it´s neither D nor E, but G. And after G it´s D.

With Spinning, priorities can be changed every 32 hours at the latest. And nothing is lost. Because what got accepted is of value. It provides incremental value to the customer/user. Or it provides internal value to the Product Owner as increased knowledge/decreased uncertainty.

I find such reactivity over commitment economically very beneficial. Why commit a team to some workload for several weeks? It´s unnecessary at best, and inflexible and wasteful at worst.

If we cannot promise delivery of a certain scope on a certain date - which is what customers/management usually want - we can at least provide them with unprecedented flexibility in the face of high uncertainty.

Where the path is not clear, cannot be clear, make small steps so you´re able to change your course at any time.

Premature completion

Customers/management are used to premeditating budgets. They want to know exactly how much to pay for a certain amount of requirements.

That´s understandable. But it does not match with the nature of software development. We should know that by now.

Maybe there´s somewhere in the world some team who can consistently deliver on scope, quality, and time, and budget. Great! Congratulations! I, however, haven´t seen such a team yet. Which does not mean it´s impossible, but I think it´s nothing I can recommend to strive for. Rather I´d say: Don´t try this at home. It might hurt you one way or the other.

However, what we can do is allow customers/management to stop work on features at any moment. With Spinning, every 32 hours a feature can be declared finished - even though it might not be completed according to its initial definition.

I think, progress over completion is an important offer software development can make. Why think in terms of completion beyond a promise for the next 32 hours?

Isn´t it more important to constantly move forward? Step by step. We´re not running sprints, we´re not running marathons, not even ultra-marathons. We´re in the sport of running forever. That makes it futile to stare at the finishing line. The very concept of a burn-down chart is misleading (in most cases).

Whoever can only think in terms of completed requirements shuts out the chance of saving money. The requirements for a feature are mostly uncertain. So how is a Product Owner to know upfront how much is needed? Maybe more than specified is needed - which gets uncovered step by step with each finished increment. Maybe less than specified is needed.

After each 4–32 hour increment the Product Owner can do an experiment (or invite users to an experiment) to see whether a particular trait of the software system is already good enough. And if so, she can switch attention to a different aspect.

In the end, requirements A, B, C then could be finished just 70%, 80%, and 50%. What the heck? It´s good enough - for now. 33% money saved. Wouldn´t that be splendid? Isn´t that a stunning argument for any budget-sensitive customer? You can save money and still get what you need?

Pull on practices

So far, in addition to more trust, more flexibility, and less money spent, Spinning has led to “doing less” - which also means less code, which of course means higher Evolvability per se.

Last but not least, though, I think Spinning´s short acceptance cycles have one more effect. They exert pull-power on all sorts of practices known to increase Evolvability.

If, for example, you believe high automated test coverage helps Evolvability by lowering the fear of inadvertent damage to a code base, why isn´t 90% of the developer community practicing automated tests consistently?

I think, the answer is simple: Because they can do without. Somehow they manage to do enough manual checks before their rare releases/acceptance checks to ensure good enough correctness - at least in the short term.

The same goes for other practices like component orientation, continuous build/integration, code reviews etc. None of that is compelling, urgent, imperative. Something else always seems more important. So Evolvability principles and practices fall through the cracks most of the time - until a project hits a wall. Then everybody becomes desperate; but by then (re)gaining Evolvability has become a very, very difficult and tedious undertaking. Sometimes up to the point where the existence of a project/company is in danger.

With Spinning that´s different. If you´re practicing Spinning you cannot avoid all those practices. Without them you very quickly realize you cannot deliver reliably even on your 32 hour promises.

Spinning thus is pulling on developers to adopt principles and practices for Evolvability. They will start actively looking for ways to keep their delivery rate high. And if not, management will soon tell them to do that. Because first the Product Owner and then management will notice an increasing difficulty in delivering value within 32 hours.

There, finally, emerges a way to measure Evolvability: the more frequently developers tell the Product Owner there is no way to deliver anything worthy of feedback by tomorrow night, the poorer Evolvability is.

Don´t count the “WTF!”, count the “No way!” utterances.

In closing

For sustainable software development we need to put Evolvability first. Functionality and Quality must not rule software development but be implemented within a framework ensuring (enough) Evolvability.

Since Evolvability cannot be measured easily, I think we need to put software development “under pressure”. Software needs to be changed more often, in smaller increments. Each increment being relevant to the customer/user in some way.

That does not mean each increment is worthy of shipment. It´s sufficient to gain further insight from it. Increments primarily serve the reduction of uncertainty, not sales.

Sales even needs to be decoupled from this incremental progress. No more promises to sales. No more delivery au point. Rather sales should look at a stream of accepted increments (or incremental releases) and scoop from that whatever they find valuable. Sales and marketing need to realize they should work with what´s there, not with what might be possible in the future. But I digress…

In my view a Spinning cycle - which is not easy to reach, which requires practice - is the core practice to compensate for the immeasurability of Evolvability. From start to finish of each issue in 32 hours max - that´s the challenge we need to accept if we´re serious about increasing Evolvability.

Fortunately higher Evolvability is not the only outcome of Spinning. Customer/management will like the increased flexibility and “getting more bang for the buck”.

The Incremental Architect´s Napkin - #2 - Balancing the forces

Categorizing requirements is the prerequisite for economic architectural decisions. Not all requirements are created equal.

However, to truly understand and describe the requirement forces pulling on software development, I think further examination of the requirements aspects is warranted.

Aspects of Functionality

There are two sides to Functionality requirements.

image

It´s about what a software should do. I call that the Operations it implements. Operations are defined by expressions and control structures or calls to frameworks of some sort, i.e. (business) logic statements. Operations calculate, transform, aggregate, validate, send, receive, load, store etc. Operations are about behavior; they take input and produce output by considering state.

I´m not using the term “function” here, because functions - or methods or sub-programs - are not necessary to implement Operations. Functions belong to a different sub-aspect of requirements (see below).

Operations alone are not enough, though, to make a customer happy with regard to his/her Functionality requirements. Only correctly implemented Operations provide full value.

This should make clear, why testing is so important. And not just manual tests during development of some operational feature, but automated tests. Because only automated tests scale when over time the number of operations increases. Without automated tests there is no guarantee formerly correct operations are still correct after more got added. To retest all previous operations manually is infeasible.

So whoever relies just on manual tests is not really balancing the two forces Operations and Correctness. With manual tests more weight is put on the side of the scale of Operations. That might be ok for a short period of time - but in the long run it will bite you. You need to plan for Correctness in the long run from the first day of your project on.

Aspects of Quality

As important as Functionality is, it´s not the driver for software development. No software has ever been written to just implement some operation in code. We don´t need computers just to do something. All computers can do with software we can do without them. Well, at least given enough time and resources.

We could calculate the most complex formulas without computers. We could run auctions with millions of people without computers. The only reason we want computers to help us with these and a million other Operations is… we don´t want to wait very long for the results. Or we want fewer errors. Or we want easier accessibility to complicated solutions.

So the main reason for customers to buy/order software is some Quality. They want some Functionality with a higher Quality (e.g. performance, scalability, usability, security…) than without the software.

But Qualities come in at least two flavors:

image

Most important are Primary Qualities. That´s the Qualities software truly is written for. Take an online auction website for example. Its Primary Qualities are performance, scalability, and usability, I´d say. Auctions should come within reach of millions of people; setting up an auction should be very easy; finding a suitable auction and bidding on it should be as fast as possible.

Only if those Qualities have been implemented does security become relevant. A secure auction website is important - but not as important as a fast auction website. Nobody would want to use the most secure auction website if it was unbearably slow. But there would be people willing to use the fastest auction website even if it was lacking security.

That´s why security - with regard to online auction software - is not a Primary Quality, but just a Secondary Quality. It´s a supporting quality, so to speak. It does not deliver value by itself.

With a password manager software this might be different. There security might be a Primary Quality.

Please don´t get me wrong: I don´t want to denigrate any Quality. There´s a long list of non-functional requirements at Wikipedia. They are all created equal - but that does not mean they are equally important for all software projects.

When confronted with Quality requirements check with the customer which are primary and which are secondary. That will help to make good economical decisions when in a crunch. Resources are always limited - but requirements are a bottomless ocean.

Aspects of Security of Investment

Functionality and Quality are traditionally the requirement aspects cared for most - by customers and developers alike. Even today, when pressure rises in a project, tunnel vision will focus on them. Any measures to create and hold up Security of Investment (SoI) will be out of the window pretty quickly.

Resistance to customers and/or management is futile. As long as SoI is not placed on equal footing with Functionality and Quality it´s bound to suffer under pressure.

Looking closer at what SoI means will help us become more conscious of it and make customers and management aware of the risks of neglecting it.

SoI to me has two facets:

image

Production Efficiency (PE) is about speed of delivering value. Customers like short response times. Short response times mean less money spent. So whatever makes software development faster supports this requirement.

This must not lead to duct tape programming and banging out features by the dozen, though. Because customers don´t just want Operations and Quality, but also Correctness. So if Correctness gets compromised by focusing too much on Production Efficiency, it will backfire.

Customers want PE not just today, but over the whole course of a software´s lifecycle. That means it´s not just about coding speed, but equally about code quality. If poor code quality leads to rework, PE is at an unsatisfactory level.

Also, if code production leads to waste, PE is unsatisfactory. Because the effort which went into the waste could have been used to produce value.

Rework and waste cost money. Rework and waste abound, however, as long as PE is not addressed explicitly with management and customers.

Thanks to the Agile and Lean movements that´s increasingly the case. Nevertheless more could and should be done in many teams. Each and every developer should keep in mind that Production Efficiency is as important to the customer as Functionality and Quality - whether he/she states it or not.

Making software development more efficient is important - but sooner or later even agile projects are going to hit a glass ceiling. At least as long as they neglect the second SoI facet: Evolvability.

Delivering correct high quality functionality in short cycles today is good. But not just any software structure will allow this to happen for an indefinite amount of time.[1] The less explicitly software was designed the sooner it´s going to get stuck. Big ball of mud, monolith, brownfield, legacy code, technical debt… there are many names for software structures that have lost the ability to evolve, to be easily changed to accommodate new requirements.

An evolvable code base is the opposite of a brownfield. It´s code which can be easily understood (by developers with sufficient domain expertise) and then easily changed to accommodate new requirements. Ideally the costs of adding feature X to an evolvable code base are independent of when it is requested - or at least the costs should only increase linearly, not exponentially.[2]

Clean Code, Agile Architecture, and even traditional Software Engineering are concerned with Evolvability. However, it seems no systematic way of achieving it has been laid out yet. TDD + SOLID help - but still… When I look at the reality of design ability in teams I see much room for improvement.

As stated previously, SoI - or to be more precise: Evolvability - can hardly be measured. Plus the customer rarely states an explicit expectation with regard to it. That´s why I think, special care must be taken to not neglect it. Postponing it to some large refactorings should not be an option. Rather Evolvability needs to be a core concern for every single developer day.

This should not mean Evolvability is more important than any of the other requirement aspects. But neither is it less important. That´s why more effort needs to be invested into it, to bring it on par with the other aspects, which usually are much more in focus.

In closing

As you see, requirements are of quite different kinds. To not take that into account will make it harder to understand the customer, and to make economic decisions.

Those sub-aspects of requirements are forces pulling in different directions. To improve performance might have an impact on Evolvability. To increase Production Efficiency might have an impact on security etc.

No requirement aspect should go unchecked when deciding how to allocate resources. Balancing should be explicit. And it should be possible to trace back each decision to a requirement.

Why is there a null-check on parameters at the start of the method? Why are there 5000 LOC in this method? Why are there interfaces on those classes? Why is this functionality running on the threadpool? Why is this function defined on that class? Why is this class depending on three other classes?

These and a thousand more questions do not mean that anything should be different in a code base. But it´s important to know the reason behind all of these decisions. Because not knowing the reason possibly means waste and suboptimal decisions.

And how do we ensure to balance all requirement aspects?

That needs practices and transparency.

Practices means doing things a certain way and not another, even though that might be possible. We´re dealing with dangerous tools here. Like a knife is a dangerous tool. Harm can be done if we use our tools in just any way at the whim of the moment.

Over the centuries rules and practices have been established for how to use knives. You don´t stick them in people´s legs just because you feel like it. You hand over a knife with the handle towards the receiver. You might not even be allowed to cut round food like potatoes or eggs with it.

The same should be the case for dangerous tools like object-orientation, remote communication, threads etc. We need practices to use them in a way so requirements are balanced almost automatically.

In addition, to be able to work on software as a team we need transparency. We need means to share our thoughts, to work jointly on mental models. So far our tools are focused on working with code. Testing frameworks, build servers, DI containers, intellisense, refactoring support… That´s all nice and well. I don´t want to miss any of that. But I think it´s not enough. We´re missing mental tools, tools for making thinking and talking about software (independently of code) easier.

You might think enough such tools already exist, like all those UML diagram types or Flow Charts. But then, isn´t it strange that hardly any team is using them to design software?

Or is that just due to a lack of education? I don´t think so. It´s a matter of value/weight ratio: the current mental tools are too heavyweight compared to the value they deliver.

So my conclusion is, we need lightweight tools to really be able to balance requirements. Software development is complex. We need guidance not to forget important aspects. That´s like with flying an airplane. Pilots don´t just jump in and take off for their destination. Yes, there are times when they are “flying by the seats of their pants”, when they are just experts doing things intuitively. But most of the time they are going through honed practices called checklists. See “The Checklist Manifesto” for very enlightening details on this.

Maybe then I should say it like this: We need more checklists for the complex business of software development.[3]


  1. But that´s what software development mostly is about: changing software over an unknown period of time. It needs to be corrected in order to finally provide the promised operations. It needs to be enhanced to provide ever more operations and qualities. All this without knowing when it´s going to stop. Probably never - until “maintainability” hits a wall when the technical debt is too large, the brownfield too deep. Software development is not a sprint, is not a marathon, not even an ultra marathon. Because all of these have a foreseeable end. Software development is like continuously and forever running…

  2. And sometimes I dare to think that costs could even decrease over time. Think of it: With each feature a software becomes richer in functionality. So with each additional feature the chance increases that there already is functionality helping its implementation. That should lead to lower costs for feature X if it´s requested later rather than sooner. X requested later could stand on the shoulders of previous features. Alas, reality seems to be far from this despite 20+ years of admonishing developers to think in terms of reusability.[1]

  3. Please don´t get me wrong: I don´t want to bog down the “art” of software development with heavyweight practices and heaps of rules to follow. The framework we need should be lightweight. It should not stand in the way of delivering value to the customer. Its purpose is even to make that easier by helping us to focus and by decreasing waste and rework.

Informed TDD – Kata “To Roman Numerals”

In a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”.

I´d like to respond to his questions with this article. There´s more to say than fits into a comment.

Mocks and TDD

I don´t see to what extent TDD avoids or is opposed to mocks. TDD and mocks are orthogonal. TDD is about process, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access, you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource.

True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that, as I tried to explain. I don´t use mocks as proxies for “expensive” resources. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion.

But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata.

ITDD for “To Roman Numerals”

gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines.

Now here is how I would do this kata differently.

1. Analyse

A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled.

“Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals”, as written in this “requirement document”, is not enough to start software development. Coding should only begin if the interface between the “system under development” and its context is clear.

If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s your fellow developer.

Designing the interface is a task of its own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP), which should hold not only for software functional units but also for tasks or activities.

In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions.

Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning.

Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explained in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically.

2. Solve

The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody.

Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too.

Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution.

Here´s my solution approach:

The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members.

The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list.

The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV, which looks like a contradiction to the above rule? It is not – if you accept that roman “digits” are not limited to single characters. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution.

All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values:

{1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M}
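To make that map concrete, here is a rough sketch of how it might be written down as data in C#. The field name Digits and the use of value tuples are my own choices for illustration, not taken from the original code; in the walkthrough below, the map only becomes a class constant during the final cleanup.

    public class ArabicRomanConversions {
        // Sketch only: roman "digits" and their arabic values, ordered from
        // highest to lowest value so a factor search can simply walk down the list.
        static readonly (int Value, string Digit)[] Digits = {
            (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
            (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
            (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")
        };
    }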

Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors.

If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a three phase process:

  1. Find all the factors
  2. Translate the factors found
  3. Compile the roman representation

Translation is just a look-up. Finding, though, needs some calculation:

  1. Find the highest remaining factor fitting in the value
  2. Remember and subtract it from the value
  3. Repeat with remaining value and remaining factors

Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction.

With this solution in hand I finally can do what TDD advocates: find and prioritize test cases.

As I can see from the small process description above, there are three aspects to test:

  • Test the translation
  • Test the compilation
  • Test finding the factors

Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious.

Testing the compilation is trivial.

Testing factor finding, though, is a tad more complicated. I can think of several steps:

  1. First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M).
  2. Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly.
  3. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]).
  4. Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly.

I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process.

3. Implement

First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say:

image
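Since the screenshot is not reproduced here, a minimal sketch of what such an acceptance test might look like, assuming NUnit as the test framework. Only 1954/MCMLIV is taken from the solution section above; the other values and the test class name are hypothetical stand-ins for the cases picked from Wikipedia.

    using NUnit.Framework;

    [TestFixture]
    public class ToRoman_acceptance_tests {
        [TestCase(1, "I")]            // hypothetical example value
        [TestCase(1954, "MCMLIV")]    // the example discussed above
        [TestCase(3999, "MMMCMXCIX")] // hypothetical example value
        public void Convert(int arabic, string expected) {
            Assert.AreEqual(expected, ArabicRomanConversions.ToRoman(arabic));
        }
    }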

Next I implement the API:

image

The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to satisfy these tests as quickly as possible, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution:

image
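Sketched in C#, the idea might look like this: ToRoman() just wires up the three process steps, and each step is stubbed out for now. Find_factors() and Translate() are named later in the text; Compile() is an assumed name for the third step.

    using System;

    public class ArabicRomanConversions {
        public static string ToRoman(int arabic) {
            var factors = Find_factors(arabic);   // 1. find all the factors
            var digits = Translate(factors);      // 2. translate the factors found
            return Compile(digits);               // 3. compile the roman representation
        }

        // Stand-ins for functionality not yet implemented:
        static int[] Find_factors(int arabic) { throw new NotImplementedException(); }
        static string[] Translate(int[] factors) { throw new NotImplementedException(); }
        static string Compile(string[] digits) { throw new NotImplementedException(); }
    }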

My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one.

I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition:

image

As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right.

Here´s the implementation to satisfy the test:

image
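A sketch of such a minimal Translate(), assuming it receives the factors as an array and for now only handles the single-factor partition via a local look-up table:

    // Sketch: just enough to satisfy the single-factor test cases (KISS).
    // Needs: using System.Collections.Generic;
    static string[] Translate(int[] factors) {
        var digits = new Dictionary<int, string> {
            {1,"I"}, {4,"IV"}, {5,"V"}, {9,"IX"}, {10,"X"}, {40,"XL"}, {50,"L"},
            {90,"XC"}, {100,"C"}, {400,"CD"}, {500,"D"}, {900,"CM"}, {1000,"M"}
        };
        return new[] { digits[factors[0]] };   // only the first (and so far only) factor
    }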

It´s as simple as possible. Right how TDD wants me to do it: KISS.

Now for the second equivalence partition: translating multiple factors. (It´s a pattern: if you need to do something repeatedly, separate the tests for doing it once from the tests for doing it multiple times.)

image

In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science:

image
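Stepping up from one factor to many might then be as small a change as applying the same look-up to every factor - a sketch, requiring System.Linq in addition:

    // Sketch: the same local map as before, now applied to every factor.
    static string[] Translate(int[] factors) {
        var digits = new Dictionary<int, string> {
            {1,"I"}, {4,"IV"}, {5,"V"}, {9,"IX"}, {10,"X"}, {40,"XL"}, {50,"L"},
            {90,"XC"}, {100,"C"}, {400,"CD"}, {500,"D"}, {900,"CM"}, {1000,"M"}
        };
        return factors.Select(f => digits[f]).ToArray();   // one look-up per factor
    }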

Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it.

Having two tests I find more important.

Now for the next low hanging fruit: compilation. It´s even simpler than translation.

image

A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose…

image
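Compile() (again, the name is my assumption) probably amounts to little more than concatenating the translated digits - which is why testing .NET framework functionality here would add nothing:

    // Sketch: a roman number is just its "digits" written one after another.
    static string Compile(string[] digits) {
        return string.Join("", digits);
    }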

Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions:

image

Again, I´m faking the implementation first:

image
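A sketch of such a fake: Find_factors() is written down in terms of a lower-level helper which itself is not implemented yet. The helper name Find_the_fitting_factor is my invention for illustration; the text does not name it.

    // Sketch: handles only the first partition, an arabic number equal to a single factor.
    // Needs: using System;
    static int[] Find_factors(int arabic) {
        return new[] { Find_the_fitting_factor(arabic) };
    }

    // Stand-in for functionality not yet implemented.
    static int Find_the_fitting_factor(int value) {
        throw new NotImplementedException();
    }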

I focus on just the first test case. No looping yet.

Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat.

That´s left for a drill down with a test of the fake function:

image

There are two main equivalence partitions, I guess: either the first factor is the fitting one, or some later factor is.

The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.)

image
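A sketch of how the helper might satisfy both partitions (requires System.Linq). The factor values are deliberately duplicated here; that duplication is exactly what the later cleanup removes.

    // Sketch: pick the highest factor still fitting into the value.
    // Works because the list is ordered descending and the smallest factor is 1.
    static int Find_the_fitting_factor(int value) {
        var factors = new[] { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        return factors.First(f => f <= value);
    }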

And the first of the equivalence partitions on the higher level also is satisfied:

image

Great, I can move on. Now for more than a single factor:

image

Interestingly not just one test becomes green now, but all of them. Great!

image
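The loop in question might look like this sketch - which also suggests why all the factor-finding test cases (consecutive, repeated, and non-consecutive factors) turn green at once:

    // Sketch: repeatedly take the highest fitting factor and subtract it,
    // until nothing of the arabic value remains.
    // Needs: using System.Collections.Generic;
    static int[] Find_factors(int arabic) {
        var factors = new List<int>();
        var remaining = arabic;
        while (remaining > 0) {
            var factor = Find_the_fitting_factor(remaining);
            factors.Add(factor);
            remaining -= factor;
        }
        return factors.ToArray();
    }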

You might say that then I must not have done the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than the recursion I had briefly thought of during the problem solving phase.

And by the way: Also the acceptance tests went green:

image

Mission accomplished. At least functionality wise.

Now I´ve to tidy up things a bit. TDD calls for refactoring. Not much refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness.

At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”.

First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant.

image

Which leads to a small conversion in Find_factors():

image

And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain.

However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”.

This then is my final test class:

image

And this is the final production code:

image
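Since the screenshot is missing here, a rough sketch of what the final production code might amount to, with the factor/digit map as the single class constant. The names Digits, Compile and Find_the_fitting_factor are my own; only ArabicRomanConversions, ToRoman, Find_factors and Translate are confirmed by the text.

    using System.Collections.Generic;
    using System.Linq;

    public class ArabicRomanConversions {
        // The one remaining definition of roman "digits", ordered by descending value.
        static readonly (int Value, string Digit)[] Digits = {
            (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
            (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
            (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")
        };

        public static string ToRoman(int arabic) {
            var factors = Find_factors(arabic);
            var digits = Translate(factors);
            return Compile(digits);
        }

        static int[] Find_factors(int arabic) {
            var factors = new List<int>();
            var remaining = arabic;
            while (remaining > 0) {
                var factor = Find_the_fitting_factor(remaining);
                factors.Add(factor);
                remaining -= factor;
            }
            return factors.ToArray();
        }

        static int Find_the_fitting_factor(int value) {
            return Digits.First(d => d.Value <= value).Value;
        }

        static string[] Translate(int[] factors) {
            return factors.Select(f => Digits.First(d => d.Value == f).Digit).ToArray();
        }

        static string Compile(string[] digits) {
            return string.Join("", digits);
        }
    }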

Test coverage as reported by NCrunch is 100%:

image

Reflexion

Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet.

But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as is often the case.

That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design.

I also could have implemented it bottom-up, since I knew some of the bottom of the solution - the leaves of the functional decomposition tree.

Where things became fuzzy because the design did not cover any more details - as with Find_factors() - I repeated the process in the small, so to speak: fake some top level and endure red high level tests while first solving a simpler problem.

Using scaffolding tests (to be thrown away at the end) brought two advantages:

  • Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them.
  • I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API.

The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: the Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find out what there is. Then devise a solution. Then code the solution, manifest the solution in code.

Writing tests first is a good practice. But it should not be taken dogmatically. And above all it should not be overloaded with purposes.

And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons.

 

PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

The Incremental Architect´s Napkin - #1 - It´s about the money, stupid

Software development is an economic endeavor. A customer is only willing to pay for value. Whatever makes a software valuable must become a trait of that software. We as software developers thus need to understand requirements and then find a way to implement them.

Whether - or to what extent - a customer really can know beforehand what´s going to be valuable for him/her in the end is a topic of constant debate. Some aspects of the requirements might be less foggy than others. Sometimes the customer does not know what he/she wants. Sometimes he/she´s certain to want something - but then is not happy when that´s delivered.

Nevertheless requirements exist. And developers will only be paid if they deliver value. So we better focus on doing that.

Although it might sound trivial, I think it´s important to state the corollary: We need to be able to trace anything we do as developers back to some requirement.

You decide to use Go as the implementation language? Well, what´s the customer´s requirement this decision is linked to? You decide to use WPF as the GUI technology? What´s the customer´s requirement? You decide in favor of a layered architecture? What´s the customer´s requirement? You decide to put code in three classes instead of just one? What´s the customer´s requirement behind that? You decide to use MongoDB over MySql? What´s the customer´s requirement behind that? etc.

I´m not saying any of these decisions are wrong. I´m just saying: whatever you decide, be clear about the requirement that´s driving your decision. You have to be able to answer the question: Why do you think X will deliver more value to the customer than the alternatives?

Customers are not interested in romantic ideals of hard working, good willing, quality focused craftsmen. They don´t care how and why you work - as long as what you deliver fulfills their needs. They want to trust you to recognize this as your top priority - and then deliver. That´s all.

Fundamental aspects of requirements

If you´re like me you´re probably not used to such scrutiny. You want to be trusted as a professional developer - and decide quite a few things following your gut feeling. Or by relying on “established practices”.

That´s ok in general and most of the time - but still… I think we should be more conscious about our decisions. Which would make us more responsible, even more professional.

But without further guidance it´s hard to reason about many of the myriad decisions we´ve to make over the course of a software project.

What I found helpful in this situation is structuring requirements into fundamental aspects. Instead of one large heap of requirements there are then smaller blobs. With them it´s easier to check whether a decision falls in their scope.

image

Sure, every project has its very own requirements. But all of them belong to just three different major categories, I think. Any requirement either pertains to functionality, non-functional aspects or sustainability.

image

For short I call those aspects:

  • Functionality, because such requirements describe which transformations a software should offer. For example: A calculator software should be able to add and multiply real numbers. An auction website should enable you to set up an auction anytime or to find auctions to bid for.
  • Quality, because such requirements describe how functionality is supposed to work, e.g. fast or secure. For example: A calculator should be able to calculate the sine of a value much faster than you could in your head. An auction website should accept bids from millions of users.
  • Security of Investment, because functionality and quality need not just be delivered in any way. It´s important to the customer to get them quickly - and not only today but over the course of several years. This aspect introduces time into the “requirements equation”.

Security of Investment (SoI) sure is a non-functional requirement. But I think it´s important to not subsume it under the Quality (Q) aspect. That´s because SoI has quite special properties.

For one, SoI for software means something completely different from what it means for hardware. If you buy hardware (a car, a hair dryer) you find it a worthwhile investment if the hardware does not change its functionality or quality over time. A car still running smoothly with hardly any rust spots after 10 years of daily usage would be a very secure investment. So for hardware (or material products, if you like) “unchangeability” (in the face of usage) is desirable.

With software you want the contrary. Software that cannot be changed is a waste. SoI for software means “changeability”. You want to be sure that the software you buy/order today can be changed, adapted, improved over an unforeseeable number of years so as to fit changes in its usage environment.

But that´s not the only reason why the SoI aspect is special. On top of changeability[1] (or evolvability) comes immeasurability. Evolvability cannot readily be measured by counting something. Whether the changeability is as high as the customer wants it, cannot be determined by looking at metrics like Lines of Code or Cyclomatic Complexity or Afferent Coupling. They may give a hint… but they are far, far from precise.

That´s because of the nature of changeability. It´s different from performance or scalability. Also it´s because a customer cannot tell upfront, “how much” evolvability he/she wants.

Whether requirements regarding Functionality (F) and Q have been met, a customer can tell you very quickly and very precisely. A calculation is missing, the calculation takes too long, the calculation time degrades with increased load, the calculation is accessible to the wrong users etc. That´s all very or at least comparatively easy to determine.

But changeability… That´s a whole different thing. Nevertheless over time the customer will develop a feeling for whether changeability is good enough or degrading. He/she just has to check the development of the frequency of “WTF”s from developers ;-)

F and Q are “timeless” requirement categories. Customers want us to deliver on them now. Just focusing on the now, though, is rarely beneficial in the long run. So SoI adds a counterweight to the requirements picture. Customers want SoI - whether they know it or not, whether they state it explicitly or not.

In closing

A customer´s requirements are not monolithic. They are not all made the same. Rather they fall into different categories. We as developers need to recognize these categories when confronted with some requirement - and take them into account. Only then can we make true professional decisions, i.e. conscious and responsible ones.


  1. I call this fundamental trait of software “changeability” and not “flexibility” to distinguish to whom it´s a concern. “Flexibility” to me means software as is can easily be adapted to a change in its environment, e.g. by tweaking some config data or adding a library which gets picked up by a plug-in engine. “Flexibility” thus is a matter for the user. “Changeability”, on the other hand, to me means software can easily be changed in its structure to adapt it to new requirements. That´s a matter for the software developer.