In need of more abstraction

The ultimate product of software development is this: CPU-executable binary code.


Decades ago we used to “write” this more or less directly into memory. But that was very tedious and error prone. Code was hard to reason about, hard to change.

Abstractions in code

So we looked for ways to make coding easier. Enter a higher level of abstraction: Assembler.


Representing machine code instructions as text and throwing in macros increased productivity. Programs became easier to read, easier to think up in the first place, and quicker to write with fewer errors.

According to Jack W. Reeves, Assembler source code was a design for the ultimate code, which got built by an automatic transformation step.

Soon, though, software was deemed hard to write even in Assembler. So many details needed to be taken care of again and again; why not hide them behind some abstractions?

That was when 3GL languages were invented like Fortran or Algol, later C and Pascal etc. But I want to paint this evolution differently. Because from today’s point of view the next level of abstraction on top of Assembler is not a 3GL like Java or C#, but an intermediate language like Java Byte Code (JBC) or the .NET Intermediate Language (IL).[1]


Solving the problem of overwhelming details of concrete hardware machines was accomplished by putting a software machine on top of it, a virtual machine (VM) with its own machine code and Assembler language.

Where Assembler provided symbolic instructions and names and macros to abstract from bits and bytes, VMs for example provided easier memory management. Not having to deal with CPU registers or memory management anymore made programming a lot easier.

Now IL Assembler source code was a design for IL byte code, which was source code for machine code Assembler, which was source code for the ultimate machine code. Ok, not really, but in principle.

IL made things simpler - but not simple enough. Programming still left much to be desired in terms of readable source code and productivity. Part of the solution were libraries. But another level of abstraction was needed, too. Enter 3GLs with all their control structures and syntactic sugar for memory management and sub-program access.


That’s where we are today. Source code written in Java or C# is the design for some IL Assembler, which is the design for IL VM byte code, which is the design for machine code Assembler, which is the design for the ultimate machine code. OK, not really, but in principle.[2]

Abstraction beyond code

We like to think of 3GL source code as the design for the executable machine code. As it turned out, though, yesterday’s design, yesterday’s source code became today’s target code.

Yesterday’s abstractions became today’s details. Nobody wants to reason about software on the level of abstraction of any Assembler language. That’s why Flow Charts and Nassi-Shneiderman diagrams were invented.

And what was pseudo-code once, is now a real programming language.

Taking this evolution as a whole into view it begs the question: What’s next?

There is a pattern so far. However many levels of abstraction have been stacked on top of each other, one aspect hasn’t changed. All those languages - Assembler, IL, 3GL - are all about control flow.

Mainstream reasoning about software hasn’t changed. Today as in the 1950s it’s about algorithms. It’s about putting together logic statements to create behavior.

So how can this be extended? What’s our current “pseudo-code” about to be turned into source code of some future IDE?

My impression is: It’s over.

Control flow thinking, the imperative style of programming is at its limit.

There won’t be another level of abstraction in the same vein - language-wise, I mean. The number of frameworks to be glued together to form applications will increase. There will be more levels of abstraction.

But to actually design behavior, we will need to switch to another paradigm.

Accessing data has become hugely more productive by the introduction of declarative programming languages like SQL (and modern derivatives like Linq) or Regular Expressions.

So my guess is, we need to go more in that direction. Programming has to become more declarative. We have to stave off imperative details as long as possible.

Functional Programming (FP) seems to be hinting in that direction. Recursion is a declarative solution compared to loops. Also simple data flows as f |> g in F# have declarative power because they leave open whether control flows along with data. f could (in theory) still be active while g already works on some output from f.
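The pipe idea is easy to mimic in other languages; here is a minimal sketch in Python (the `pipe` helper and the example functions are my own, purely illustrative):

```python
def pipe(value, *functions):
    """Thread a value through a chain of functions, like F#'s |> operator."""
    for f in functions:
        value = f(value)
    return value

def words(text):
    return text.split()

def long_words(ws):
    return [w for w in ws if len(w) > 3]

# Reads declaratively: split, then filter, then count -
# no explicit loop at the call site.
result = pipe("the quick brown fox", words, long_words, len)
print(result)  # 2
```

Note that this sequential helper does not capture the concurrency potential mentioned above; in a true data flow `f` could still be producing output while `g` already consumes it.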

Still, though, even with FP there is one question unanswered: How do you think about code?

Is there a way for us to express solutions without encoding them as heaps of texts right away? Is there a way to communicate solutions without and before actually programming them? Can we describe software behavior in a systematic way on a next level of abstraction - and then systematically translate this description into Groovy or Haskell?

Object-orientation (OO) has given us more ways to describe data structures than most developers know. Think of all the relationship types defined in UML.

But software firstly is not about data structures, it’s about functionality, about behavior, about activities. How can that be described, planned, designed above today’s source code, even FP source code?

Because if Assembler, the code design of the 1950s, nowadays is just the output of a compiler translating today’s 3GL source code design… then what kind of design can be translated into today’s source code as a target?

Model-Driven Software Development (MDSD) seems to be trying to answer this question. But despite all efforts it has not been widely adopted. My guess is, that’s because the design of a modelling language is even harder than the design of a decent framework. Not many developers can do that. Also, not many domains lend themselves to this. And it’s not worth the effort in many cases.

But still, MDSD has gotten something right, I guess. Because what I’ve seen of it so far mostly is about declarative languages.

So the question seems to be: What’s a general purpose way to describe software behavior in a declarative manner?

Only by answering this question will we be able to enter the next level of abstraction in programming - even if that currently only means enabling more systematic designs before 3GL code and without automatic translation.

We have done that before. That’s how we started with object-orientation or querying data. First there was a model, a way of thinking, the abstraction. Then, later, there was a tool to translate abstract descriptions (designs) into machine code.

The above images all show the same code.[3] The same solution on different levels of abstraction.

However, can you imagine the solution on yet another level of abstraction above the 3GL/C# source code?

That’s what I’m talking about. Programming should not begin with source code. It should begin with thinking. Thinking in terms of models, i.e. even more abstract descriptions of solutions than source code.

As long as we’re lacking a systematic way of designing behavior before 3GL source code - be it OO or FP - we’ll be suffering from limited productivity. Like programmers suffered from limited productivity in the 1950s or 1990s before the invention of Assembler, IL, 3GLs.

And what’s the next level of abstraction?

In my view it’s data flow orientation. We have to say goodbye to control flow and embrace data flows. Control flow will always have its place. But it’s for fleshing out details of behavior. The big picture of software behavior has to be painted in a declarative manner.

Switching from OO languages to FP languages won’t help, though. Both are limited by textual representation. They are great means to encode data flows. But they are cumbersome to think in. Nobody wants to design software in machine code or byte code. Nobody wants to even do it in Assembler. And why stop with 3GLs?

No, think visually. Think in two or three or even more dimensions.

And once we’ve designed a solution in that “space”, we can translate it into lesser textual abstractions - which then will look differently.

This solution


surely wasn’t translated from a design on a higher level of abstraction. How the problem “wrap long lines” is approached conceptually is not readily understandable. Even if there were automatic tests to be seen they would not explain the solution. Tests just check for certain behavior.

So, as an exercise, can you imagine a solution to the problem “Word Wrap Kata” on a higher level of abstraction? Can you depict how the expected behavior could be produced? In a declarative manner?

That’s what I mean. To that level of discussion about software we have to collectively rise.


PS: Ok, even though I did not want to elaborate on how I think designing with data flows can work – you find more information for example in my blog series on “OOP as if you meant it” –, I guess I should at least give you a glimpse of it.

So this is a flow design for the above word wrapping problem:



This shows in a declarative manner, how I envision a process for “producing” the desired behavior. The top level/root of the hierarchical flow represents the function in question. The lower level depicts the “production process” to transform a text:

  • First split the text into words,
  • then split words longer than the max line length up into “syllables” (slices).
  • Slices then are put together to form the new lines of the given max length.
  • Finally all those lines are combined into the new text.
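To give a rough idea, the four steps might be sketched like this in Python - a simplified approximation of the flow, with invented function names, not the actual implementation:

```python
def split_into_words(text):
    return text.split()

def split_long_words(words, max_len):
    # Break words longer than the max line length into "syllables" (slices).
    slices = []
    for w in words:
        while len(w) > max_len:
            slices.append(w[:max_len])
            w = w[max_len:]
        slices.append(w)
    return slices

def build_lines(slices, max_len):
    # Put slices together to form lines of at most max_len characters.
    lines, current = [], ""
    for s in slices:
        if current and len(current) + 1 + len(s) > max_len:
            lines.append(current)
            current = s
        else:
            current = s if not current else current + " " + s
    if current:
        lines.append(current)
    return lines

def combine(lines):
    return "\n".join(lines)

def wrap(text, max_len):
    # The root just integrates the steps into a flow - no logic of its own.
    return combine(build_lines(split_long_words(split_into_words(text), max_len), max_len))

print(wrap("The quick brown fox", 10))
```

Each leaf function is small enough to be written down imperatively; `wrap()` itself only integrates.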

This sounds like control flow – but that’s only due to the simplicity of the problem. With slight changes the flow design could be made async, though. Then control would not flow along with the data anymore.

The data flow tells a story of what needs to be done, not how it exactly should happen. Refinement of a flow design stops when each leaf node seems to be easy enough to be written down in imperative source code.

Here’s a translation of the flow design into C# source code:



You see, the design is retained. The solution idea is clearly visible in code. The purpose of Wrap() is truly single: it just integrates functions into a flow. The solution can be read from top to bottom like the above bullet point list. The code is “visually honest”.

Such a flow design can be easily drawn on a flipchart. With it I can communicate my idea of how to create a certain behavior quickly to my fellow team members. It’s easy to translate into code. And since it does not contain imperative logic, it leads to very clean code, too. Logic is confined to functionally independent leaves in the decomposition tree of the flow design. Read more about this in my blog series “The Incremental Architect’s Napkin”.

  1. I even remember P-Code already used in the 1970s as the target of Pascal compilers.

  2. Of course, this is not what happens with all 3GL languages. Some are interpreted, some are compiled to real machine code. Still, this is the level of abstraction we’ve reached in general.

  3. Well, to be honest, the first image is just some arbitrary binary code. I couldn’t figure out how to get it for the Assembler code in the second image.

The Steep Curve of Feature Prioritization

How to prioritize features for development? Which to do first, which then, which last? That’s a much debated aspect of software development. And most teams I know are not very systematic about it.

That’s a pity, because doing features in the “wrong” order means creating value for the customer slower than possible. Or it even means producing waste.

In his recent book “The Nature of Software Development” Ron Jeffries showed how he thinks prioritization should be done with nice drawings like this:


Each rectangle represents a feature with a certain value to be produced with a certain effort (time, money).


The higher the “feature box”, the more value is produced. The longer the box, the more it costs to produce the value.

As Ron Jeffries’ drawing clearly shows, there are several ways how to order (prioritize) features. In the end, the same value is produced over the same time. But how fast how much value is added may differ greatly.

Prioritize by Value

His suggestion is: Implement high value features first, postpone low value features as long as possible. This makes for a steeper value growth curve. In the above drawing the top order of features is to be preferred. It grows value faster than the bottom order.

I agree.

But is it really that easy? Just order features according to value and effort? Take this assortment of features for example:


Features s..x are of the same value - but require increasing effort. Features d..a are of decreasing value - but require the same effort. The order in which to build them thus is by decreasing value and growing effort.


I think, this is what Ron Jeffries had in mind:

Look at the difference in the growth of value if we choose the higher value inexpensive features first and defer lower value costly features until later.

But, alas, features usually don’t come in such nicely cut variations. The range of values and efforts is larger. Take the following features for example:


Building them in this order leads to this feature value growth curve:


Still not bad, is it? A concave curve showing how value is produced quickly right from the start.

Prioritize by Weight

However… We can do better. And that’s what’s missing from Ron Jeffries’ book. Ordering features by value and effort easily leads to suboptimal curvature.

Let me show you what I mean. Here’s an alternative ordering for the same features:


Value growth looks steeper, doesn’t it?

How is this? What’s the criterion for ordering the features? It’s obviously not just value, because, for example, feature c is done before x, which provides higher value.

The criterion is called weight. It’s calculated as described by the Weighted Shortest Job First (WSJF) method. It takes into account not only value, but also effort.

weight = value / effort

The higher the weight, the higher the feature should be prioritized.
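In code, WSJF prioritization is just a sort by weight; a quick sketch in Python with made-up feature names, values, and efforts:

```python
# Hypothetical features: (name, value, effort)
features = [("a", 2, 4), ("c", 3, 1), ("x", 5, 5), ("s", 5, 1)]

def weight(feature):
    _, value, effort = feature
    return value / effort

# Highest weight first = steepest value growth.
by_weight = sorted(features, key=weight, reverse=True)
print([name for name, _, _ in by_weight])  # ['s', 'c', 'x', 'a']
```

Note how c (weight 3/1 = 3) lands before x (weight 5/5 = 1) despite x's higher value.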

In a diagram the weight shows up in the angle of the diagonal of a feature box. Or, to be precise, in the tangent of that angle: value (the height of the box) divided by effort (its length).


The larger the angle, the steeper the diagonal points upwards, the higher the weight, the earlier the feature should be implemented.

What this means for prioritizing the above features is shown in the following figure, where the features are ordered according to their weight:


You see: the angle progressively becomes smaller, the inclination of the diagonals decreases. And that means the value growth curve is steeper when you implement the features in this order.

Compare the value-focused prioritization drawn with triangles


to the WSJF prioritization:


The growth of value is steeper and smoother with the WSJF prioritization, there are no slumps in the curve.

Of course, value-focused prioritization and WSJF prioritization result in the same order for features of same effort. So the question is: Can you slice down requirements to be of the same effort - and still “calculate” a meaningful value for each of them?

I’d say, that’s possible - but only pretty late in the game. You already have to be very clear about a lump of requirements to break it down into (roughly) equally sized increments.

That means, mostly you’ll need to prioritize the hard way and do it in WSJF manner.

Finding Value

However, regardless how you prioritize, you need to find the value of your features. How to do that?

In my experience it’s rarely possible to find a monetary value. For large features (whole applications or modules) this might work, but not for smaller features. How many more customers will buy your software if you fix this bug or add the PDF export? How many customers will cancel the subscription to your online service, if you don’t fix the bug or improve usability?

“Cost of delay”, i.e. how much money you’ll lose/not earn as long as a feature is missing, is very hard to determine. I can’t think of any client of mine who would know it (on a regular basis).

So what’s the alternative? Choose whatever fits the bill. If you’ve several thousand users you can speculate about how many of them will benefit from a feature. Or even with a few users you can ask yourself how often they would use that feature on average (each day or each year). Or you can think about how crucial a feature is for their business. Or you can check for dependencies between features; a feature other features depend on might be worth more, because it’s a kind of enabler. Or maybe lacking a feature poses a tangible risk, because the software would lose an important certification.

Find any number of criteria applicable to your software. And assign values to them. You can go with scales from 1 to 5 or the Fibonacci numbers. It’s less about precision than about comparability.

Maybe you find just three criteria. That’s better than just a gut feeling. If you range each from 0 to 5 the minimum value for a feature is 0 and the maximum value is 15.
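Computing such a value is trivial; a sketch in Python with three invented criteria, each on a 0..5 scale (the feature names and scores are made up):

```python
# Hypothetical criteria scores per feature, each ranging 0..5.
features = {
    "pdf_export": {"users_affected": 4, "usage_frequency": 2, "business_criticality": 3},
    "bug_fix":    {"users_affected": 5, "usage_frequency": 5, "business_criticality": 4},
}

def value(scores):
    # With three criteria the total ranges from 0 to 15.
    return sum(scores.values())

for name, scores in features.items():
    print(name, value(scores))  # pdf_export 9, bug_fix 14
```

The absolute numbers mean nothing; only the comparison between features does.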

Even with these few criteria talking about feature value becomes more differentiated and objective. Less guessing, more thinking.

Determining Effort

The same is true for determining the effort needed for a feature. We’re talking about estimation here. A sensitive topic for software developers and managers alike.

Calculating the time needed to do a feature with high reliability is near impossible. Ron Jeffries makes that very clear in his book and I can only agree.

So we should not try to predict absolute effort. Fortunately for prioritization it’s sufficient to just determine efforts for comparison. This feature will take x amount of time, that feature will take two times x, and another will take just half x.

Again a simple scale will do. Or go with the Fibonacci numbers again. Yes, it’s almost like with Story Points in Scrum. But don’t fall into the prediction trap! Don’t try to forecast how many features you’ll be able to deliver in a certain amount of time.

As soon as you start forecasting there will be people who take this for (future) reality and depend on the forecast to become true. That reduces your flexibility, that creates pressure. So by all means: Don’t forecast! Take effort figures as abstract numbers just for comparison. Assigning a 2 today will not mean the same when assigned to another feature next week.

Try to do it like this: When determining the effort for a number of features start by finding the smallest one. Assign to it 1 as the effort. Then determine the relative factors for the other ones, e.g. another feature takes twice as long (effort becomes 2), yet another feature takes much, much longer (effort becomes 10) etc.

As should be obvious: The smallest feature in today’s prioritization round can be much smaller or much larger than the smallest feature in the next round. So a 1 cannot be converted to a certain number of hours or days.

Instead of promising a result (“We will deliver features a, b, c in one week’s time.”), you should just promise a behavior (“We will deliver features a, b, c in order of their priority - which might even change over time.”).

Remember: It’s always good to promise like a pro! ;-)

Bottom line: When building software go for incremental value delivery. But value alone is not enough to prioritize. You need to take effort into account. You achieve the steepest growth in value when you prioritize based on feature weight, which essentially means you calculate a speed: “value delivery speed” equals feature value divided by implementation time. Bet on race horse features first, leave the nags for last.

Feedback-Centric Development - The One Hacker Way

Erik Meijer got something right in his talk "One Hacker Way". There's a lot of bashing and ranting... but at the core there also is a precious diamond to be found. It's his admonition to be driven by feedback.

As software developers we should focus on production code - and let ourselves be guided by feedback.

How true! How simple! But contrary to the audience's belief it's no easy feat. He got much applause when he suggested, attendees who had not committed code recently should leave. People liked him extolling the virtues of "hacking", of focusing on code - instead of on fuzzy stuff like a process or even talking or thinking. No, it's the code, stupid!

Unfortunately they did not get the implications of this, I guess. And Erik Meijer did not tell them what that really, really means. So I'll try to describe how I see what truly and honestly focusing on code and feedback means.


I'm sorry, but before I get to code, we need to lay a foundation. We need to be very clear about why we should produce code in the first place.

Code is a tool for our customers. Customers want to use software to achieve something, to reach a goal.

In order to be helpful, code needs to fulfill certain requirements. I see three basic requirements:

  • Software needs to be functional, e.g. a calculator has to provide addition and multiplication.
  • Software needs to be efficient, e.g. a calculator has to add and multiply very fast.
  • Software needs to be evolvable, e.g. a calculator needs to be adaptable to changing functional and efficiency requirements; maybe it also has to provide a sine operation or has to become even faster.

Functional and efficiency requirements define the behavior of software which is produced by logic (for me that's transformational statements, control-flow-statements, and hardware access). Evolvability requires a certain structure which is spanned by modules of several sizes (for me that's function, class, library, component, micro-service).

What Erik Meijer means, when he favors hacking over some obscure agile way of development, is that software developers should produce code in order to create appropriate behavior and structure.

And what he means, when he says we should look for feedback, is that we should check whether the code written already delivers on the behavioral and structural requirements.

Being Feedback-Centric

Now for the fun part: If Erik Meijer is serious about feedback, he needs to emphasize that it has to be sought frequently. In fact the central and recurring question everything revolves around is:

How can I get feedback as quickly as possible on my code?

That's what I call feedback-centric. Yes, we should focus on code. But code without feedback has no value. So we should seek feedback "at all costs". As soon as possible. Frequently. From the most relevant source.

Software development thus becomes a high frequency iterative activity:

  1. Code
  2. Check (gather feedback)
  3. Back to 1. to fix any deficit or add more behavior/structure

Feedback-centric development thus is code-first development. I like. Don't you? As Erik Meijer said: Forget about test-first programming or even TDD. It's the production code which rules!

Tiny Steps - The No. 1 Implication

If you really, really buy this - it's about code and about feedback - then you also have to buy the implication: Coding has to progress in tiny steps.

Because only tiny steps can get you frequent feedback. If you hack away for an hour or a day without feedback, then you're coding pretty much in the dark. Truly frequent feedback is never more than a couple of minutes away.

When you look at some requirements you have to ask yourself: What can I do to get feedback in the shortest possible amount of time? Can I get feedback in 10 seconds, 1 minute, 5 minutes, 15 minutes, 30 minutes?

Not Only Code - The No. 2 Implication

The ultimate feedback of course is on code. So if you can get feedback on some code in 1 minute, go for it.
At least initially, though, it's even faster to get feedback without any code. Producing code and getting some stakeholder to check it for conformance to requirements often takes longer than simply asking a question.

Code should not be a question, but a statement of some understanding - even if that turns out to be wrong.

So as long as you've questions or are not very sure to understand what the requirements are... do not start hacking. Rather ask questions, e.g. by talking or by presenting some test cases you made up.

Incremental Steps - The No. 3 Implication

Being driven by code and feedback also means you can't just program any code. The code you want to write is, well, code you can get feedback on. That means it needs to be code some stakeholder can relate to.

Feedback-centric development thus means producing code incrementally. Code needs to make a difference, needs to produce some possibly very small additional value. And if that's indeed the case only a stakeholder can tell you.

Automatic Tests - The No. 4 Implication

Once you've identified a tiny increment you can start coding. That's just fine. No need to write a test first. What a relief, isn't it? ;-)

Then, after maybe 3 minutes of writing production code, you run the code to give yourself a first round of feedback. Since you've asked a lot of questions you're somewhat of an authority on the requirements - but of course by no means ultimately decisive. The right to accept only lies with stakeholders.

But how do you run the code and check it for deficiencies? You can do that manually. That's just fine. But how frequently can you then check?

Not checking the behavior for correctness with automatic tests is a violation of the core principle of feedback-centric development. See, it's not just code, but also fastest feedback possible. You have to balance code production and feedback generation.

That means, you need to write automatic tests. Do it after you wrote your production code. That's fine. Since you only added a tiny increment there is not much to test. At least do it for every error you encounter. Reproduce the error with an automated test, then fix it. Rerun the test to get feedback if your fix actually fixed it.

Automatic tests have two feedback purposes:

  • whether the code you just wrote delivers on the required behavioral increment
  • whether other code still delivers on its behavioral requirements (regression tests)
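To illustrate, here is a minimal sketch in Python of the test-after loop, using the calculator example from above (all names are invented): production code first, tests afterwards, serving both feedback purposes:

```python
# Production code first: a tiny increment (hypothetical calculator).
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

# Tests written after the fact - they serve both feedback purposes:
def test_increment():
    # 1. Does the code just written deliver the behavioral increment?
    assert add(2, 3) == 5
    assert multiply(2, 3) == 6

def test_regression():
    # 2. Does earlier code still deliver on its requirements? (regression)
    assert add(0, 0) == 0

# A reported error gets reproduced as a test first, then fixed;
# rerunning the test tells you whether the fix actually fixed it.
def test_reported_bug():
    assert multiply(-2, 3) == -6

test_increment()
test_regression()
test_reported_bug()
```

The point is not the trivial arithmetic but the order: code, then tests, run frequently.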

TDD might seem to provide no benefit. But it should be clear now that test-after, i.e. writing tests for feedback generation after hacking, is a must. It's a sine qua non if you're serious about the One Hacker Way.

Testable Structure - The No. 5 Implication

Now that automatic testing finally is inevitable even for the most eager hacker ;-) it should be obvious that not just any code structure will do. The code must be structured in a way as to be easily testable.

That even means, each increment should be testable in isolation. Otherwise the feedback would not be precise - which would be a violation of a fundamental principle we started out with.

What this leads to is... refactoring. Finally! Because in TDD refactoring is clearly optional. Yeah, it's a prescribed step after the test went green - but look at the TDD examples out there. They are a testament to how easy it is to skip this step.

No, TDD does not (!) exert any force to make a developer refactor her code. Everyone rather writes the next red test.

But if you're serious about the One Hacker Way, i.e. feedback-centric development, then you have to provide yourself with quick and precise feedback. And that (!) requires you to structure the code in a way to make it possible.

  1. Code some increment
  2. Write a test to get feedback on just that increment; if that's not possible, refactor enough to get it working

Feedback-centric development makes you the first consumer of your code. Eat your own structural dog food and see if it's palatable, i.e. if you can easily test the logic hanging in that structure.

Structural Review - The No. 6 Implication

Manual or even automatic tests just provide feedback on behavior. But as stated above it's not just behavioral requirements the production code needs to deliver on. Customers want us to write code in a sustainable way. Nobody knows what kind of changes come around in the next weeks, months, years. So our production code needs to be prepared; it needs to be evolvable.

Evolvability is a quality of the structure of the code. Traditionally it's produced by some kind of modularization. Recently some more principles have been added to reach this goal under the name of Clean Code.

However you call it, one thing is for sure: evolvability is hard to measure. Automatic tests and the customer/user can comparatively easily give feedback on functionality and efficiency. But whether evolvability is high enough... they can't tell. Especially because evolvability cannot be stated in a requirements document.

Customers simply assume software to be infinitely malleable and to live forever. Mostly, at least in my experience.

That means tools measuring certain structural metrics cannot tell the undoubted truth about the structural quality of software. At best they might hint at certain spots where evolvability seems lower than desired.

The ultimate feedback on evolvability comes only from... developers. If developers have a hard time changing a codebase, then it's hard to evolve. It's that simple.

How then can feedback from developers as authorities on evolvability be gathered frequently?

Firstly, the feedback is generated implicitly by adding the next increment. If the developer trying that finds it difficult, he has just generated feedback - and can act on it. Refactoring is fixing an evolvability deficiency when it arises.

Unlike with TDD, where no feedback on structure is generated and refactoring is recommended in a broad-brush manner, in feedback-centric development refactoring always has a clear purpose. It's done when necessary to enable the next increment.

Evolvability is too important to leave it to a single developer, though. Sensitivity to structural quality is very unevenly distributed among developers for several reasons. That's why it is helpful to get feedback from more than one developer as soon as possible.

Enter pair programming. During pair programming it's possible to focus on behavior and structure at the same time. Four eyes see more than two. So if you haven't been convinced of pair programming so far, but like the idea of The One Hacker Way... now is the time to start pair programming. It's a valuable technique to get more frequent feedback on code structure.

Equally valuable is of course the age old technique of doing code reviews. I don't think they should be replaced by pair programming. Code reviews go beyond the four eyes of the developers who wrote the code. More eyeballs simply can spot more structural flaws. Also a group can check if the structure matches a common understanding of how code should be modularized.

Even with pair programming and code reviews I feel there is something missing, though. They generate feedback on structure with different frequencies and from different perspectives. But the feedback of both is, hm, somewhat artificial.

Improving the structure to enable an automatic test carries a certain urgency. Refactoring is really needed to be able to continue according to the principle of frequent feedback. Pair programming and code reviews don't "force" structural improvement in such a way.

That's why I suggest another technique I call code rotation (or maybe story rotation). Code rotation means some requirement should not be fully implemented by a single developer or even pair. If coding an increment takes a day, for example, every 90 minutes the eyeballs looking at it should be completely exchanged. Maybe developers A and B start, then C and D continue etc. Yes, A and B should be replaced by a fresh pair. There is a quick handover - and then the new pair has to get along alone with the codebase.

And that's the point: Only if the developer(s) working on a requirement change completely will there be honest feedback about the structure. If even one dev of the first pair remains for a second tour on the code the feedback is "contaminated". We're so prone to lie to ourselves when it comes to our own code... This can only be avoided by letting fresh eyeballs look at it.

Sometimes I employ this technique in trainings. I let developers start working on an exercise - and then after a while they hand their code over to their neighbour on the right. You can imagine that this is no delight for anyone ;-) But why? It's this dissonance that needs to be removed from coding. It stems from non-obvious code structures.

Bottom line: Evolvability is of great importance. Unfortunately it's hard to measure. So we need to actually look at code and work with it to get a feeling for its quality. Therefore we need to establish a hierarchy of feedback cycles:

  1. Make every increment testable - refactor as needed. Frequency: continuously
  2. Make code understandable to your pair programming partner - refactor as needed. Frequency: minutes
  3. Make code understandable to your successor - refactor as needed. Frequency: hours
  4. Make code understandable for the whole team - refactor as needed. Frequency: day(s)

Software Design - No. 7 Implication

Switching pairs during development of a feature is a tough call. Sure you want to avoid it. But why? Too much context switch? Takes too long to find your way around the code of other devs to be able to continue their work?

Yeah, right. That's all difficult. But you can choose: experience that now - or sometime in the future. And it should be obvious that it becomes harder the longer it takes until somebody else looks at your code.

In order to make code rotation as smooth as possible another technique is needed. Frequent feedback gets another prerequisite: design.

Yes, I believe the reason for explicit design should now become apparent. Explicit design done by a group of devs or even the whole team helps everyone understand the code. It also decreases the need for refactorings later.

Some modularization cannot help but happen ad hoc during hacking. But quite some modularization can be done before coding even starts. It's part of developing a solution. It's the "thinking before coding". And it has value because it makes it easier for developers to switch working on the codebase.

So forget about design "because that's how you do software development". Also forget about not doing design "because that's how real programmers code."

Explicit design is a means to an end. Its purpose is to develop a common understanding of easily evolvable coarse grained structures - in order to increase the frequency of feedback. Because you don't want to wait years to realize you're sitting on a monolith, if the next developer can tell you in a couple of hours that he has a hard time extending what you left behind.

Continuous Deployment - No. 8 Implication

Ultimate feedback only comes from those who require your code to do their job. That means it must be very, very easy to give your code to them. The closer the code gets to the final usage environment, the better.

That's why continuous deployment is so important for any software project. Deploying code needs to be as much of a no-brainer as possible, so we can ask just about anybody for feedback at any time.

Think about A/B deployment, think about deploying each increment separately, think about deploying only to a subset of customers... The more freedom and options you have, the better for gathering feedback.

In Closing

At first I did not like Erik Meijer's talk much. But once I saw through his polemic fireworks I realized how much truth can be found in what he said. Never mind his suggestion to treat developers as top athletes. Never mind him calling Jeff Sutherland a satan.

Let's stick to the title, the core of his message: We need to focus on code - because only that delivers value. And we need to integrate feedback into our work much more seriously - because only then do we know whether we're actually heading in the right direction with our code.

Forget about hype, buzzwords, and any elaborate belief system like "Agile" or "Scrum" etc. Yes, like the Buddhists are saying: "If you meet the Buddha on a road, kill him." We need to kill our Buddhas, the gurus, the dogmas. Let's do away with cargo cults.

Instead focus on the essential: production code. And get as much feedback as possible. Truly become a closed system on many levels of your daily practice and your organization.

PS: If you happen to recognize one of your favorite “agile practices” in my above description, congratulations. Of course there is value in some of them. We don’t need to throw the baby out with the bath water. My point, though, is to justify such practices starting just from production code and the need for feedback. Nothing more, nothing less. No buzzwords, no argumentum ad verecundiam.

Software development must deliver on budget - always

Yes, I mean it: we always need to meet the budget (be that time, money or whatever resource).1

This most likely is not your software development reality. So how come I´m demanding something so seemingly unrealistic, even preposterous?


The reason for the obligation to deliver on budget is simple: trust.

Software development is a social endeavor. It takes not only two to tango, but also at least two to develop and deliver software: a customer and a software developer.

To accomplish something in collaboration with other people requires trust. Trust is the foundation because you cannot do everything yourself. You need to let go of something and trust a collaboration partner. That´s the very reason for collaboration in the first place. If you were able to do something yourself, why get someone else on board?

So if there is a need for cooperation, then there is a need for trust. Even more so if the relationship between the cooperating parties is highly asymmetric: If you could do something yourself but delegate it to somebody else, you can at least check their work for quality. Less trust is needed.

But if you don´t have a clue how to accomplish something yourself, you can´t help but delegate execution and on top of that you´re unable to check every detail of how the result is produced. You´re pretty much limited to checks on the surface. So you need much more trust in this situation.

Software development mostly is a highly asymmetric endeavor. Customers don´t understand much of it, so they need to trust software developers.

How is trust created? Trust grows through reliability and steadiness. It´s delivering on promises - and even going beyond that. If your cooperation partner does what she promised, she´s reliable. If she does that again and again... you learn to trust her.

And if someone presents you with value you had not even expected... well, that´s even better.

Gifts build trust, keeping promises builds trust.

Delivering below budget is like a gift. Delivering on budget is keeping a promise.

That simple.

What´s wrong?

But why is failing to deliver on budget such common project reality? There is something fundamentally wrong with how the industry (or its customers) interprets the term "promise", I guess.

Promises are a form of contract. A contract requires all parties to freely agree on it, which requires all parties to understand the contractual clauses (in the same way) and their implications and ramifications. And a contract requires all parties to be able to live up to it.

Contracts are the foundation of civilization. The Romans recognized that and coined the term: pacta sunt servanda.

So what goes wrong in our industry with promises?

  1. Software development often is not free to enter a contract. Management states "You deliver such scope on such budget!" - and thinks a contract exists. But it does not. If a party got coerced, there actually is no contract.
  2. Software development often does not fully understand what´s in the contract, what it implies. Some parts of the contract are clear (mostly the budget), some are not clear, fuzzy, or outright not understandable (mostly the requirements). How can software development then honor the contract?
  3. Software development regularly overestimates its capabilities to deliver on what it perceives to be in the contract. It lacks experience, it neglects dependencies etc. Even though software development freely enters a contract or even defines the budget, it is unable to deliver on it.

Small wonder, then, that so many contracts - large and small - are not honored. Promises are broken. Trust is not built, or erodes. Overall quality drops to save at least a few contractual goals. Conflicts and pressure ensue.


The only remedy to this recurring downward spiral, to this pattern is: Always deliver on budget. Always live up to what you promised.

But how?

I think it´s very simple - which does not mean it´s easy ;-)

Only make promises you can keep.

Ask yourself: Am I willing to bet 1000€ I will be able to keep this promise? If not, well, then you´re not in a position to make a promise. You doubt your own reliability.

This simple and obvious rule of course has a couple of implications:

  1. Spot and withstand any coercion. Do not enter a contract you have even the faintest doubt you can deliver on - just because someone says so.
  2. Be humble with regard to your capabilities. Don´t overestimate what you know or are able to do.
  3. Be realistic in terms of control over your environment and your time. I strongly believe you cannot control more than maybe 1 or 2 days of your time. So you can´t promise anything beyond that.

So what can you promise? Stuff you´ve done already many times; that´s then called reproduction. You can only make promises on what you have considerable experience with and thus can deliver in "reproductive mode".

If you have experience cooking Chicken Vindaloo you can promise to deliver in maybe 60 minutes. But how far does your experience in software development go? Even if you´ve been doing it for 25 years, your ability to reproduce is limited. This is not just because of ever changing technology, but because of ever changing customer requirements. They are hardly ever the same. Neither is your software.

Software is like a river: you can´t step into the same river twice. It´s not the same river the next time you enter it. Something has changed, and so have you.

Software development is hardly ever in reproduction mode. Mostly it´s a creative undertaking. Stuff needs to be analyzed, researched. Stuff needs to be engineered, invented.

So what can you promise to deliver on budget?

You can only promise your diligence. You can promise to work hard, to focus. But you cannot promise the implementation of a certain scope (functionality, quality) in a particular timeframe (budget). At least not beyond very small increments.

So here´s the bottom line:

  • Promise concrete results only in matters of reproduction.2
  • Since the circumstances for reproduction are rare, be conservative. Even if you are able to reproduce a result, it becomes difficult beyond 1 or 2 days - because you hardly have control over your time.
  • Without the ability to reproduce results promise only one thing: focus. Focus on a contractual topic regularly until the contract is fulfilled. Be persistent, be relentless - but don´t expect to know in advance when you´ll be done. Which means, be honest about this; don´t let yourself be lured into promises you can´t keep.

We can´t avoid making promises. Promises are the foundation of collaboration. So we must stick to promises we actually can keep. Be extra careful what you promise. With regard to coding, the time budget you agree on should be very, very short. And once you´re beyond reproduction mode, only promise diligence.

Who wants to work with unreliable people? Nobody. So we need to strive to become more reliable, more trustworthy. Working with software development should become a pleasure.

  1. An overrun of maybe 5% might be ok to classify a result as “within budget”.

  2. Yeah, I know what you´re thinking right now ;-) But there is more than that kind of reproduction…

How Agility leads to functional design and even TDD

What is it that the customer wants when she orders software? Behavior. I define behavior as the relationship between input, output, and side effects.

It´s like with the Turing Test. When can we consider a machine intelligent? As soon as we cannot tell from a dialog whether the "hidden participant" is a human or not. The Turing Test is about behavior.

Requirements are met if some input leads to desired output and expected side effects. This includes performance, security, usability and other aspects. Behavior thus has functional as well as non-functional traits.

Now the question is: How is behavior produced?

It´s all about logic, program logic. That is operators, control structures and hardware access. Only such programming language statements are relevant to producing behavior by working on data.

Nothing has changed since Niklaus Wirth wrote "Algorithms + Data Structures = Programs" back in the 1970s. Nothing has even changed since the days of assembler programming.

Forget about Object Orientation. Forget about Functional Programming. At least for a moment. That´s all just tools, not givens.

The main question in programming is, how to move efficiently and effectively from requirements to logic? You can imagine requirements and logic separated by a huge gap, a chasm even.


On top there is the whole of all requirements for a software system. At the bottom there is all the logic that´s needed to show the required behavior. That´s just all operator statements, control statements, and hardware access statements.

Except for trivial requirements we cannot jump over the chasm. For the Fizz Buzz code kata the whole logic might appear immediately before your mind´s eye. But probably not even for the Bowling Game code kata, and surely not for solving Sudoku puzzles.
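To illustrate just how small "all the logic" can be in the trivial case, here´s a Fizz Buzz sketch (Python for brevity; the function name is mine):

```python
def fizz_buzz(n):
    # The entire "logic" of the requirement: nothing but
    # operators and control structures working on data.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)
```

For Sudoku, by contrast, no such single function springs to mind - hence the need for analysis and design.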

That means we need help to cross the chasm.

To me this help comes as a three phase process.

1. Agile analysis

The first phase is about thinking. We need to analyze the requirements. But not just in any way. We need to keep the customer´s world and the developer´s world close together.

Analysis to me means not only to understand what the customer wants, but also to slice it up into ever finer increments.

Increments are parts/aspects of the overall behavior. The customer can give feedback on them.

User Stories and Use Cases are examples of such increments - but unfortunately they lack connection to the developer´s code reality. What´s the equivalent of a User Story in code?

That´s why I prefer (and suggest) to find more tangible increments during requirements analysis. I call them Application, Dialog, Interaction, and Feature. (There are even two more, but these are the most important ones.)


Analysis considers the problem. It tries to understand it - also by de-constructing it into smaller problems. Analysis is a research task.

I call this kind of analysis agile, because it produces increments. It´s not about technical artifacts, but just aspects the customer can relate to.

To ask whether the requirements should be met by just one Application or more leads to smaller separate problems - and at the same time to artifacts tangible for the developer. An Application can be thought of as a project on any development platform/IDE. It´s represented by an executable file at runtime, an icon on a desktop, or a URL to open in a browser.

Requirements can be fulfilled by delivering one Application after another.

The same is true for Dialogs. Each Application consists of a number of Dialogs through which users converse with the logic. Each Dialog delivered provides some value to the customer and can be given feedback on.

For the developer a Dialog is very tangible, too. It´s usually encoded as a class (or module). For GUIs that´s readily done by the IDE.

Dialogs in turn consist of a number of Interactions. Buttons can be pressed, menu items clicked etc. There are many technical events happening on a user interface - but some are special: they trigger behavior. In GUI dialogs we write event handlers for that. I call them Entry Points into an application. They are like the Program.Main() functions of Java/C# programs.

Interactions represent "single behaviors": Some input is taken from the user interface, output is produced to be displayed on the user interface, and possibly other side effects happen, e.g. data gets changed in a database.

Interactions are clearly relevant to the customer. They have specific, tangible triggers. Their behavior can be exactly defined. (At least it should be. The customer is responsible for providing approval criteria in terms of input/output/side effect relationships.)

But at the same time Interactions are tangible for developers. Their equivalent in code is always a function.
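As a sketch of that idea (Python for brevity; the registration example and all names are invented, not from any real dialog framework): the Entry Point is a thin event handler, the Interaction itself is a plain function relating input to output:

```python
def register_user(username, password, repetition):
    # The Interaction as a function: input in, output out.
    if not username:
        return "error: user name missing"
    if password != repetition:
        return "error: password repetition does not match"
    return "user " + username + " registered"

def on_ok_clicked(ui):
    # The Entry Point (e.g. an OK-button handler) only gathers input
    # from the UI and routes the Interaction's output back to it.
    ui["status"] = register_user(ui["username"], ui["password"], ui["repetition"])
```

The customer´s approval criteria for the Interaction translate directly into test cases for `register_user`.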

How much better is that than being confronted with a User Story?

User Stories and Use Cases are nice and well - but they should be mapped onto Applications, Dialogs, Interactions before moving to the next phase. Nothing is lost for the customer by that - but much is won for the developer.

And even further the analysis should go! Interactions are cross-cuts through the software. Much can happen while input is moved from the user interface through the bowels of the software to transform it into some output - again presented by the user interface. Such a transformation certainly has many aspects to it. These aspects should be considered during analysis. I call them Features.

A Feature is an aspect of an Interaction in a Dialog of an Application. It will be represented in code by at least one function of its own.

Features thus are tangible for the developer - but at the same time are relevant to the customer. Take a user registration Dialog for example. Such a dialog will at least have one Interaction: register user, e.g. when hitting the OK-button.

Taking this Interaction as a whole and trying to figure out what logic is needed seems hard to me. Better to refine it, better to slice Features off of it, e.g.

  • Create new user
  • Check if user name already exists
  • Check if user name is well-formed
  • Check if password is well-formed
  • Check if password repetition equals password
  • Show error message if check fails
  • Encrypt password before storing it

Features are the most fine grained requirements in agile analysis: they are increments and can be directly mapped to code. Software can be delivered Feature by Feature. Logic can be developed Feature by Feature.1
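In code, that slicing could look like this - each Feature becoming (at least) one function of its own. The names and the concrete rules are invented for illustration (Python for brevity):

```python
def user_name_is_well_formed(name):
    # Feature: check if user name is well-formed (toy rule)
    return name.isalnum() and len(name) >= 3

def password_is_well_formed(password):
    # Feature: check if password is well-formed (toy rule)
    return len(password) >= 8

def password_repetition_matches(password, repetition):
    # Feature: check if password repetition equals password
    return password == repetition
```

Each of these can be implemented, delivered, and given feedback on separately.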

Agile analysis can and should of course be done together with the customer. It does not need to be comprehensive. Avoid big analysis up-front. Some top-down breadth-first then depth-first analysis is sufficient until you have enough Interactions and Features on your plate to let the customer do some prioritization.

Then enter the next phase...

2. Functional design

Knowing which Applications, Dialogs, Interactions, Features there are does not close the requirements-logic-gap. Agile analysis makes it smaller, but it´s still too wide to jump across.

Interactions exist side-by-side. Their connection is through data only. But Features are connected in all sorts of ways within Interactions. Their relationship is causal. The activity of one Feature leads to activity of other Features.

Features form production processes: they take input and data from other resources and transform it into output and data in other resources.


The appropriate way of thinking about how Features make up Interactions thus is data flows. Yes, data flows and not control flows. Logic is about control flow; that´s why it contains control structures.

Finding the right data flows to deliver the required behavior of each Interaction is what I call behavioral design or functional design.

"Behavioral design" emphasizes the purpose, the results of what´s happening by data flowing.

"Functional design" on the other hand emphasizes the technical side of data flows, their building blocks, which are functions.

Designing data flows is an engineering task. It takes the results of the analysis phase and tries to solve the "behavioral problem" of an Interaction by combining Features already found with more Features, which only become visible/necessary when looking under the hood. Functional design considers technologies and paradigms to define appropriate data flows.

Of course, such flow designs are "just bubbles" at first. But that´s not a drawback, it´s a feature. "Bubbles" can easily be revised, created, destroyed. "Bubbles" can be visualized, can be talked about among team members with just pen and paper as tools (enter: The Architect´s Napkin ;-).

Behavioral design means solving problems on a conceptual level using a simple DSL: data flows. The syntax and semantics are easy to learn. They provide a framework for phrasing solutions using domain specific vocabulary.

Functional design closes the requirements-logic-gap:


The vocabulary in the data flows can then straightforwardly be translated into functions.

Please note: Functions are not logic! Functions are containers for logic. Functional design thus results in a list of containers which then have to be filled with all the logic details that are necessary to actually deliver the desired behavior.

Data flows on the other hand lack many details. They are declarative by purpose. They describe behavior in a comparatively coarse grained manner. They are abstractions to help find logic containers and to be able to reason about it in a simpler way.

Without data flows it´s hard to understand logic. You get bogged down in an infinite sea of details in no time.

To avoid that, use data flows during design as well as while bug fixing and enhancing software. Done right they provide a smooth transition from analysis to coding - and even back. Because done right, data flows are not just a matter of pen and paper but are clearly visible in code. If you look at data flow code you can "reverse engineer" the data flow design from it.
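Here´s a minimal Python sketch of what such clearly visible data flow code might look like. The flow validate → encrypt → store is invented, as are all names and the toy "encryption":

```python
# Operations: they contain the logic.
def validate(user):
    return {**user, "valid": bool(user["name"])}

def encrypt_password(user):
    # Toy "encryption": shift every character by one (illustration only!)
    encrypted = "".join(chr(ord(c) + 1) for c in user["password"])
    return {**user, "password": encrypted}

def store(user, db):
    db[user["name"]] = user

# Integration: no logic at all, just the flow made visible.
def register(user, db):
    store(encrypt_password(validate(user)), db)
```

Reading `register` top-down you can recover the three-step flow design straight from the code.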

But still... data flows won´t crash as long as they´re just designs.

3. Test-first coding - finally

The third phase is about coding; finally we´re talking algorithms. It´s imperative programming as you´re used to. But you start with something in your hands: a list of functions that have to be implemented. This means you can really focus on crafting code. It´s small pieces of logic at a time. (To be honest: Sometimes that´s even the most boring part ;-) Analysis and design have narrowed down the scope so much that you can do the final step from requirements to logic. It´s not a “leap of faith” anymore; it´s pretty straightforward craftsmanship.

To manifest data flow designs use test-first coding. I´m not calling it TDD, because there is less or even no refactoring after the red-green steps.2

Data flows are easy to test. The strategy is obvious:

  • Leaves of hierarchical data flows are not functionally dependent on any other code you write. I call them Operations; they contain pure logic. That makes them easy to test. No dependency injection needed. That´s pure unit testing.
  • Nodes within hierarchical data flows are not functionally dependent either. Although they depend on other nodes/leafs these dependencies are not functional in nature. That´s because the nodes do not contain any logic at all. Their sole purpose is Integration. Dependency injection might be needed - but it´s pure integration testing, that means tests do not check behavior but "wiring". Integration tests answer the question: Have all parts been wired-up correctly? And since Integration data flow nodes do not contain logic they are small. Often they don´t need to be tested automatically; a review is sufficient.
  • The root node of a data flow hierarchy - often the Entry Point of an Interaction - needs to be checked with acceptance tests. This way it´s ensured the data flow actually produces the desired overall behavior as a whole.
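The strategy can be sketched like this (Python for brevity; the word-count flow and all names are invented):

```python
def trim(text):
    # Operation (leaf): pure logic, unit tested directly
    return text.strip()

def count_words(text):
    # Operation (leaf): pure logic, unit tested directly
    return len(text.split())

def word_count(text):
    # Integration (root node): contains no logic, only wiring -
    # checked by an acceptance test, or often just by review
    return count_words(trim(text))
```

Unit tests target `trim` and `count_words`; a single acceptance test on `word_count` confirms the flow as a whole produces the desired behavior.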


Data flow design ensures that all functions are small. Logic is "compartmentalized" in a way that makes it easy to understand and test.

Ideally each Operation is as simple as an average code kata. Functional design provides the developer with a function signature and test cases. That´s a very concrete base for driving the work of a craftsman to hammer out the algorithm, the logic.

What about classes?

At the beginning I asked you to forget about Object Orientation and other programming paradigms. Can you now see why?

Software development in general is not about any of it. It´s about delivering logic. And in order to be able to do that it´s about analyzing and designing the solution as a preparation to code. And what to code in the first place is behavior, that means logic contained in functions. That hasn´t changed in all the decades since the invention of subroutines.

Not to start software development by focusing on functions thus leads you astray. Focusing on classes (objects) first is, well, counter-productive.

Classes are containers, containers for functions. Without knowing which functions are needed to produce behavior it´s a waste to look for classes.3

Functional Programming on the other hand puts functions first. That´s good - as long as it means data flows can more easily be translated into code. But don´t get mired in fancy language features. Talking about Monads or recursion or statelessness can deflect your mind from more important things like delivering value to the customer.

Functional design is not in contradiction with Object Orientation or Functional Programming. Take it more as a framework to use these tools in. Tools need rules; we shouldn´t do with them all that´s possible but what´s healthy and beneficial in the long run.

That´s why I´m not a fan of pure/traditional Object Orientation or Functional Programming. Such dogma does not lead to frequent delivery of and feedback on behavior. The trend towards hybrid languages is more to my taste. C# (and even Java) becoming more Functional, and F# or Scala being Functional but also supporting Object Orientation seems to be the way to go.

Hybrid languages make it easier to translate data flow designs into code with all their aspects, which includes local and shared state.


Considering all the details of an implementation of required behavior is a daunting task. That´s why we need a systematic way to approach it starting from requirements and leading to logic.

To me that´s a three phase process starting with an agile analysis resulting in fine grained increments meaningful to customers and developers alike. Then moving on to designing a solution from the increment "powder" in the form of data flows, since they allow us to reason about it on a pretty high level of abstraction. And finally coding individual functional units of those data flows in a test-first manner to get high test coverage for the myriad of details.

I´ve been working like this for the past couple of years. It has made my life as a developer and trainer and consultant much, much easier. Why don´t you try it, too? If you´ve any questions on how to start, feel free to write me an email.

  1. If your User Stories are already like what I call Features, that´s great. If not, but you like to stick with the User Story concept try to write them after you´ve uncovered Interactions and Features by agile analysis.

  2. This is of course not completely true. Not all design can be done up-front for an Interaction or even Feature. There are almost always aspects which cannot be foreseen. So you stumble across them during coding. That´s perfectly fine and does not cause harm. It just leads to an ad hoc extension of design. Because what to do during refactoring is crystal clear: morph the logic just implemented into data flows.

  3. Classes also contain data. Data structures can be built from them. As long as you use them for that purpose, go ahead. But don´t try to fit logic onto them at the same time.

The Incremental Architect´s Napkin - #7 - Nest flows to scale functional design

You can design the functionality of any Entry Point using just 1D and 2D data flows. Each processing step in such flows contains logic1 to accomplish a smaller or larger part of the overall process.

To benefit most from Flow Design, the size of each such step should be small, though.

Now think of this scenario: You have a program with some 100,000 lines of code (LOC). It can be triggered through 25 Entry Points. If each started a flow of maybe 5 processing steps, that would mean functional units would contain around 800 LOC on average. In reality some would probably be just 50 LOC or 100 LOC - which would require others to contain 1,500 LOC or even more.

Yes, I mean it: Think of the whole functionality of your software being expressed as flows and implemented in functional units conforming to the Principle of Mutual Oblivion (PoMO). There´s no limit to that - even if you can´t imagine it yet ;-)

What should be limited, however, is the length of the implementations of the functional units. 1,500 LOC, 800 LOC, even 400 LOC is too much to easily understand. Logic of more than maybe 50 LOC or a screenful of code is hard to comprehend. Sometimes even fewer LOC are difficult to grok.

Remember the #1 rule of coding: Keep your functions small. Period. (Ok, I made up this rule just now ;-) Still I find it very reasonable.)

The #1 rule of Flow Design then could be: Don´t limit the number of processing steps. Use as many as are required to keep the implementation in line with the #1 rule of coding.2

Flow processing steps turning into functions of some 50 LOC would be great. For 100,000 LOC in the above scenario that would mean 2000 functional units spread across 25 Entry Point flows, though. With each flow consisting of 80 processing steps. On average.

That sounds unwieldy, too, doesn´t it? Even if a flow is a visual representation of functionality it´s probably hard to understand beyond maybe 10 processing steps.

The solution to this dilemma - keep function size low and at the same time keep flow length short - lies in nesting. You should be able to define flows consisting of flows. And you are.

I call such flows three dimensional (3D), since they add another direction in which to extend them. 1D flows extend sequentially, "from left to right". 2D flows extend in parallel by branching into multiple paths. 3D flows extend "vertically".


In 3D flows a 1D/2D flow is contained in a higher level processing step. These steps integrate lower level functional units into a whole which they represent. In the previous figure the top level functional unit w integrates s and t. One could say s and t form the w process.

s in turn integrates a, d, and f on the bottom level. And t wires-up b, c, and e to form a flow.

a through f are non-integrating functional units at the bottom level of this flow hierarchy.
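In code the nesting is plain function composition. A Python sketch with placeholder bodies for a through f (the arithmetic is invented; only the wiring matters):

```python
# Bottom level: Operations a..f (placeholder logic)
def a(x): return x + 1
def d(x): return x * 2
def f(x): return x - 3
def b(x): return x ** 2
def c(x): return x + 10
def e(x): return x // 2

# Mid level: s integrates a, d, f; t integrates b, c, e
def s(x): return f(d(a(x)))
def t(x): return e(c(b(x)))

# Top level: w integrates s and t into the whole process
def w(x): return t(s(x))
```

Each level reads like a zoomed-out view of the level below it.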

Showing such nesting relationships by actually nesting notational elements within each other does not scale.


This might be the most authentic depiction of nested flows, but it´s hard to draw for more than three levels and a couple of functional units per level.

A better choice is to draw nested flows as a "tree" of functional units:


In this figure you see all levels of the process as well as how each integration wires-up another nested flow. Take the triangles as a rough depiction of the pinch gesture on your smartphone which you use to zoom in on a map for example. It´s the same here: each level down the diagram becomes more detailed.

Most of the time, though, you don´t need to draw deeply nested 3D flows. Usually you start with a top level flow on a napkin or flip chart and then drill down one level. If deeper nesting is needed, you take a new napkin or flip chart and continue there.

Here´s an example from a recent workshop. Never mind the German labels on the processing steps:


It´s a functional design on three levels also including the class design. But that´s a topic for another time.

What I´d like you to note here is the sketchy character of the design. It´s done quickly without much ado about layout and orderliness. It´s a "living document", a work in progress during a design session of a team. It´s not a post-implementation depiction (documentation), but a pre-implementation sketch. As that it´s not supposed to have much meaning by itself outside the group of people who came up with the Flow Design.

But it can be taken to explain the design to another person. In that case the diagram would be used as a map: something to point to and follow along with a finger while explaining what's happening in each processing step on each level.

And of course it's a memory aid. Not just talking about a (functional) design but actually keeping track of it visually helps to remember the overall software structure. A picture is worth a thousand words.

Back to LOC counting: With nested flows 80 functional units per Entry Point should not sound unwieldy anymore. Let´s put 5 functional units into a sub-flow for integration by its own functional unit on a higher level. That would lead to 16 such integrating processing steps. They would need another 3 functional units for integration on yet another higher level. So what we end up with is 1 + 3 + 16 + 80 = 100 functional units in total for some 4,000 LOC of logic code. That does not sound bad, I´d say. Admittedly it´s an overhead of 25% on functions - but it´s only maybe around 5% more LOC within functions. As you´ll see, integration code is simple. A small price to pay for the benefit of small functions throughout the code base.
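The arithmetic above can be double-checked with a quick sketch (the numbers are the ones assumed in the text):

```python
# Assumptions from the text: 80 Operations, roughly 5 functional units
# integrated per higher-level functional unit.
operations = 80
level_1 = 16   # 80 / 5 integrating processing steps
level_2 = 3    # 16 / 5, rounded down as in the text
root = 1

# Total functional units in the hierarchy.
total = root + level_2 + level_1 + operations
assert total == 100

# Overhead of integrating functions relative to Operations.
overhead = (root + level_2 + level_1) / operations
assert overhead == 0.25
```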

Integration vs operation

You might think nested flows are nothing more than the functional decomposition of the past: functions calling functions calling functions... But they're not.

Yes, it´s "functions all the way down". Those functions are not created equal, though. They fundamentally differ in what their responsibilities are:

  • Integrating functional units just do that: they integrate. They do not contain any logic.
  • Non-integrating functional units just contain logic. They never integrate any other functional units. Those are Operations.

I call this the Integration Operation Segregation Principle (IOSP). It´s the Single Level of Abstraction (SLA) principle taken to the extreme. Here´s a flow hierarchy reduced to its dependencies:


There can be any number of integration levels, but only one level of Operations. Operations are the leaves of the dependency tree. Only they contain logic. All nodes above them do not contain logic.
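As a minimal sketch of what IOSP-conforming code looks like (in Python; the invoice domain and all names are made up for illustration):

```python
# Operations: they contain only logic and never call other self-made functions.
def net_total(items):
    return sum(price * qty for price, qty in items)

def add_vat(net, rate=0.19):
    return net * (1 + rate)

def format_amount(amount):
    return f"{amount:.2f} EUR"

# Integration: it contains no logic at all, it only wires Operations into a flow.
def invoice_total(items):
    net = net_total(items)
    gross = add_vat(net)
    return format_amount(gross)
```

Here invoice_total() is an integrating node of the dependency tree; net_total(), add_vat(), and format_amount() are its leaves.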

That's what makes decomposition in Flow Design so different from earlier functional decomposition. That, plus Flow Design being about data flow instead of control flow.

Or let me say it more bluntly: I strongly believe that "dirty code" is the result of not containing logic in a systematic manner like this. Instead, logic is smeared all over the de facto functional hierarchies of your code base, across all sorts of classes.

This subtly but fundamentally violates the SRP. It entangles the responsibility of whatever the logic is supposed to do (behavior) with the responsibility to integrate functional units into a whole (structure). "Pieces of" logic should not be functionally dependent on other "pieces of" logic. That´s what the PoMO is about. That´s what Object Orientation originally was about: messaging.

To fulfill functional or quality requirements, logic itself does not need any separation into functions. That means as soon as functions are introduced into code, functional dependencies can be built - and they entail a new responsibility: Integration.

The beauty of Operations

In the beginning there was only logic. There were expressions, control statements, and some form of hardware access. And all this logic produced some required behavior.

Then the logic grew. It grew so large that it became hard to understand on a single level of abstraction.

Also, patterns started to appear in the growing logic. So the question arose: why should pattern code be repeated multiple times?

Thus subroutines (functions, procedures) were invented. They helped to make programming more productive: patterns stashed into subroutines could be re-used quickly all over the code base. And they helped to make code easier to understand, because by calling a subroutine details could be folded away.


var x = a + ...;
var y = x * ...;
var z = y / ...;


var x = a + ...;
var y = f(x);
var z = y / ...;

The change looks innocent. However it´s profound. It´s the birth of functional dependencies.

The logic transforming a etc. into z is not fully in place anymore but dependent on some function f(). There is more than one reason to change it:

  1. When the calculation of x or z changes.
  2. Or when something in the subroutine changes in a way that affects dependent logic, e.g. the subroutine suddenly does not check for certain special cases anymore.

Even though the logic and the subroutine belong closely together, they are not the same. They are two functional units, each with a single responsibility. Except that this is no longer true for the dependent functional unit, which now has two responsibilities:

  1. Create some behavior through logic (Operation)
  2. Orchestrate calls to other functions (integration)

To avoid this conflation the IOSP suggests bundling up logic in functions which do not call each other.

Subroutines are a great tool to make code easier to understand and quicker to produce. But let´s use them in a way so they don´t lead to a violation of the fundamental SRP.

Bundle logic up in functions which do not depend on each other. No self-made function should call any other self-made function.

  • That makes Operation functions easy to test. There are no functional dependencies that need to be mocked.
  • That will naturally lead to small and thus easy to understand functions. The reason: how many lines of logic can you write before you feel the urge to stash something away in a subroutine? My guess is after some 100 or 200 LOC at most. But what if no functional dependencies are allowed? You'll finish the subroutine and create another one.

That's the beauty of Operations: they are naturally short and easy to test. And it's easy to check whether a given function is an Operation.
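To illustrate how cheap such tests are, here is a hypothetical Operation (inspired by the CSV column widths mentioned later in the text), tested without any mock:

```python
# A pure Operation: logic only, no calls to other self-made functions.
def determine_col_widths(records):
    # Width of a column = length of its longest value.
    return [max(len(value) for value in column) for column in zip(*records)]

# Testing needs no mocks: just pass in some input and check the output.
assert determine_col_widths([["Name", "City"], ["Alice", "Berlin"]]) == [5, 6]
```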

The beauty of Integrations

Once you start mixing logic and functional dependencies code becomes hard to understand. It consists of different levels of abstraction. It might start with a couple of lines of logic, then something happens in another function, then logic again, then the next functional dependency - and on top this all is spread across several levels of nested control statements.

Let's be honest: it's madness. Madness we're very, very used to, though. Which does not make it less mad.

We're burdening ourselves with cognitive dissonance. We're bending our minds to follow such an arbitrary distribution of logic. Why is some of it readily visible, why is some of it hidden? We're building mental stacks following the train of control. We're reversing our habitual reading direction: instead of from top to bottom and from left to right, we pride ourselves on having learned to read from right to left, from inner levels of nesting to outer, and from bottom to top. What a feat!

But this feat, I´d say, we should always subtitle with "Don´t try this at home!" It´s a feat to be performed on stage, but not in the hurry of every day work.

So let´s stop it!

Let's try to write code consisting of just function calls. And I mean function calls in sequence, not nested function calls.

Don't write

var z = b(c(x));
instead write

var y = c(x);
var z = b(y);

Let's try to tell a readily comprehensible story with our code. Here's the story of converting CSV data into a table:

Developer A: First the data needs to be analyzed. Then the data gets formatted.

Developer B: What do you mean by "analyzing the data"?

Developer A: That's simple. "Analysis" consists of parsing the CSV text and then finding out what the maximum length of the values in each column is.

Developer B: I see. Before you can rearrange the data, you need to break the whole chunk of CSV text up. But then... how exactly does the rearrangement work, the formatting?

Developer A: That's straightforward. The records are formatted into an ASCII table - including the header. Also a separator line is built. And finally the separator is inserted into the ASCII table.

That´s the overall transformation process explained. There´s no logic detail in it, just sequences of what´s happening. It´s a map, not the terrain.

And like any story it can be told on different levels of abstraction.

High(est) level of abstraction:

Developer A: CSV data is transformed into an ASCII table.

Medium level of abstraction:

Developer A: First the data is analyzed, then it´s formatted.

Low level of abstraction:

Developer A: First the data is parsed, then the maximum length of the values in each column is determined, then the records are formatted into an ASCII table - including the header. At the same time a separator line is built. And finally the separator is inserted into the ASCII table.

Finally, the bottom level of abstraction (no abstraction at all, really) would be to list each step of logic. That wouldn't be an abstract process anymore, but a raw algorithm.

At the bottom there is maximum detail, but it's also the hardest to understand. So we should avoid dwelling down there as much as possible.

Without logic details we´re talking about Integration. Its beauty is the abstraction. Look at the code for the above story about CSV data transformation:
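Such a story could be sketched in IOSP-style code like this (Python; the semicolon delimiter, the table format, and all signatures are my assumptions, not the original listing):

```python
# Operations: logic only, no calls to other self-made functions.
def parse(csv_text):
    return [line.split(";") for line in csv_text.splitlines()]

def determine_col_widths(records):
    return [max(len(v) for v in col) for col in zip(*records)]

def format_records(records, col_widths):
    return ["|".join(v.ljust(w) for v, w in zip(r, col_widths)) for r in records]

def format_separator(col_widths):
    return "+".join("-" * w for w in col_widths)

def build_table(rows, separator):
    # Insert the separator between header row and data rows.
    return "\n".join([rows[0], separator] + rows[1:])

# Integrations: no logic, just sequences of calls telling the story.
def analyze(csv_text):
    records = parse(csv_text)
    col_widths = determine_col_widths(records)
    return records, col_widths

def format_as_ascii_table(records, col_widths):
    rows = format_records(records, col_widths)
    separator = format_separator(col_widths)
    return build_table(rows, separator)

def format_csv(csv_text):
    records, col_widths = analyze(csv_text)
    return format_as_ascii_table(records, col_widths)
```

Each Integration function reads like one level of the story told above.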


Each function is focused on Integration. Each function consists of an easy to understand sequence of function calls. Each function is small.

Compare this to a pure Operation:
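A hypothetical all-in-one version of the CSV transformation, with parsing, width calculation, and formatting all tangled into one function (Python; delimiter and table format are my assumptions):

```python
def format_csv_monolithic(csv_text):
    # Parsing, width calculation, row formatting, and table assembly
    # all mixed into one function: a single block of logic.
    records = [line.split(";") for line in csv_text.splitlines()]
    col_widths = [max(len(v) for v in col) for col in zip(*records)]
    rows = ["|".join(v.ljust(w) for v, w in zip(r, col_widths)) for r in records]
    separator = "+".join("-" * w for w in col_widths)
    return "\n".join([rows[0], separator] + rows[1:])
```

It does the same job, but the story it tells is one of raw detail: every level of abstraction is flattened into logic statements.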


Now, which solution would you like to maintain?

Yes, Integration functions depend on others. But it´s not a functional dependency. Integration functions don´t contain logic, they don´t add "processing power" to the solution which could be functionally dependent. Their purpose is orthogonal to what logic does.

Integration functions are very naturally short, since their building blocks (function calls) are small and it's so cheap to create more of them if one becomes hard to understand.


Testing Operations is easy. They are not functionally dependent by definition. So there is no mocking needed. Just pass in some input and check the output.

Sometimes you have to setup state or make a resource available, but the scope you´re testing is still small. That´s because Operations cannot grow large. Once you start following the PoMO and IOSP you´ll see how the demand for a mock framework will diminish.

Testing Integrations is hard. They consist of all those function calls. A testing nightmare, right?

But in reality it´s not. Because you hardly ever test Integration functions. They are so simple, you check them by review, not automated test.

As long as all Operations are tested - which is easy - and the sequence of calls of Operations is correct in an Integration - which can be visually checked -, the Integration must be correct too.

But still... even if all Operations are correct and the Integration functions represent your Flow Design correctly, the behavior of the whole can be unexpected. That's because flows are just hypotheses. You think a certain flow hierarchy with correct logic at the bottom will solve a problem. But you can be wrong.

So it´s of course necessary to test at least one Integration: the root of a 3D flow.

Interestingly, that's what TDD is about. TDD always starts with a root function and drives out logic details by adding tests. But TDD leaves it to your refactoring skills to produce a clean code structure.

Flow Design starts the other way round. It begins with a functional design of a solution - which is then translated into clean code. IOSP and PoMO guarantee that.

And you can test the resulting code at any level you like. Automated tests for the root Integration are a must. But during implementation of the Operation functions I also write tests for them, even if they are private functions - and I throw those away at the end. I call them "scaffolding tests". For more on this approach see my book "Informed TDD".

Stratified Design

You´re familiar with layered design: presentation layer, business logic layer, data access layer etc. Such layered design, though, is different from 3D flows.

In a layered design there is no concept of abstraction. A presentation layer is not on a higher or lower level of abstraction compared to the business logic layer or the data access layer. Only the combination of all layers forms a whole.

That´s different for abstractions. On each level of abstraction the building blocks form the whole. A layered design thus describes a solution on just one level of abstraction.

Contrast this with the 3D Flow Design for the CSV data transformation. The whole solution is described on the highest level of abstraction by Format(). One functional unit to solve it all.

On the next lower level of abstraction the whole solution is described by Analyze() + Format_as_ASCII_table().

On the next lower level of abstraction the whole solution is described by Parse() + Determine_col_widths() + Format_records() + Format_separator() + Build_table().

Below that it´s the level of logic. No abstraction anymore, only raw detail.

What should those levels of abstraction be called? They are not layers. But just "levels" would be too general.

To me they look like what Abelson/Sussman called a "stratum" when they talked about "stratified design".

Each stratum solves the whole problem - but in increasing detail the deeper you dig into the flow hierarchy. Each stratum consists of a Domain Specific Language (DSL) on a certain level of abstraction - and always above the logic statements of a particular programming language.

Fortunately these DSLs don't need to be built using special tools. Their syntax is so simple that just about any programming language (with functions as first class data structures) will do. The meta syntax/semantics for all such DSLs is defined by IOSP and PoMO. They are always data flow languages with just domain specific processing steps.

Here´s another scenario:

An application displays CSV data files as ASCII tables in a page-wise manner. When it´s started it asks for a file name and then shows the first page.

Here´s a 3D Flow Design for this (see the accompanying Git repository for an implementation). See how the solution to the former problem now is part of the larger solution?


Vertically it´s strata put on top of each other. The deeper you go the more detail is revealed.

At the same time, though, there are the elements of a layered design. They stretch horizontally.

Colors denote responsibilities:

  • Integrations are white,
  • presentation layer Operations are green (Ask for filename, Display table),
  • data access layer Operations are orange (Load data),
  • business logic Operations are light blue (all else).

In stratified Flow Design, though, functional units of different layers do not depend on each other. Thus layering loses its meaningfulness. It's an obsolete concept. What remains, of course, is the application of the SRP. User interaction is different from file access or table formatting. Hence there need to be distinct functional units for these aspects/responsibilities.

In closing

The quest for readable code and small functions can come to an end. Both can be achieved by following two simple principles: the Principle of Mutual Oblivion (PoMO) and the Integration Operation Segregation Principle (IOSP).

That's true for greenfield code where you might start with a Flow Design. But it's also true for brownfield code. Without a design, look at a function and see if it's an Operation or an Integration. Mostly you'll find it's a hybrid. That means you should refactor it according to PoMO and IOSP: clean it up by making it an Integration and pushing down any logic into lower level functions. Then repeat the process for all functions it integrates.

I suggest you try this with a code kata. Do the bowling game kata or roman numerals or whatever. Use TDD first if you like. But in the end apply PoMO and IOSP rigorously.

In the beginning you'll be tempted to keep just a few control statements in Integration functions. Don't! Push them down into Operations. Yes, this will mean you'll get functional units with several outputs. But that's ok. You know how to translate them into code using continuations or events.
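Such a functional unit with several outputs can be sketched with continuations like this (Python; the even/odd domain and all names are just an illustration):

```python
# Operation with two outputs: the control statement lives inside it,
# and the two outcomes are published via continuations.
def classify(number, on_even, on_odd):
    if number % 2 == 0:
        on_even(number)
    else:
        on_odd(number)

# Downstream Operations for the two outputs.
def report_even(n):
    return f"{n} is even"

def report_odd(n):
    return f"{n} is odd"

# Integration: wires both outputs to their downstream steps.
# The result list is just plumbing to capture the continuation's output.
def process(number):
    result = []
    classify(number,
             lambda n: result.append(report_even(n)),
             lambda n: result.append(report_odd(n)))
    return result[0]
```

The `if` now sits inside an Operation; the Integration only routes data.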

Even if the resulting Integration might look a bit awkward, do it. You'll get used to it. Like you got used to reversing your reading direction for nested function calls. But this time you're getting used to a clean way of writing code ;-) That's like getting sober. Finally.

Organizing code according to PoMO and IOSP is the only way to scale readability and understandability. We need abstractions, but we need them to be of a certain form. They need to be clean. That´s what IOSP does by introducing two fundamental domain independent responsibilities: Integration and Operation.

The beauty of this is that you can check for conformance to the SRP without even understanding the domain. Integration and Operation are structural responsibilities - like containing data is. You can review the code of any of your colleagues to help them clean it up.

  1. Remember my definition of logic: it's expressions, control statements and API calls (which often stand for hardware access of some kind).

  2. I know, you´ve tried hard for years to keep the number of lines in your functions low. Nevertheless there are these monster functions of 5,000 LOC in your code base (and you´ve heard about 100,000 LOC classes in other projects). Despite all good intentions it just happens. At least that´s the code reality in many projects I´ve seen. But fear not! You´re about to learn how to keep all your functions small. Guaranteed. I promise. Just follow two principles. One you already know: the Principle of Mutual Oblivion.

Why I love Leanpub for getting my books to readers

There is some discussion going on about if/when using Leanpub is the right choice for a budding (or even established) author. Some contributions you may want to read include:

Much has already been said. So why add another article to the discussion? Because I feel there's something missing: some kind of systematic view of self-publishing.

Without some more structure, my guess is, authors still looking for their way might get even more confused than they were before. Or is it just me who finds the self-publishing landscape quite confusing sometimes?

So here´s my take on the topic. Let me break down the self-publishing process into a couple of steps:


Publishing starts with writing. It´s always the author who does the writing. But with self-publishing the author needs and wants to do more than that.

Writing fiction is pretty much just about plain text sprinkled with some chapter headings or occasional italics. The same probably goes for most non-fiction books: maybe an image here and there, maybe some text in a box, maybe a table. Still, all those artifacts just flow from top to bottom on a page.

Sure, there are some topics or didactical requirements which call for more. But my guess is most authors are unlike Jurgen Appelo. Most don't want to get that deep into book layout.

And even if more is needed, then the question is, when is it needed? As Peter Armstrong points out, Leanpub is about “book start-up”. It´s about exploration of a topic meeting a market. How much artful design is needed for that?

But writing is not just about producing text with some layout. Nowadays it´s also about file formats. We´re talking about eBooks, right? So how do you get from a text in some text editor program to PDF, mobi, epub - which seem to be the major eBook file formats?

How to do the export from e.g. Microsoft Word? How to ensure images are of the right size/resolution? How to get the PDF print ready, too?

Sure, this is all possible with a number of tools. If you're striving for perfection and looking for maximum freedom and control already in this phase of the publishing process… well, then take your time and hunt down your personal "best of breed" mix of tools.

But if you´re like me and just want “good enough” layout plus quick setup of the whole thing… then you´ll love Leanpub.

I want writing a book and getting it to potential readers to be as easy as printing a letter. That's what Leanpub delivers for me. From the idea "oh, let's make this a book" to a print ready PDF it's a matter of minutes:

  1. Go to the Leanpub website in your browser (5sec)
  2. Sign in and create a new book project (1min including some meta-data)
  3. Accept the Dropbox invitation from Leanpub (1min)
  4. Put your manuscript into your Leanpub project's folder in your Dropbox (1min)
  5. Publish the project as an eBook (PDF, mobi, epub) (5sec)
  6. Export the project as a print ready PDF (5sec)

Of course this does not include the writing part ;-) And it does not include a comprehensive book description or a snazzy cover image. But that´s stuff you need to do anyway. It´s not specific to Leanpub.

What I want to make clear is the little overhead Leanpub requires. From manuscript to published eBook files as well as print ready PDF it´s just a couple of mouse clicks. You don´t need to select any tools, you don´t need to wire-up your tool chain. It´s as easy as putting files in a dropbox folder and hitting a button. Call it One-Click Publishing if you like.

I'm not saying Leanpub is unique in this. For example Liberio seems to offer a similar service. But currently I'm familiar with Leanpub and like it very much. It has allowed me to start book projects whenever I felt like it. Some I have finished, others are still in progress.

Also I helped other authors publish their books using Leanpub. They gave me their MS Word manuscripts and I converted them to Markdown. Each book took me less than a day from start to publication.

Which brings me to the only hurdle set up by Leanpub: Markdown. Markdown is not as powerful as MS Word. And even with Markdown editors like Mou or MarkdownPad it´s not the same as writing a manuscript in MS Word.

Switching from the jack-of-all-trades-on-every-desktop MS Word to a Markdown editor takes some getting used to. I cannot deny that. And what you can do in terms of layout is somewhat limited. But as argued above: What do you really, really need for your book anyway? Don´t overestimate that. Don´t try to be perfect, especially not during your first couple of iterations.

So I´d say: Markdown might still not be that widely known. But it´s really easy to learn. Markdown editors are here to help. It´s a good enough choice for many, many books.

As for the scenarios Jurgen Appelo depicted where Leanpub falls short (e.g. bundle only draft chapters into an eBook) I´d say: That´s not the problem of a publishing platform like Leanpub. It´s a matter of the editing tool. Neither Leanpub nor MS Word can do that. And that´s ok.


Self-publishing is supposed to be about collaboration. Collaboration between author and readers. No more “waterfall publishing” but “agile publishing”. Writing and publishing can and should go through a number of iterations.

This promises to get content out to readers earlier. And it allows for learning by the author through feedback from early readers.

Leanpub definitely supports this kind of agile or even lean approach. Nomen est omen. Setting up a book is trivial. Publishing the next iteration is trivial. Iterate as quickly as you like. Publish a new version of your book twice a day or twice a month or just once. With each iteration adapt to the market reactions. If you like. And if there are any ;-)

Leanpub offers a way for readers to give feedback. However that´s one of the features still lacking in quality, I´d say. Feedback cannot be given alongside the manuscript. Compared to the feedback system employed for Mercurial: The Definitive Guide it´s simplistic and not really state of the art.

But then… How much collaboration do you really, really need, want, expect? My experience: The willingness of the audience to provide detailed feedback is very limited. People want to read, not to co-author.

So at least to me the collaboration features are not that important. At least not with regard to public collaboration. Private collaboration among co-authors or a few hand picked alpha/beta readers is different. But how much support do I need from Leanpub for that? For my taste, it's close to none. If I want to, I can move manuscript development to GitHub and get all their features plus Leanpub's ease of publishing.[1]


Once you've written your book and honed it based on the feedback you got, you surely want to distribute it. Widely, that is.

To criticize Leanpub for not being the most widely known eBook platform is missing the point, I´d say. Although Leanpub offers easy distribution through book project landing pages, it´s not their primary purpose. (Which ultimately might limit their revenue, though.)

To compare amazon or smashwords to Leanpub is a bit like comparing apples and pears.

I use Leanpub as a platform for distribution I do myself. When I send a link to one of my books to someone I use a Leanpub link. When I point out my books in a blog post or a tweet or a newsletter I include a link to Leanpub. I do that because then readers buy from the platform which locks them in the least. Leanpub does not enforce any DRM on my books.

Also, when readers buy from Leanpub I get to see their email addresses (at least if they choose to share them upon purchase). And I can reach them directly and immediately whenever I update my book.

For greater reach I use amazon (or other online stores in Germany like Thalia). And again Leanpub makes it easy for me to publish. The mobi and epub files generated by Leanpub can right away be uploaded to Kindle Direct Publishing (KDP) or XinXii.

Once you've published a version of your book on Leanpub it's a no-brainer to publish it on amazon. It maybe takes another hour. What else do you want in terms of reach?

Maybe a traditional publisher gives you more. But then you´ve already decided to go down the self-publishing road, haven´t you? You want freedom and control. That´s what you get with Leanpub. Out of the box. Without a lengthy search for tools. Plus reach - via established online channels. Check out services like KDP, bookrix, XinXii to push your book to the masses.

But don´t be disappointed if you don´t land a bestseller right away. Your book still is one of millions out there. You need marketing of some kind or the other. But that´s a different issue altogether. Neither Leanpub nor amazon nor smashwords nor Lulu will do anything special for your book.


Although self-publishing is easiest and quickest for eBooks, you might want to turn your manuscript into a printed book for one reason or another. It makes for a more tangible gift, it might suit more old-fashioned readers, or whatever.

With Leanpub that´s easy. Just export your manuscript as a print ready PDF and off you go. Upload it to Createspace for example. Or Lulu. Or epubli. I´m using Createspace because that way it´s easiest for me to get the eBook and the print book next to each other in the amazon catalog.

Plus Createspace so far has been cheaper for me to get my own print copies from as an author. The quality is ok. The price including shipping from the US is ok. And for your readers who order via amazon it´s fastest. They´ll stock copies for you. Next day delivery should be no problem.


Finally, just in case you still want to earn money with your book, Leanpub makes that easy too. Much easier than amazon. From book idea to “online shop” it´s again a matter of minutes.

90% royalties are nice. Giving your readers the opportunity to pay what they like (in a price range you define) or letting them pay even more than you want is nice, too.

However I don´t find that important. 70% on amazon are ok for me too. I´m not writing books because I expect to get rich by writing. Getting some money out of it is nice. Not more. That´s why my books are priced very low. All under 10$ so far.

I use my books as a marketing tool or as text books for my trainings. It´s almost like blogging. I earn my money through other channels. And I think that´s the future for most authors. It´s like with music. The golden days for bands seem to be over. They don´t accumulate riches by selling records, but by selling tickets or whatever. eBooks like music files are easy to copy. DRM on them (e.g. Kindle books) is not here to stay. That´s my guess. So better face it right now before hitting a wall in a couple of years with a royalties based business model.


Self-publishing has become very easy compared to 10 years ago. Still, though, you have to find your way through the maze from manuscript to worldwide readers.

I prefer the easy road. I want my texts to hit eyeballs. For that turning a manuscript file into an eBook file has to be as simple as can be. That´s the case with Leanpub. No frills. But also no hassle. That´s what Leanpub delivers.

High frequency iterations are good for moving a project forward. No big manuscript up-front. Write a little, publish a little. That´s the modern way for the author. Readers can jump on whenever they like. But you as the author produce tangible results in the open. What a motivation to continue! That´s what Leanpub delivers.

For distribution I rely on the biggest online bookstore there is: amazon. That´s what Leanpub helps me to do.

And finally the money. That´s not really that important to me. But thanks Leanpub for some 90% royalties. And also thanks amazon for 70%. More than a decade ago when I wrote books for traditional publishers I got 12%. What a difference!

Today I'm much faster. I'm more flexible. I earn more. I can change the way things work any day. For now I'm very content with Leanpub. We'll see what future publishing platforms look like. Choose your own. But don't turn that into a science of its own. Starting is more important than the optimal tool chain. Stay nimble.

So much for my take on a somewhat systematic approach to answering the question "to Leanpub or not to Leanpub?" View publishing as a process consisting of phases or stages. Optimize for the whole, optimize for what's most important for you. Maybe that's layout. Maybe that's speed, ease of use, royalties, reach, collaboration.

  1. I´d like to see Leanpub support Bitbucket private repositories. Bitbucket provides them for free which might be attractive for authors not having published a bestseller yet ;-)

The Incremental Architect´s Napkin - #6 - Branch flows for alternative processing

Confronted with an Entry Point into your software, don't start coding right away. Instead think about how the functionality should be structured. Into which processing steps can you partition the scope? What should be done first, what comes next, what then, what finally? Devise a flow of data.

Think of it as an assembly line. Some raw input plus possibly some additional material is transformed into shiny output data (or some side effect fireworks).

Here is the Flow Design of the de-duplication example again:


That´s a simple sequential flow. Control flows along with the data. And it´s a one dimensional (1D) flow. There is just one path from start to end through the graph of processing nodes.

Such flows are common. For many functions they are sufficient to describe the steps to accomplish what´s required. And as you saw in the previous chapter they are easy to translate into code:

static void Main(string[] args) {
    var input = Accept_string_list(args);
    var output = Deduplicate(input);
    Present_deduplicated_string_list(output);
}

Streams causing alternative flows

So much for the happy day. But what if an error occurs? Input could be missing or malformed. Surely you would not want the program to just crash with a cryptic error message.

If "graceful failure" becomes a requirement, how could it be added to the current design? I suggest a preliminary processing step for validation:


It's still a sequential 1D flow - but now the processing steps after validation are optional, so to speak. See the stream coming out of the validation? The asterisk means that maybe (args) will flow out, maybe not. It depends on whether the command line arguments were validated correctly.

For simplicity´s sake let´s assume validation just checks if the program gets called with exactly one command line argument. If not, an error message should be printed to standard output.

This could now easily be implemented:
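A minimal sketch of such an implementation might look like this; the continuation parameter name, the error text, and the LINQ stand-in for Deduplicate() are assumptions, not the original code:

```csharp
using System;
using System.Linq;

class Program {
    static void Main(string[] args) {
        // The three downstream steps are injected into the validation
        // as a continuation; they only run if validation succeeds.
        Validate_command_line(args, () => {
            var input = Accept_string_list(args);
            var output = Deduplicate(input);
            Present_deduplicated_string_list(output);
        });
    }

    public static void Validate_command_line(string[] args, Action continueWith) {
        if (args.Length == 1)
            continueWith();   // happy day: let data flow on
        else
            Console.WriteLine("Usage: deduplicate \"<comma separated strings>\"");
    }

    public static string Accept_string_list(string[] args) { return args[0]; }

    public static string[] Deduplicate(string input) {
        // Stand-in for the de-duplication process designed in the previous chapter.
        return input.Split(',').Select(s => s.Trim()).Distinct().ToArray();
    }

    public static void Present_deduplicated_string_list(string[] output) {
        Console.WriteLine(string.Join(", ", output));
    }
}
```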


And the effect would be stunning when running the program with invalid command line parameters:


Look at the Entry Point closely. First notice: the flow is still readily visible. Even though three of the four steps now run as a continuation. Don´t let yourself be deceived by this. Just read the program text from top to bottom.

Technically, though, it´s not that simple. The last three steps are not just written after the first one. They are nested. They get injected. That´s what makes conditional execution possible without a (visible) control statement.

Now there are two alternative execution paths through the flow:

  1. Validate_command_line()
  2. Validate_command_line(), Accept_string_list(), Deduplicate(), Present_deduplicated_string_list()

The alternatives are signified by the stream flowing from the validation.

And of course there is a control statement deciding between the alternatives. But it´s not part of the Flow Design. It´s an implementation detail of Validate_command_line(). The data flow remains free of logic even though there are alternative paths through it.

Take the indentation of the continuation as a hint at the alternative. This might look a bit strange at first, but you´ll get used to it. Or, if you like, find some other formatting for continuations. Just be sure to keep an eye on consistency and readability - within the limits of a textual flow representation.

Branches for explicit alternative flows

Validating the command line in this way works - but it´s not a clean solution. It´s not clean, because the SRP is violated. The validation has more than a single responsibility. It has at least two: it checks for validity (expression plus control statement) and it notifies the user (API-calls).

That´s not good. The responsibilities should be separated. One functional unit for the actual check, another for user notification.

This, though, cannot be accomplished with a 1D flow. There need to be explicit branches: one for the happy day, another one for the rainy day.


You see, functional units can have more than one output. In fact any number of output ports is ok. As for the translation it should be obvious that in case of more than one output a translation into return is not possible. If more than one output port is present, all of them should be translated into function pointers. I don´t recommend mixing return with function pointers.

This is how the translation looks:
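A sketch of what such a translation might look like; the port names onValid/onInvalid, the error text, and the stand-in for Deduplicate() are assumptions:

```csharp
using System;
using System.Linq;

class Program {
    static void Main(string[] args) {
        Validate_command_line(args,
            onValid: () => {
                var input = Accept_string_list(args);
                var output = Deduplicate(input);
                Present_deduplicated_string_list(output);
            },
            onInvalid: () => Present_error_message());
    }

    // Single responsibility: just the check.
    // Both output ports are translated into function pointers.
    public static void Validate_command_line(string[] args,
            Action onValid, Action onInvalid) {
        if (args.Length == 1) onValid(); else onInvalid();
    }

    // Single responsibility: notifying the user.
    public static void Present_error_message() {
        Console.WriteLine("Usage: deduplicate \"<comma separated strings>\"");
    }

    public static string Accept_string_list(string[] args) { return args[0]; }

    public static string[] Deduplicate(string input) {
        return input.Split(',').Select(s => s.Trim()).Distinct().ToArray();
    }

    public static void Present_deduplicated_string_list(string[] output) {
        Console.WriteLine(string.Join(", ", output));
    }
}
```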


Both functions - Validate_command_line() and Present_error_message() - now have a single responsibility. And the flow in Main() is still pretty clear - at least once you have gotten used to "thinking in functions".

The two paths through the flow now are:

  1. Validate_command_line(), Present_error_message()
  2. Validate_command_line(), Accept_string_list(), Deduplicate(), Present_deduplicated_string_list()

If you have a hard time figuring this out from the code, give yourself some time. Realize how you have to re-program your brain. It´s so used to seeing nested calls that it´s now confused. There is nesting - but the nested code is not called first? Yes. That´s a result of some Functional Programming here. The flow translation uses functions (lambda expressions) as first class citizens.

In case you have wondered what all the fuss about lambdas and closures in C# (or Java) was about... now you see what they are useful for: easily translating Flow Designs into code.

Yes, this looks a bit clumsy. But that´s due to C# (or Java or C++ or JavaScript) being object oriented languages. And it´s due to a textual notation. Expressing alternatives in text is always a difficult thing. In a visual notation alternatives often are put side by side. That´s not possible with current text based IDEs. So don´t blame the unusual code layout on Flow Design alone.

Finally: Let me assure you that it´s possible to get used to reading this kind of code fluently. Hundreds of developers I´ve trained over the past years have accomplished this feat. So can you.

Back to the problem:

Please note how the two continuations of Validate_command_line() do not hint at what´s going to happen next downstream. Their names refer to the purpose of the function, not its environment. That´s what makes the function adhere to the PoMO.

Both names make it obvious which output port of Validate_command_line() is used when. That´s not so obvious in the design. When you look at the Validate command line "bubble" with its two outputs you can´t see which one belongs to which alternative.

For such a small flow that´s not really a problem. But think of more than two outputs or alternatives that are not mutually exclusive. So, if you like, annotate the Flow Design with port names. I do it like this:


You can do the same for input ports, should there be more than one. Put a name next to the port prefixed with a dot. That way the name looks like a property of the functional unit.

Also notice how both outputs are streams. That´s to signify the optionality of data flowing. It´s a conceptual thing and not technically necessary.

You can translate streams to function pointers, but in C# at least you also could choose yield return with an iterator return type (IEnumerable). Or if output data is not streamed you can still translate the output port to a function pointer.
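To illustrate the two options, here is a sketch of a hypothetical stream port (int)* translated both ways; the name Generate_numbers is an assumption:

```csharp
using System;
using System.Collections.Generic;

class Streams {
    // Stream output port translated with yield return:
    // the caller pulls one value after another.
    public static IEnumerable<int> Generate_numbers(int first, int last) {
        for (var i = first; i <= last; i++)
            yield return i;
    }

    // The same port translated into a function pointer:
    // the functional unit pushes one value after another.
    public static void Generate_numbers(int first, int last, Action<int> onNumber) {
        for (var i = first; i <= last; i++)
            onNumber(i);
    }
}
```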

Still, though, I guess designs are easier to understand if you put in the asterisk. Don´t think of streams as a big deal. It´s just as if functions could have an optional return value. (Which would be different from returning an option value like in F#.)

Why is there no error message flowing out of validation? That´s just a design choice. In this case I decided against it, since there is only one error case. In other situations an error text could flow from several different validation steps to a single error reporting functional unit. Or just an error case identifier (enum). Or even an exception; instead of throwing it right away the validation could leave the decision what to do to some other functional unit.

Flow Design as language creation

As you see, the lack of control statements in Flow Design does not mean a single flow of data. Flows can have many branches - although from a certain point on this becomes unwieldy. Spaghetti flows are a real danger, just like spaghetti code.

That´s also the reason why I would like to caution you against introducing cycles into your flow graphs. Keep them free of loops. Only very rarely should there be a need to let data flow back to an upstream functional unit.

Likewise don´t try to simulate control flow. Just because branching is possible does not mean you should name your processing steps "if" or "while". That would lower the level of abstraction of your design. It would defy its purpose.

Flow Design is about creating a Domain Specific Language (DSL) en passant. It´s supposed to be declarative. It´s supposed to be on a higher level of abstraction than your programming language. Take the Flow Design notation as a universal syntax to declaratively describe solutions in arbitrary domains.

How such flows are executed should be of no or little concern to you. It´s like writing a Christmas gift list. Your daughter wants a pony, your son a real racing car? They don´t care how Santa Claus manages to fulfill their wishes.

Likewise at design time trust there will be a way to implement each processing step. Later. And the more fine grained they are the easier it will be. But until then assume they are already present and functioning. Any functional unit you like. On any level of abstraction. It´s like wielding a magic wand, e.g. "Let there be a functional unit for command line parameter validation!"

There might be one or many control statements needed to implement a functional unit. But don´t let that leak into your design; don´t anticipate so much. Instead label your functional units with a domain specific phrase. One that describes what is happening, not how. That makes for a declarative DSL consisting of many words and phrases that are descriptive - and even re-usable.

Generalization: 2-dimensional flows

The result of Flow Design then is a flow with possibly many alternatives. A flow that branches like a river does. I call that a 2-dimensional flow because it´s not just one sequence of processing steps (1D), but many, in parallel (2D).


2D flows are data flows like 1D flows. There is nothing new to them in terms of parallel processing. Whether two processing steps are wired-up after one another or as alternatives does not require them to be implemented using multiple threads. It´s possible to do that. Flow Design makes that easier because its data flow is oblivious to control flow.

So don´t rush to learn about Actor frameworks or async/await in C# just because you want to apply Flow Design to your problems. Such technologies are orthogonal to Flow Design. For a start just rely on ordinary functions to implement processing steps. That does not diminish the usefulness of functional design.

What does 2-dimensionality mean? It means, data can flow along alternative paths through the network of nodes. Here are the paths for the above 2D flow:


Which does not mean it´s one or the other. Data can flow along many paths at the same time. Conceptually at least, but also (almost) truly at runtime, if you choose to employ multi-threading of some sort. It need not be "either this path or that"; it can be "this path as well as that".

But don´t let that confuse you right now. Without a tangible problem demanding that kind of sophisticated flow design these are pretty abstract musings. In practice this is largely no problem. Most flows are pretty straightforward.

Just keep in mind: this is data flow, not control flow. That means it´s unidirectional data exchange between independent functional units that don´t know anything about each other. They just happen to offer certain behavior which expresses itself as producing certain output or some side effect upon certain input.

Similar flows flowing back together

Data flows cannot only be split into branches, they can also flow back into each other or be joined.

Think of the famous Fizz Buzz kata: Numbers in a range, e.g. 1..100, are to be output in a special way. If a number can be divided by 3, "Fizz" should be written; if it´s divisible by 5, "Buzz" should be written; and if it can be divided by 3 and 5, "FizzBuzz" should be written. Any other number is output as is.

Usually this kata is used to practice TDD. But of course it can also be tackled with Flow Design, although its scope is very narrow and the solution thus might feel a little clumsy. Basically it´s a small algorithmic problem. So Flow Design is almost overkill.

On the other hand it´s perfect to illustrate branching and flowing back.

The task is to implement Fizz Buzz as a function like this: void FizzBuzz(int first, int last). For a given range of numbers the translations should be printed to standard output.

What´s to be done? What are the building blocks for this functionality? Here´s the result of my brainstorming:

  • Print numbers or their translations.
  • Translate number
    • First classify number
    • Then convert it
  • Number generation
  • Check range. If it´s an invalid range, throw an exception.

Notice how fine grained these processing steps are. Before I start coding I´m always eager to determine the different responsibilities. That´s one of the tasks of any design: separate aspects, responsibilities, concerns.

Printing numbers certainly is different from all else. It´s about calling an API, it´s communication with the environment, whereas the other processing steps belong to the Fizz Buzz domain.

Validation also is different from translation, isn´t it? Translation rules could change. That should not affect the validation function.

Also classification rules could change. That should not affect the functions for converting a certain class of numbers. As well as the other way around.

"Seeing responsibilities" is one of the "arts" of software development. It can be trained, but except for some hard and fast rules in the end it remains a quite creative act. Be prepared to revise your decisions. Also be prepared for dissent in your team. But with regular reflection you´ll master this art.

Here now is my Flow Design for the above bullet points:


Let me point out a couple of things:

  • Note that multiple values can flow as data at once, e.g. (first, last). Those are tuples. Passing them in as input is easy: they map to a list of formal function parameters. But how to generate them as output? There are various options depending on the programming language you use.
  • Streams are used again to signify optional output. For each number data on only one port will flow out of classification.
  • The streams flowing into the translation steps produce an output stream. That´s the right thing to do here. In other scenarios, though, an input stream could result in just one output value. Think of aggregation.
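The tuple-output options mentioned in the first bullet can be sketched like this in C#; the Check_range signatures and the continuation approach are assumptions:

```csharp
using System;

class TupleOutput {
    // Option 1: a continuation with several parameters -
    // the tuple (first, last) never needs to materialize as an object.
    public static void Check_range(int first, int last, Action<int, int> onValidRange) {
        if (first > last) throw new ArgumentException("invalid range");
        onValidRange(first, last);
    }

    // Option 2: a Tuple<> instance as the return value.
    public static Tuple<int, int> Check_range(int first, int last) {
        if (first > last) throw new ArgumentException("invalid range");
        return Tuple.Create(first, last);
    }
}
```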

Like I said, for the problem at hand this might be a bit of overkill. A quite elaborate flow for such simple functionality. On the other hand that´s perfect: the problem domain is easy to understand, so we can focus on the features of Flow Design and their translation into code.

Here you see how it´s possible to have many output ports on a processing step and how many branches can flow back into one.

The visual notation makes that very easy. But how does it look in code? Will it still be readily understandable?

Let´s start with some of the processing steps:
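A sketch of what those steps might look like; the port and converter names are assumptions:

```csharp
using System;

class FizzBuzzSteps {
    // The one place where the Fizz Buzz rule lives. The conditions are
    // deliberately kept inline, small duplication included.
    public static void Classify_number(int n,
            Action<int> onNoFizzBuzz,
            Action<int> onFizz,
            Action<int> onBuzz,
            Action<int> onFizzBuzz) {
        if (n % 3 == 0 && n % 5 == 0) onFizzBuzz(n);
        else if (n % 3 == 0) onFizz(n);
        else if (n % 5 == 0) onBuzz(n);
        else onNoFizzBuzz(n);
    }

    // Converters: no checking, no generation, just translation.
    public static string Convert_number(int n)      { return n.ToString(); }
    public static string Convert_to_Fizz(int n)     { return "Fizz"; }
    public static string Convert_to_Buzz(int n)     { return "Buzz"; }
    public static string Convert_to_FizzBuzz(int n) { return "FizzBuzz"; }
}
```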


Each of the steps is very small, very focused, very easy to understand. I think that´s a good thing. Functions should be small, shouldn´t they? Some say no more than 10 LOC, others say 40 LOC or "a screenful of code". In any case Flow Design very naturally leads to small functions. Don´t wait for refactoring to downsize your functions. Do it right from the beginning. You save yourself quite some refactoring trouble.

My favorite function is Classify_number(), you know. Because it´s so different from the usual Fizz Buzz implementations. Here it truly has a single responsibility: it´s the place where numbers are analyzed. It´s where the Fizz Buzz rule is located, which says numbers must not all be treated the same.

Fizz Buzz originally is a drinking game. Whoever fails at "counting" correctly has to drink some more - which makes it even harder to "count". The main mental effort goes into checking if a number needs translation. It´s about math - which is not easy for everyone even when sober ;-) And exactly this checking is represented by Classify_number(). No number generation, no translation, just checking.

That´s also the reason why I did not bother to apply the Single Level of Abstraction (SLA) principle. I did not refactor the conditions out into their own functions but left them in there, even with a small duplication. Still the function can be tested very, very easily.

And now for the main function of the solution where the process is assembled from the functional units:
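Such a main function might be assembled like this; a sketch with assumed names, in which the single Print() function recurs in every branch:

```csharp
using System;

class FizzBuzzFlow {
    public static void FizzBuzz(int first, int last) {
        Check_range(first, last);
        Generate_numbers(first, last, n =>
            Classify_number(n,
                onNoFizzBuzz: i => Print(i.ToString()),
                onFizz:       i => Print("Fizz"),
                onBuzz:       i => Print("Buzz"),
                onFizzBuzz:   i => Print("FizzBuzz")));
    }

    static void Check_range(int first, int last) {
        if (first > last) throw new ArgumentException("invalid range");
    }

    static void Generate_numbers(int first, int last, Action<int> onNumber) {
        for (var i = first; i <= last; i++) onNumber(i);
    }

    static void Classify_number(int n, Action<int> onNoFizzBuzz,
            Action<int> onFizz, Action<int> onBuzz, Action<int> onFizzBuzz) {
        if (n % 3 == 0 && n % 5 == 0) onFizzBuzz(n);
        else if (n % 3 == 0) onFizz(n);
        else if (n % 5 == 0) onBuzz(n);
        else onNoFizzBuzz(n);
    }

    // The single point where all branches flow back together.
    static void Print(string s) { Console.WriteLine(s); }
}
```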


This might look a bit strange to you. But try to see through this. Try to see how systematic this translation is. And in the end you´ll see how the data flows even in the code. From that you then can re-generate the diagram. The code is the design. And when you change the code according to the PoMO it will stay in sync with the design because it is just a "serialization" of a flow.

If you look closely, though, you might spot a seeming deviation from the design. Print() is repeated in every branch instead of calling the function just once. But in fact it´s not a deviation but a detail of the way several streams need to be joined back together into one. See it not as several calls of a function, but as a single point. It´s just 1 name, 1 function and thus represents the 1 point circled in the Flow Design.

Joining dissimilar flows

Here´s another scenario where branching helps - but how those branches flow back together is different.

The task is to write a function that formats CSV data. Its signature looks like this: string FormatCsv(string csv).

The input data are CSV records, e.g.

Name;Age;City
Peter;26;Hamburg
Paul;45;London
Mary;38;Copenhagen
And the output is supposed to look like this:

Name |Age|City
Peter|26 |Hamburg
Paul |45 |London
Mary |38 |Copenhagen

The function generates an ASCII table from the raw data. The header is separated from the data records. And the columns are spaced to accommodate the longest value in either header or data records.

What are the aspects, the features of this functionality?

  • Determine column width
  • Parse input
  • Format header
  • Format data records - which should work like formatting the header
  • Format separator - which looks quite different from formatted data
  • Build the whole table from the formatted data

The order of these processing steps is simple. And as it turns out, some processing can be done in parallel:


Once the column widths have been determined, formatting the data and formatting the separator is independent of each other. That´s why I branched the flow and put the Format... processing steps in parallel.

Notice the asterisk in (csvRecord*) or (colWidth*). It denotes a list of values in an abstract manner. Whether you implement the list as an array or some list type or IEnumerable in .NET is of no concern to the design. Compare this to the asterisk outside the brackets denoting a stream of single values: (int*) stands for a list (when data flows it contains multiple values), (int)* stands for a stream (data flows multiple times containing a single value).
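Translated to C# signatures the difference might look like this; a sketch with assumed names:

```csharp
using System;
using System.Linq;

class Notation {
    // (int*) - a list: data flows once, containing multiple values.
    public static void Sum(int[] values, Action<int> onSum) {
        onSum(values.Sum());
    }

    // (int)* - a stream: data flows multiple times, each time a single value.
    public static void Count_up_to(int n, Action<int> onNumber) {
        for (var i = 1; i <= n; i++) onNumber(i);
    }
}
```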

Formatting the separator just takes in the column width values. But formatting the records also takes in the records. Notice the "|" before the data description. It means "the following is what really flows into the next functional unit". It´s used where upstream outputs different data than downstream requires as input.

Determine col width outputs (colWidth*), but Format records requires (csvRecord, colWidth). That´s expressed by (colWidth) | (csvRecord,colWidth*) on the arrow pointing from Determine... to Format records.

This means, a flow defines a context. Within this context data can be "re-used". In this case the csvRecord* coming out of Parse is used again for formatting. (In code this is easy to achieve if a flow is put together in a single function. Then data can be assigned to local variables.)

Most importantly, though, this Flow Design sports a join. The join is a special functional unit. It takes n input flows and produces 1 output flow. The data of the output is a tuple combining data from all inputs.

The join waits for data to arrive on all inputs. And it outputs a tuple whenever an input changes. In this case, though, once output was generated, the join clears its inputs. So for the next output new input has to arrive on both input ports. That´s called an auto-reset join.[1]

Sounds complicated? Maybe. But in the end it´s real simple. As you see in the implementation a join - even though being a functional unit of its own in the flow - does not require an extra function:
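A sketch of this idea; the delimiters, helper names, and formatting details are assumptions, the actual implementation being in the repository:

```csharp
using System;
using System.Linq;

class CsvFormatter {
    public static string FormatCsv(string csv) {
        var csvRecords = Parse(csv);
        var colWidths = Determine_col_widths(csvRecords);

        // Two independent branches...
        var formattedRecords = Format_records(csvRecords, colWidths);
        var separator = Format_separator(colWidths);

        // ...joined by nothing more than a function call with n parameters.
        return Build_table(formattedRecords, separator);
    }

    static string[][] Parse(string csv) {
        return csv.Split('\n')
                  .Select(line => line.Trim().Split(';'))
                  .ToArray();
    }

    static int[] Determine_col_widths(string[][] records) {
        return Enumerable.Range(0, records[0].Length)
                         .Select(i => records.Max(r => r[i].Length))
                         .ToArray();
    }

    static string[] Format_records(string[][] records, int[] colWidths) {
        return records
               .Select(r => string.Join("|",
                           r.Select((v, i) => v.PadRight(colWidths[i]))))
               .ToArray();
    }

    static string Format_separator(int[] colWidths) {
        return string.Join("+", colWidths.Select(w => new string('-', w)));
    }

    static string Build_table(string[] formattedRecords, string separator) {
        return string.Join("\n",
                   new[] { formattedRecords[0], separator }
                       .Concat(formattedRecords.Skip(1)));
    }
}
```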


Most often a simple function call with n parameters will do to bring together several branches - at least as long as you don´t resort to real parallel processing in those branches.

That´s why sometimes I simplify the join like this:


That way it no longer looks so "massive". It´s more a part of the downstream processing step.

For the remaining code of the CSV formatter see the implementation in the accompanying GitHub repository.

In closing

I hope I was able to instill some faith in you that Flow Design is rich enough to model solutions to real problems. Even though it´s not a full blown programming language it allows you to express "processes" of all sorts to deliver on all the functional and many of the quality requirements of your customers.

1D and 2D flows are declarative expressions of "how things work" once control enters a software through an Entry Point.

Mutually oblivious functional units are all you need to avoid many of the pitfalls of programming usually leading to dirty code.

But wait! There´s more! ;-) You sure want to know how to scale those flows to build arbitrarily large processes.

  1. You might think, if there is an auto-reset join, there could be a manual-reset join, too. And you´re right. So far, though, I´ve found that to be of rare use. That´s why I´m not going into detail on that here.

The Incremental Architect’s Napkin - #5 - Design functions for extensibility and readability

The functionality of programs is entered via Entry Points. So what we´re talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points.

Designing software thus consists of at least three phases:

  1. Analyzing the requirements to find the Entry Points and their signatures
  2. Designing the functionality to be executed when those Entry Points get triggered
  3. Implementing the functionality according to the design aka coding

I presume you´re familiar with phase 1 in some way. And I guess you´re proficient in implementing functionality in some programming language.

But in my experience developers in general are not experienced in going through an explicit phase 2. “Designing functionality? What´s that supposed to mean?” you might already have thought.

Here´s my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what´s supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution, that is, because those functions only exist in your head (or on paper) during this phase. But you may have guessed that, because it´s “design”, not “coding”.

And here is, what functional design is not: It´s not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It´s equally basic. It´s what code needs to do in order to deliver some functionality or quality.

Logic is what does what needs to be done by software. Transformations are done either through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it´s just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do).

But calling one of your own functions is not logic. It´s not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions.

Functions are not relevant to functionality. Strange, isn´t it?

What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher understandability, easier to keep code consistent).

That´s no small feat, however. The value of evolvable code can hardly be overestimated. That´s why to me functional design is so important. It´s at the core of software development.

To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it.

Functional design and logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need to back out through refactoring. Functional design on the other hand is bloodless without actual code. It´s just a theory with no experiments to prove it.

But how to do functional design?

An example of functional design

Let´s assume a program to de-duplicate strings. The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e.

There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line

C:\>deduplicate "a, b, a, c, d, b, e, c, a"
a, b, c, d, e

…or by clicking on a GUI button.


This leads to the Entry Point function to get called. It´s the program´s main function in case of the batch version or a button click event handler in the GUI version. That´s the physical Entry Point so to speak. It´s inevitable.

What then happens is a three step process:

  1. Transform the input data from the user into a request.
  2. Call the request handler.
  3. Transform the output of the request handler into a tangible result for the user.

Or to phrase it a bit more generally:

  1. Accept input.
  2. Transform input into output.
  3. Present output.

This does not mean any of these steps requires a lot of effort. Maybe it´s just one line of code to accomplish it. Nevertheless it´s a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP).

Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself.

But it´s still on a meta-level. The application to the domain at hand is easy, though:

  1. Accept string list from command line
  2. De-duplicate
  3. Present de-duplicated strings on standard output

And this concrete list of processing steps can easily be transformed into code:

static void Main(string[] args) {
    var input = Accept_string_list(args);
    var output = Deduplicate(input);
    Present_deduplicated_string_list(output);
}

Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point.

But maybe, just maybe, you´re not so sure how to go about with the de-duplication for example. Then just implement what´s easy right now, e.g.

private static string Accept_string_list(string[] args) {
    return args[0];
}

private static void Present_deduplicated_string_list(
            string[] output) {
    var line = string.Join(", ", output);
    Console.WriteLine(line);
}

Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call.

And then repeat the functional design for the remaining processing step. What´s left is the domain logic: de-duplicating a list of strings. How should that be done?

Without any logic at our disposal during functional design you´re left with just functions. So which functions could make up the de-duplication? Here´s a suggestion:

  • De-duplicate
    1. Parse the input string into a true list of strings.
    2. Register each string in a dictionary/map/set. That way duplicates get cast away.
    3. Transform the data structure into a list of unique strings.

Processing step 2 obviously was the core of the solution. That´s where real creativity was needed. That´s the core of the domain. But now after this refinement the implementation of each step is easy again:

private static string[] Parse_string_list(string input)
{
    return input.Split(',')
                .Select(s => s.Trim())
                .ToArray();
}

private static Dictionary<string,object> 
        Compile_unique_strings(string[] strings)
{
    return strings.Aggregate(
            new Dictionary<string, object>(),
            (agg, s) => { 
                agg[s] = null;
                return agg;
            });
}

private static string[] Serialize_unique_strings(
               Dictionary<string,object> dict)
{
    return dict.Keys.ToArray();
}

With these three additional functions Main() now looks like this:

static void Main(string[] args) {
    var input = Accept_string_list(args);

    var strings = Parse_string_list(input);
    var dict = Compile_unique_strings(strings);
    var output = Serialize_unique_strings(dict);

    Present_deduplicated_string_list(output);
}

I think that´s very understandable code: just read it from top to bottom and you know how the solution to the problem works. It´s a mirror image of the initial design:

  1. Accept string list from command line
  2. Parse the input string into a true list of strings.
  3. Register each string in a dictionary/map/set. That way duplicates get cast away.
  4. Transform the data structure into a list of unique strings.
  5. Present de-duplicated strings on standard output

You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later.

And as a bonus: all the functions making up the process are small - which means easy to understand, too.

So much for an initial concrete example. Now it´s time for some theory. Because there is method to this madness ;-) The above has only scratched the surface.

Introducing Flow Design

Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm.

An algorithm consists of logic, a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1]

In its simplest form a process can be written as a bullet point list of steps, e.g.

  • Get data from user
  • Output result to user
  • Transform data
  • Parse data
  • Map result for output

Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming.

Next comes ordering the steps. What should happen first, what next etc.?

  1. Get data from user
  2. Parse data
  3. Transform data
  4. Map result for output
  5. Output result to user

That´s great for a start into functional design. It´s better than starting to code right away on a given function using TDD.

Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a function is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion?

My recommendation: For any given function you´re supposed to implement, first do a functional design. Then, once you´re confident you know the processing steps - which are pretty small - refine and code them using TDD. You´ll see that´s much, much easier - and leads to cleaner code right away. For more information on this approach, which I call “Informed TDD”, read my book of the same title.

Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I´d say. It´s more in line with the KISS (Keep It Simple, Stupid) principle than returning constants or the other trivial stuff TDD development often starts with.

So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example such steps can easily be translated into functions. Moving from design to coding thus is simple.

However, such a list does not scale. Processing is not always simple enough to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their “dimensionality”: text just has one dimension, it´s sequential. Alternatives and parallelism are hard to encode in text.

In addition, a functional design using numbered lists lacks data. The input, output, and state of the processing steps are not visible.

That´s why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient.

Visualizing processes

The building block of the functional design notation is a functional unit. I mostly draw it like this:


Something is done, it´s clear what goes in, it´s clear what comes out, and it´s clear what the processing step requires in terms of state or hardware.

Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That´s why I call this approach to functional design Flow Design.

It´s about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details.

That´s what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give overview.

But what about all the details? As Robert C. Martin rightly said: “Programming is about detail”.

Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want.

Functional design does not eliminate all the nitty gritty. It just postpones tackling it. To me that´s also an example of the SRP. Functional design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later, coding has the responsibility to implement the solution down to the last detail (i.e. statement, API call).

TDD unfortunately mixes both responsibilities. It´s just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that´s one reason why TDD has failed to deliver on its promise for many developers.

Using functional units as building blocks of functional design processes can be depicted very easily. Here´s the initial process for the example problem:


For each processing step draw a functional unit and label it. Choose a verb or an “action phrase” as a label, not a noun. Functional design is about activities, not state or structure.

Then make the output of an upstream step the input of a downstream step. Finally think about the data that should flow between the functional units.

Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified.

Empty brackets mean “no data is flowing”, but nevertheless a signal is sent.

A name like “list” or “strings” in brackets describes the data content. Use lower case labels for that purpose.

A name starting with an upper case letter like “String” or “Customer” on the other hand signifies a data type.

If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]).

But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent.

Flows wired-up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output.

A functional unit without an output is possible. It´s like a black hole sucking up input without producing any output. Instead it produces side effects.

A functional unit without an input, though, does not make much sense. When should it start to work? What´s the trigger? That´s why in the above process even the first processing step has an input.

If you like, view such 1D-flows as pipelines. Data is flowing through them from left to right. But as you can see, it´s not always the same data. It gets transformed along its passage: (args) becomes a (list) which is turned into (strings).

The Principle of Mutual Oblivion

A very characteristic trait of flows put together from functional units is: no functional unit knows any other. They are all completely independent of each other.

Functional units don´t know where their input is coming from (or even when it´s gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving.

Also they don´t know where their output is going. They just produce it in their own time independent of other functional units. That means at least conceptually all functional units work in parallel.

Functional units don´t know their “deployment context”. They know nothing about the overall flow they are placed in. They are just consuming input from some upstream, and producing output for some downstream.

That makes functional units very easy to test. At least as long as they don´t depend on state or resources.

I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as an overall context/purpose. They are just parts of a whole focused on a single responsibility.

How the whole is built, how a larger goal is achieved, is of no concern to the single functional units.
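In code the PoMO can be illustrated with a tiny Java sketch (the names are mine): two functions that know nothing of each other, wired up by an integrating function which alone knows the flow:

```java
import java.util.*;

// Hypothetical illustration of the PoMO: neither step knows the other.
class PomoDemo {
    // One functional unit: splits a text into words
    static List<String> extractWords(String text) {
        return Arrays.asList(text.split(" "));
    }

    // Another functional unit: counts items.
    // It knows nothing about extractWords or any overall purpose.
    static int count(List<String> items) {
        return items.size();
    }

    // Only the integrating function knows the flow and wires up the parts
    static int countWords(String text) {
        return count(extractWords(text));
    }
}
```

“Counting words” as a concept exists only in the integration; the parts themselves stay mutually oblivious.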

By building software in such a manner, functional design interestingly follows nature. Nature´s building blocks for organisms also follow the PoMO. The cells forming your body do not know each other.

Take a nerve cell “controlling” a muscle cell for example:[2]


The nerve cell does not know anything about muscle cells, let alone the specific muscle cell it is “attached to”. Likewise the muscle cell does not know anything about nerve cells, let alone a specific nerve cell “attached to” it. Saying “the nerve cell is controlling the muscle cell” thus only makes sense when viewing both from the outside. “Control” is a concept of the whole, not of its parts. Control is created by wiring-up parts in a certain way.

Both cells are mutually oblivious. Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it´s coming from neither cell cares about.

Millions of years of evolution have led to this kind of division of labor. And millions of years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism.

How and why this happened in nature is a mystery. For our software, though, it´s clear: functional and quality requirements need to be fulfilled. So we as developers have to become “intelligent designers” of “software cells” which we put together to form a “software organism” which responds in satisfying ways to triggers from its environment.

My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler “machines”?

So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units.

That´s what Flow Design is about. In that way it´s even universal, I´d say. Its notation can also be applied to biology:


Never mind labeling the functional units with nouns. That´s ok in Flow Design. You´ll do that occasionally for functional units on a higher level of abstraction or when their purpose is close to hardware.

Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 “software cells” (functional units). Both, though, follow the same basic principle.

Translating functional units into code

Moving from functional design to code is no rocket science. In fact it´s straightforward. There are two simple rules:

  • Translate an input port to a function.
  • Translate an output port either to a return statement in that function or to a function pointer visible to that function.


The simplest translation of a functional unit is a function. That´s what you saw in the above example. Functions are mutually oblivious. That´s why Functional Programming likes them so much. It makes them composable. Which is the reason nature works according to the PoMO.

Let´s be clear about one thing: There is no dependency injection in nature. For all of an organism´s complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks.

Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case, where a processing step should not always produce an output. Maybe the purpose is to filter input.


Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed.

Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called “stop words”.

In the above picture the optionality of the output is signified by the asterisk outside the brackets. It means: any number of (word) data items can flow from the functional unit for each input data item. It might be none, one, or even more. This I call a stream of data.

Such behavior cannot be translated into a function where output is generated with return. Because a function always needs to return a value.

So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]

void filter_stop_words(
       string word,
       Action<string> onNoStopWord) {
  if (...check if not a stop word...)
    onNoStopWord(word);
}

If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you´re right. Conceptually, though, it´s not an injection. Because the subroutine is not functionally dependent on the continuation.

Firstly, continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow.

Secondly, the name of the formal parameter is chosen so as not to assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only.
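In Java terms (cf. the footnote on delegates), such a continuation-based translation might look like this sketch. The stop word rule - acronyms and words with no more than two letters - comes from the spell checker example above; its concrete encoding here is my assumption:

```java
import java.util.function.Consumer;

// Hypothetical Java sketch: the stop word filter with a continuation.
// The output port becomes a Consumer<String> parameter.
class SpellCheckFlow {
    static void filterStopWords(String word, Consumer<String> onNoStopWord) {
        // Stop words: very short words, or all-caps words taken as acronyms
        boolean isStopWord = word.length() <= 2
                          || word.equals(word.toUpperCase());
        if (!isStopWord)
            onNoStopWord.accept(word);   // only non-stop words flow on
    }
}
```

The caller wires up the downstream step, e.g. `filterStopWords(word, collected::add)`; the filter itself stays oblivious of what happens to its output.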

Translating output ports into function pointers helps keep functional units mutually oblivious in cases where output is optional or produced asynchronously.

Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it´s called an event. In C# that´s even an explicit feature.

class Filter {
  public void filter_stop_words(
                string word) {
    if (...check if not a stop word...)
      onNoStopWord(word);
  }

  public event Action<string> onNoStopWord;
}

When to use a continuation and when to use an event depends on how a functional unit is used in flows and how it´s packed together with others into classes. You´ll see examples further down the Flow Design road.

Another example of 1D functional design

Let´s see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design.

The function signature given is:

string WordWrap(string text, int maxLineLength) 

That´s not an Entry Point since we don´t see an application with an environment and users. Nevertheless it´s a function which is supposed to provide a certain functionality.

The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified.

If a word is longer than the maximum line length it has to be split into multiple parts, each fitting in a line.

Flow Design

Let´s start by brainstorming the process to accomplish the feat of reformatting the text. What´s needed?

  • Words need to be assembled into lines
  • Words need to be extracted from the input text
  • The resulting lines need to be assembled into the output text
  • Words too long to fit in a line need to be split

Does that sound about right? I guess so. And it shows a kind of priority: long words are a special case. So maybe there is a hint for an incremental design here. First let´s tackle “average words” (words not longer than a line).

Here´s the Flow Design for this increment:


The first three bullet points have been turned into functional units with explicit data added.

As the signature requires, a text is transformed into another text. See the input of the first functional unit and the output of the last one.

In between no text flows, but words and lines. That´s good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are.

But note the asterisk! It´s not outside the brackets but inside. That means it´s not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced.

The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It´s an implementation detail.

Does any processing step require further refinement? I don´t think so. They all look pretty “atomic” to me. And if not… I can always backtrack and refine a process step using functional design later once I´ve gained more insight into a sub-problem.


The implementation is straightforward as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility.


And the process flow becomes just a sequence of function calls:
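Here´s a sketch of that sequence of calls, in Java. The helper names and the exact line-building logic are my assumptions, not the original implementation; long words are not handled yet, matching this first increment:

```java
import java.util.*;

// Hypothetical sketch of the WordWrap flow: three processing steps
// wired up in sequence. Names and details are illustrative assumptions.
class WordWrapFlow {
    public static String wordWrap(String text, int maxLineLength) {
        String[] words = extractWords(text);
        String[] lines = reformat(words, maxLineLength);
        return concatenate(lines);
    }

    // (text) -> (word*)
    static String[] extractWords(String text) {
        return text.split(" ");
    }

    // (word*) -> (line*): fill each line with as many words as fit
    static String[] reformat(String[] words, int maxLineLength) {
        List<String> lines = new ArrayList<>();
        String line = "";
        for (String word : words) {
            if (line.isEmpty())
                line = word;
            else if (line.length() + 1 + word.length() <= maxLineLength)
                line += " " + word;
            else { lines.add(line); line = word; }
        }
        if (!line.isEmpty()) lines.add(line);
        return lines.toArray(new String[0]);
    }

    // (line*) -> (text)
    static String concatenate(String[] lines) {
        return String.join("\n", lines);
    }
}
```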


Easy to understand. It clearly states how word wrapping works - on a high level of abstraction.

And it´s easy to evolve as you´ll see.

Flow Design - Increment 2

So far only texts consisting of “average words” are wrapped correctly. Words not fitting in a line will result in lines too long.

Wrapping long words is a feature of the requested functionality. Whether it´s there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it´s time to add it to deliver the full scope.

Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It´s easy to extend it - instead of changing well tested code. How´s that possible?

Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead about where extensions might need to be made in the future.

I just “phase in” the remaining processing step:


Since neither Extract words nor Reformat know of their environment neither needs to be touched due to the “detour”. The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step.

Implementation - Increment 2

A trivial implementation - checking the assumption that this works - does not do anything to split long words. The input is just passed on:
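Such a trivial pass-through might look like this Java sketch (the name splitLongWords is my assumption):

```java
// Hypothetical sketch of the "phased in" step with a trivial implementation.
// It does not split anything yet; the input is just passed on unchanged.
class LongWordSplitter {
    static String[] splitLongWords(String[] words, int maxLineLength) {
        return words; // splitting logic to be fleshed out later
    }
}
```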


Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be built in, quickly sees how long words are dealt with.

Compare this to Robert C. Martin´s solution:[4]


How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach.

Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin´s. At least so it seems. Because his solution does not cover all the “word wrap situations” the Flow Design solution handles. Some lines would need to be added to be on par, I guess.

But even then… Is a difference in LOC that important as long as it´s in the same ballpark? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it´s also clarity in design.

But don´t take my word for it. Try Flow Design on larger problems and compare for yourself. What´s the easier, more straightforward way to clean code? And keep in mind: You ain´t seen all yet ;-) There´s more to Flow Design than described in this chapter.

In closing

I hope I was able to give you an impression of functional design that makes you hungry for more. To me it´s an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all too quickly.

Some thought should be invested first. Wherever there is a clear Entry Point visible, its functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that´s necessary read my blog article here.

For now let me point out to you - if you haven´t already noticed - that Flow Design is a general purpose declarative language. It´s “programming by intention” (Shalloway et al.).

Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem into smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members.

Flow Design not only increases evolvability, but also helps becoming more productive. All team members can participate in functional design. This goes beyond collective code ownership. We´re talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities.


PS: If you like what you read, consider getting my ebook “The Incremental Architect´s Napkin”. It´s where I compile all the articles in this series for easier reading.

  1. I like the strictness of Functional Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also it seems to me the real world is full of state and side effects. So why give them such a bad image? That´s why functional design takes a more pragmatic approach: state and side effects are ok for processing steps - but be sure to follow the SRP. Don´t put too much of it into a single processing step.

  2. Image taken from

  3. My code samples are written in C#. C# sports typed function pointers called delegates. Action<T> is such a function pointer type, matching functions with signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java, now in version 8. I trust you´ll find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you´re using a Functional Programming language it´s of course a no-brainer.

  4. Taken from his blog post “The Craftsman 62, The Dark Path”.

Abstracting functionality

What is more important than data? Functionality. Yes, I strongly believe we should switch to a functionality over data mindset in programming. Or actually switch back to it.

Focus on functionality

Functionality once was at the core of software development. Back when algorithms were the first thing you heard about in CS classes. Sure, data structures, too, were important - but always from the point of view of algorithms. (Niklaus Wirth gave one of his books the title “Algorithms + Data Structures = Programs” instead of “Data Structures + Algorithms” for a reason.)

The reason for the focus on functionality? Firstly, because software was and is about doing stuff. Secondly, because sufficient performance was hard to achieve; memory efficiency came only third.

But then hardware became more powerful. That gave rise to a new mindset: object orientation. And with it functionality was devalued. Data took over its place as the most important aspect. Now discussions revolved around structures motivated by data relationships. (John Beidler gave his book the title “Data Structures and Algorithms: An Object Oriented Approach” instead of the other way around for a reason.)

Sure, this data could be embellished with functionality. But nevertheless functionality was second.

When you look at (domain) object models what you mostly find is (domain) data object models. The common object oriented approach is: data aka structure over functionality. This is true even for the most modern modeling approaches like Domain Driven Design. Look at the literature and what you find is recommendations on how to get data structures right: aggregates, entities, value objects.

I´m not saying this is what object orientation was invented for. But I´m saying that´s what I happen to see across many teams now some 25 years after object orientation became mainstream through C++, Delphi, and Java.

But why should we switch back? Because software development cannot become truly agile with a data focus. The reason for that lies in what customers need first: functionality, behavior, operations.

To be clear, that´s not why software is built. The purpose of software is to be more efficient than the alternative. Money mainly is spent to get a certain level of quality (e.g. performance, scalability, security etc.). But without functionality being present, there is nothing to work on the quality of.

What customers want is functionality of a certain quality. ASAP. And tomorrow new functionality needs to be added, existing functionality needs to be changed, and quality needs to be increased.

No customer ever wanted data or structures.

Of course data should be processed. Data is there, data gets generated, transformed, stored. But how the data is structured for this to happen efficiently is of no concern to the customer.

Ask a customer (or user) whether she likes the data structured this way or that way. She´ll say, “I don´t care.” But ask a customer (or user) whether he likes the functionality and its quality this way or that way. He´ll say, “I like it” (or “I don´t like it”).

Build software incrementally

From this very natural focus of customers and users on functionality and its quality it follows that we should develop software incrementally. That´s what Agility is about.

Deliver small increments quickly and often to get frequent feedback. That way less waste is produced, and learning can take place much easier (on the side of the customer as well as on the side of developers).

An increment is some added functionality or quality of functionality.[1]

So as it turns out, Agility is about functionality over whatever. But software developers’ thinking is still stuck in the object oriented mindset of whatever over functionality. Bummer. I guess that (at least partly) explains why Agility always hits a glass ceiling in projects. It´s a clash of mindsets, of cultures.

Driving software development by demanding small increases in functionality runs against thinking about software as growing (data) structures sprinkled with functionality. (Excuse me, if this sounds a bit broad-brush. But you get my point.)

The need for abstraction

In the end there need to be data structures. Of course. Small and large ones. The phrase functionality over data does not deny that. It´s not functionality instead of data or something. It´s just over, i.e. functionality should be thought of first. It´s a tad more important. It´s what the customer wants.

That´s why we need a way to design functionality. Small and large. We need to be able to think about functionality before implementing it. We need to be able to reason about it among team members. We need to be able to communicate our mental models of functionality not just by speaking about them, but also on paper. Otherwise reasoning about it does not scale.

We learned thinking about functionality in the small using flow charts, Nassi-Shneiderman diagrams, pseudo code, or UML sequence diagrams.

That´s nice and well. But it does not scale. You can use these tools to describe manageable algorithms. But it does not work for the functionality triggered by pressing the “1-Click Order” button on an Amazon product page, for example.

There are several reasons for that, I´d say.

Firstly, the level of abstraction over code is negligible. It´s essentially non-existent. Drawing a flow chart or writing pseudo code or writing actual code is very, very much alike. All these tools are about control flow like code is.[2]

In addition, all these tools are computationally complete. They are about logic, i.e. expressions and especially control statements. Whatever you code in Java you can fully (!) describe using a flow chart.

And then there is no data. These tools are about control flow and leave out the data altogether. Thus data mostly is assumed to be global. That´s shooting yourself in the foot, as I hope you agree.

Even if it´s functionality over data, that does not mean “don´t think about data”. Quite the contrary! Functionality only makes sense with regard to data. So data needs to be in the picture right from the start - but it must not dominate the thinking. The above tools fail on this.

Bottom line: So far we´re unable to reason in a scalable and abstract manner about functionality.

That´s why programmers are so driven to start coding once they are presented with a problem. Programming languages are the only tool they´ve learned to use to reason about functional solutions.

Or, well, there might be exceptions. Mathematical notation and SQL may have come to your mind already. Indeed they are tools on a higher level of abstraction than flow charts etc. That´s because they are declarative and not computationally complete. They leave out details - in order to deliver higher efficiency in devising overall solutions.

We can easily reason about functionality using mathematics and SQL. That´s great. Except for that they are domain specific languages. They are not general purpose. (And they don´t scale either, I´d say.) Bummer.

So to be more precise: we need a scalable general purpose tool on a higher level of abstraction than code, not neglecting data.

Enter: Flow Design.

Abstracting functionality using data flows

I believe the solution to the problem of abstracting functionality lies in switching from control flow to data flow.

Data flow naturally is not about logic details anymore. There are no expressions, no control statements, not even statements. Data flow is declarative by nature.


With data flow we get rid of all the limiting traits of former approaches to modeling functionality.

In addition, nomen est omen, data flows include data in the functionality picture.

With data flows, data is visibly flowing from processing step to processing step. Control is not flowing. Control is wherever it´s needed to process data coming in.

That´s a crucial difference and needs some rewiring in your head to be fully appreciated.[2]

Since data flows are declarative they are not the right tool to describe algorithms, though, I´d say. With them you don´t design functionality on a low level. During design data flow processing steps are black boxes. They get fleshed out during coding.

Data flow design thus is more coarse grained than flow chart design. It starts on a higher level of abstraction - but then is not limited. By nesting data flows indefinitely you can design functionality of any size, without losing sight of your data.


Data flows scale very well during design. They can be used on any level of granularity. And they can easily be depicted. Communicating designs using data flows is easy and scales well, too.

The result of functional design using data flows is not algorithms (too low level), but processes. Think of data flows as descriptions of industrial production lines. Data as material runs through a number of processing steps to be analyzed, enhanced, transformed.

At the top level of a data flow design there might be just one processing step, e.g. “execute 1-click order”. But below that are arbitrary levels of flows with smaller and smaller steps.
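As a code-level analogy, such stratification might look like nested integrating functions, sketched here in Java. All names and data are illustrative assumptions, not a real order system:

```java
// Hypothetical sketch of stratified data flow design: a single top level
// processing step is refined into a flow of smaller steps, each of which
// could be refined further. All names and data are illustrative assumptions.
class OneClickOrder {
    static String executeOneClickOrder(String productId) {
        String order   = createOrder(productId);
        String receipt = charge(order);
        return confirm(receipt);
    }

    // Each step could itself integrate a lower level flow of its own
    static String createOrder(String productId) { return "order:" + productId; }
    static String charge(String order)          { return "paid:" + order; }
    static String confirm(String receipt)       { return "confirmed:" + receipt; }
}
```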

That´s not layering as in “layered architecture”, though. Rather it´s a stratified design à la Abelson/Sussman.

Refining data flows is not your grandpa´s functional decomposition. That was rooted in control flows. Refining data flows does not suffer from the limits of functional decomposition against which object orientation was supposed to be an antidote.


I´ve been working exclusively with data flows for functional design for the past 4 years. It has changed my life as a programmer. What once was difficult is now easy. And, no, I´m not using Clojure or F#. And I´m not an async/parallel execution buff.

Designing the functionality of increments using data flows works great with teams. It produces design documentation which can easily be translated into code - in which then the smallest data flow processing steps have to be fleshed out - which is comparatively easy.

Using a systematic translation approach code can mirror the data flow design. That way later on the design can easily be reproduced from the code if need be.

And finally, data flow designs play well with object orientation. They are a great starting point for class design. But that´s a story for another day.

To me data flow design simply is one of the missing links of systematic lightweight software design.

  1. There are also other artifacts software development can produce to get feedback, e.g. process descriptions, test cases. But customers can be delighted more easily with code based increments in functionality.

  2. No, I´m not talking about the endless possibilities this opens for parallel processing. Data flows are useful independently of multi-core processors and Actor-based designs. That´s my whole point here. Data flows are good for reasoning and evolvability. So forget about any special frameworks you might need to reap benefits from data flows. None are necessary. Translating data flow designs even into plain old Java is possible.