A few days ago, Daniel Selman of ILOG (now owned by IBM) published a solution to the Einstein puzzle. See http://blogs.ilog.com/brms/2009/02/16/einsteins-puzzle/. He did this in response to a challenge from James Owen who is one of the organisers of the October Rules Fest conference. James invited the various vendors who have involvement in the conference to provide a solution using whatever approach they deemed best. See http://orf2009.blogspot.com/2009/02/puzzle-1-for-conference.html.

The Einstein puzzle, also known as the Zebra puzzle, consists of 15 rules expressed as constraints. The puzzle is generally attributed to Albert Einstein, but there appears to be little evidence of his authorship. There are several variations, but they typically involve five people of different nationalities who live in five adjacent houses of different colours. Each person owns a pet animal, has a favourite drink and smokes a brand of cigarette (how quaint that sounds in these tobacco-intolerant times). The term 'Zebra' comes from one variation which asks "who owns the Zebra". However, the version James Owen pointed us to (from Stanford University; see http://www.stanford.edu/~laurik/fsmbook/examples/Einstein%27sPuzzle.html) asks about ownership of fish - they are owned by the German house owner, incidentally. Einstein was German, so maybe this is subtle evidence of his authorship. The Wikipedia article at http://en.wikipedia.org/wiki/Zebra_Puzzle states that "the puzzle becomes a non-trivial challenge to inferential logic". Very true! The suggestion is that most people (98% according to Stanford) cannot work out the answer to the puzzle. Personally, I didn't even try.

Daniel explains that the puzzle represents a problem in constraint satisfaction (a 'constraint satisfaction problem', or CSP). Accordingly, although ILOG is perhaps better known for their Rete rules engine, he and his colleagues eschewed this in favour of their constraint programming (CP) tools. The result is a straightforward rendering of the puzzle in a constraint language, and a result that shows their Constraint Satisfaction (CS) engine determined the correct answer in about 20ms. Daniel suggests that, if they had used the rules engine ("the mighty Golden Hammer"), they would have had to implement 'convoluted rules' to solve the problem. Daniel is correct, but I thought it would be interesting to look a little deeper at why this is the case.

Constraint satisfaction is an approach in which rules are applied to a set of variables (placeholders that can contain one of several values) of given domains (the valid sets of values that a particular variable can have). Each rule is a constraint which identifies valid values for a combination of variables. For example, in the Einstein Puzzle, we have a constraint that states:

The Englishman lives in the red house

There are two variables involved here. The 'nationality' variable holds a value from a domain containing five nationalities (speaking as an Englishman, I consider my nationality to be British, but maybe things were different in Einstein's day). The 'colour' variable holds a value from a domain of five possible colours. The constraint says that the combination of [nationality = 'Englishman'] and [colour = 'red'] is a valid partial solution to the problem.

To get a better feel for what a CSP is, let’s consider something we are all familiar; namely a jigsaw puzzle. We can consider this to be a CSP containing a single variable type (jigsaw piece) for the single domain of all actual jigsaw pieces. There are numerous constraints represented by the shapes of individual sides of pairs of jigsaw pieces. In order to solve the jigsaw puzzle, we could adopt a number of approaches. The least efficient would be to methodically generate every possible combination of each of the jigsaw pieces. For each combination, we could then step back and inspect it to see if it is correct. This approach is sometimes called 'generate and test'.

Given a 1,000 piece jigsaw where each piece has four sides, the number of combinations is unimaginably large! Depending on how you calculate it (e.g., what combinations of sides with a straight edge are deemed 'valid' candidates), and remembering that each piece can be orientated in one of four positions, it could be as high as 4.62^{3169} Given that scientists estimate only 10^{80} atoms in the entire observable universe, I think we can safely say that this is not a good strategy to adopt!

In another approach you might blindfold yourself (so you can't see the pictures) and then adopt an algorithmic approach in which you pick up a piece at random and compare one of its sides to each side of each other piece until you detect a perfect fit (all by touch!). This represents a partial solution to the problem. You would then select an unsolved side of the second jigsaw piece and repeat the test until you found another fit. You would then continue until you selected a straight edge. At that point, you would backtrack and select another unmatched edge for the current piece (or the previous piece if all edges are satisfied), and continue.

Eventually, through careful application of this backtracking algorithm, you should be able to complete the entire puzzle. Solving our 1,000 piece jigsaw with, say, 136 edge pieces, would require an average of about 911,000 separate comparisons. If each comparison takes an average of 5 seconds to complete, and you work 8 hours a day, you could reasonably expect to complete the jigsaw in about a month. This is a good deal better than the first strategy, but still hard going.

When solving jigsaw puzzles, we generally keep our eyes open! We supplement basic algorithms with a number of additional techniques, such as first solving the relationships between all the pieces that contain a straight edge. When trying different candidate solutions, we reduce the problem space by only comparing sides with a 'tab' against sides with a 'blank'. We also use our extraordinary human talent for image processing to infer likely solutions by studying the picture superimposed onto the various pieces (i.e., comparing only those pieces which show sky or grass). In this way, we collapse the problem space down to a far more manageable level, allowing us to complete the puzzle in, perhaps, just a few hours.

These approaches have analogues in the algorithms and strategies employed by constraint programming engines when solving CSPs. These engines typically implement advanced forms of backtracking, forward checking, constraint propagation and heuristics, reflecting similar processes to those we employ in solving jigsaw puzzles. Think of each jigsaw piece as a node in the solution tree, and you are beginning to understand what CP is all about. This, then, is the point. A CS engine focuses on problem space reduction for CSPs.

Jigsaw puzzles are great fun, I'm sure. However, CSPs occur in the world of business as well as leisure. Text-book examples include various types of scheduling, timetabling and assignment problems in which we must find possible solutions given a large number of variables. They often concern finding the most optimal of a range of possible solutions. By collapsing the problem space as far as possible, they make highly combinatorial problems more manageable, and even support finding 'good enough' approximations to solutions of NP-Complete problems.

Let us now consider the world of Rete rules engines. These engines express rules as productions. A production contains two parts. The left-hand side (LHS) contains a number of condition expressions which are used to match entire data tuples, rather than individual variables. These are data tuples asserted to the engine at runtime and called 'facts'. The right-hand side (RHS) specifies a set of actions to be undertaken for each match found by the engine. These actions can perform any task, including making changes to the data being matched by the engine, resulting in the need to re-evaluate rule conditions against the data. This is handled by a forward chaining mechanism.

Production rules are flexible and can be used to model a wide variety of conditional computations. The rule actions are performed over the data tuples matched by the rule conditions. One of the most important features of Rete engines is that the LHS of the rules may match patterns across multiple tuples. In effect, the engine can perform joins over tuples. This is very expressive, but greatly complicates matters when performing forward chaining. Because joins are relatively expensive to undertake, the cost of revaluating all the rule conditions each time data changes could quickly become prohibitively expensive. One major strength of the Rete algorithm is that it is able to reduce the amount of re-evaluation work by only re-evaluation rules for the changed data (the 'deltas').

It could be tempting to think of constraint satisfaction and production rule processing as direct competitors. However, this is not the case. Rete engines are general purpose devices which are good at handling combinatorial data processing in a performant manner, albeit by trading memory for performance by storing partial join results in memory. However, there are limits to what can realistically be achieved. If you swamp a Rete engine with astronomically high numbers of combinations, it will still not be able to cope, however efficiently it undertakes this work. Constraint Satisfaction engines, by contrast, are focussed specifically on addressing CSPs. These problems are generally highly combinatorial, and CS engines concentrate on reducing the problem space as far as possible before any combinatorial work is undertaken.

It is obvious that the two approaches are complimentary and should be used together as appropriate. However, I need to address a slightly deeper issue. A production rule is a profoundly simple yet flexible construct in which the conditional part yields data on which the action part operates. As such, you can consider a production rule to be a basic unit of computation. A CS engine is characterised by its ability to apply a range of reductive and heuristic algorithms to sets of variables within given domains, driven by constraints. By contrast, a Rete engine is characterised in terms of being able to perform pattern matching-based computation efficiently. CS engines are 'specialised', whereas rule engines have a general application. This observation has an interesting implication. It is possible to implement virtually any algorithmic approach using nothing more than production rules. So, for example, we could imagine building an entire CS engine using only a Rete rules engine. Conversely, you could **not** imagine building a Rete engine using only a CP language.

This, then, lies at the heart of Daniel's assertion that the Einstein problem could only be solved by creating 'convoluted' rules. Strictly, you could argue that Daniel’s statement is not quite accurate. It would be possible to implement a correct but highly naive approach in which you generate all possible combinations of all the variables and then apply rules to retract combinations until the solution is found. In this case, the rules would directly reflect the constraints as stated in the puzzle, together with a handful of additional rules that create all possible combinations, detect the solution, etc. Such a rule set would be only slightly more convoluted than the expression of constraints in a CP language. However, it would also be completely unworkable. As for the first strategy I outlined above for jigsaw puzzles, the number of possible combinations of variables in the Einstein Puzzle rapidly becomes unmanageable, regardless of the efficiency of the engine used to processes them.

The only practical solution would be to implement backtracking and other types of algorithm using production rules. This would be hard, and would lead inexorably to the kind of 'convolution' Daniel envisages. Indeed, this has been done! See http://www.droit.univ-paris5.fr/futtersack/english/research/CCP/index.html for an example of a CP approach using nothing more than the CLIPS Rete engine and its associated language features. It happens that the 'convolution' is mitigated in this example by careful modularisation, allowing the rules developer to express the constraints fairly naturally, and to use pre-written modules to apply backtracking and forward checking algorithms. So, it is quite possible to use the 'mighty Golden Hammer' of Rete engines to undertake CP in a reasonably efficient manner. Note, however, the authors' comment that " ... performance ... cannot be compared with the specialized CSPs [sic] algorithms in terms of time efficiency". I cannot in any way recommend this as a sensible strategy to adopt in real-world computing! Use a proper CS engine instead.

So, there we have it. The 'mighty Golden Hammer' of Rete will quickly look very tarnished indeed when applied inappropriately. Use a CS engine when you need to find possible and/or optimal solutions based on combinations of a large numbers of variables. The focus of CS engines is to use constraint statements to collapse the problem space in order to minimise the amount of combinatorial work required, and to employ advanced algorithms for finding CSP solutions as efficiently as possible. Ultimately, a CS engine produces tuples representing solutions and containing combinations of variable values.

Use a Rete engine when the focus is on processing tuples in order to perform actions. Actions are performed in the context of each tuple that matches a set of conditions, and those conditions can match bound variables across multiple tuple types (a fancy way of saying that they can represent joins). Rete engines are good at matching complex data sets and finding every occurrence of a given pattern. They make no attempt to collapse combinatorial problem spaces in an equivalent fashion to a CP engine, but do provide an efficient mechanism for undertaking combinatorial processing in situations where it is unavoidable, and specifically where actions can produce side effects that require re-evaluation of joins.

To summarise: CP engines analyse possibilities and produce facts. Rete engines analyse facts and perform actions, potentially creating new or higher-order facts as they go. Use CP and Rete engines together wherever appropriate.

UPDATE: For the first couple of hours after I posted this article, it finished with a complaint about the lack of free/open-source solvers for .NET. Yes, there are a couple out there, but not all are for commercial use, or in a very complete state. Not for the first time, however, I discover I have been caught out by the latest developments from Microsoft. In recent weeks, they have published their own solver library called the Microsoft Solver Foundation. The foundation provides a common programmatic interface and modelling runtime together with a plug-in architecture for third party solvers. The idea seems to be to provide a common infrastructure for CP on the Microsoft platform that can support a wide variety of third-party solver engines.

The 'Express' edition is available for free and can be downloaded from http://code.msdn.microsoft.com/solverfoundation/Release/ProjectReleases.aspx?ReleaseId=1799. The latest version (1.1) was posted just days ago. The Express edition ships with a version of the Gurobi MIP solver although as I understand it from a fellow ORF contributor Erwin Kavelagen (see http://yetanothermathprogrammingconsultant.blogspot.com/2009/02/ms-solver-foundation-gurobi-2.html) it appears that the solver has reduced capabilities compared with the full version. There are also a couple of built-in solvers including a CSP solver.

I have downloaded the code and will see if I can put together a solution for the Einstein puzzle.

UPDATE 2: I missed a major point when writing this article, and hence I need to qualify certain statements. Broadly, speaking, what I said is correct, and I will let the original text of the post stand. What I had failed to take into account, however, is the rather obvious point that Rete networks effectively implement backtracking algorithms as a natural consequence of the way they work. I was aware of this, and it did occur to me that maybe I should consider this further, but instead, I concentrated on discussing the (very bad) idea of superimposing your own backtracking algorithm onto the engine, effectively writing such an algorithm in production rules.

With a little thought and effort, it is entirely possible to implement a solution for the Einstein puzzle in a fairly natural and surprisingly efficient manner. Indeed, the puzzle was solved years ago in CLIPS, and the example is included with rules engine which you can download from http://sourceforge.net/project/showfiles.php?group_id=215471&package_id=260062. Not only was it solved, but the rules are really not very complex at all, and mirror the constraint statements directly.

One caveat is that all the constraints appear in one single rule. This is crucial to simplifying the problem because it allows the single rule to exploit the Rete network in order to massively reduce the combinatorial processing. It does this by the simple artifice of first joining each of the 25 decision facts initially asserted to the engine on each of the 5 house positions in order to create 125 attribute-value facts. These facts are then processed by the single rule with contains additional conditions, not directly stated within the original constraint expressions. These conditions are quite straightforward, and ensure that each value chosen in a decision can only appear for any one house within any one match for the single rule. So, having initially asserted 25 decision facts, the engine needs only produce a small number of attribute-value facts by a simple piece of combinatorial processing. After that, the single 'constraint' rule does all the heavy lifting, producing a Rete network that exploits its internal backtracking capabilities to the maximum.

The example is actually a 'Zebra' variant. I created an Einstein variant and ran this on my notebook under CLIPS 6.3. It took 59ms to find the solution. Compare this with the Microsoft Solver Framework example which I blogged on at http://geekswithblogs.net/cyoung/archive/2009/02/25/129672.aspx, and you can see the Rete engine actually out-performed Microsoft's CSP solver which took 169ms to obtain an equivalent result and complete its work (in fact, this does not include printing the results, which is included in the CLIPS version)! CLIPS 6.3 offers very high performance, and I suspect that many other engines would be a little slower.

Defining all the constraints in a single rule is certainly more convoluted than defining constraints for a CS engine, and requires a deeper engagement from the developer. Also, for rules engines like MS BRE that employ higher-level rule languages, things can get quite unwieldy. CLIPS benefits here (as do a number of other engines) by employing a lower-level approach to defining rules in which variable substitution is directly expressed across conditions. The terse syntax employed by CLIPS helps, in this case, to keep the intent of each set of conditions within the rule fairly clear. Certainly, the rule set is not convoluted in the way I was thinking of when writing the article.

I still fully endorse the central point Daniel Selman made. A good CS engine is preferable to a Rete engine to solving CSPs, and I wouldn't advocate using Rete engines for this purpose unless you really have to. The expression of constraints using a CP solver is more concise, simpler to understand and easier to maintain. In addition, the Einstein puzzle does not reflect real-world requirements, and I suspect that a point rapidly comes where the more advanced algorithmic features of modern CS engines become crucial to handling the complexities inherent in the problems people are solving.

UPDATE 2b: OK, I couldn't resist it. I ported the amended CLIPS test to MS BRE just to see what the performance would be like. It originally took 126ms on my box. I then amended the rule set to use caching (sideeffects="false") and the elapsed time reduced to 98ms So, even MS BRE was able to solve this puzzle significantly faster as the MS CSP Solver.

I did a little more investigation. It is very important to understand that each run I did was a 'first hit'. In Rete engines, the first hit is generally quite expensive because of the time taken to create the Rete network (equivalent to the MSF model), cache contextual data, etc. If you re-use a cached Rete over several iterations of the test, the time reduces significantly. In MS BRE, subsequent runs take 20ms. In CLIPS, the time reduces to 10ms.

I re-visited the MS CSP Solver implementation of the puzzle. To begin with, I instrumented the code in the same way I instrumented it for MS BRE. The total time to run the test on a first hit was in fact 330ms. I suspect that the total time reported by the solver is not inclusive of the time taken initialise the solver context and create the model. I then investigated what happens if you cache the model and decisions and re-use them over several iterations. In this case, the time reported by the solver reduces to about 1ms. However, the instrumentation in the code reports 5.5ms. Like the Rete tests, this includes time to collate and print out the results. I further instrumented the code, and sure enough, it turned out that while the elapsed time for each re-execution of the "context.Solve(new ConstraintProgrammingDirective())" call was 1 ms, the other 4.5ms involved all the sorting and printing of the results.

In the MS BRE version of the puzzle, the results are collated and printed using a single rule. I de-activated this rule and re-ran the test, and also removed some additional PrintLine actions. Second and subsequent execution time reduced to 11 ms. I did the same with the CLIPS version of the test and execution time reduced likewise to 5ms.

It appears, on the basis of this highly simplistic and questionable microbenchmarking, that the MS CSP Solver provides poor performance on a first hit, but much better performance on subsequent execution. It is not really possible, on the strength of this testing alone, to say definitively how the core performance of the MS CSP Solver compares to that of the Rete engines because, in the case of the Rete engines, we clear the working memory and assert facts afresh to the engine on each run, whereas for the CSP Solver the decisions are added to the model once, and then cached. Decision names are used as the equivalent of asserted 'attribute' values in the Rete versions of the puzzle. A more direct comparison would probably involve using the parameterisation features of the CSP Solver. After all, in real-world scenarios, what would be the point of executing the solver more than once unless we could get different answers from it on each execution.

One last point. For my MS BRE and MS CSP Solver testing, I used the QueryPerformanceCounter and QueryPerformanceFrequency APIs in order to get a high resolution measurement of elapsed time. See http://www.codeproject.com/KB/cs/highperformancetimercshar.aspx for further details.

Posted on Tuesday, February 24, 2009 1:35 PM | Back to top

Your comment:
Title:
Name:
Comment: *Allowed tags: blockquote, a, strong, em, p, u, strike, super, sub, code*
Verification:
var RecaptchaOptions = {
theme : 'white',
tabindex : 0
};

http://code.msdn.microsoft.com/solverfoundation

Download it

In there you will find a C# sample to does the Zebra problem.

Suggest you try and come up with a problem that has the Solver Foundation working with Workflow Rules.

I discovered MSF a short while after posting! So, see the update, and also another article in which I started with the sample you mentioned and implemented an amended version which is closer to Daniel Selman's version. See http://geekswithblogs.net/cyoung/archive/2009/02/25/129672.aspx.

I was thinking along the same lines as you suggested - an example that combines a CSP with rules. If I come up with a good example, I'll maybe put somethign together.