Introduction
I’ve been asked a few times how the performance of WF (Windows Workflow Foundation) Rules compares with that of the Microsoft Business Rules Engine (MS BRE).  Having done no testing, I could only guess at the answer.  A similar subject arose in a different context a few weeks ago when Mark Proctor posted on the subject of sequential rule engines.  Mark is one of the main people behind the open source JBoss Rules engine, formerly known as Drools.  JBoss Rules uses the same generally approach as MS BRE.   You can read his post, and the long list of comments, at http://blog.athico.com/2007/06/what-is-sequential-rule-engine.html.  As you will see, the comment trail soon became a debate about the relative performance merits of different types of rule engine.
The earlier part of the comment trail indicated to me that the various participants each had rather different ideas about what constitutes a ‘sequential’ rule engine.  WF processes rules sequentially.  It creates a collection of ‘Rule States’ internally, each one representing an individual rule in a rule set, and then works through them in a sequential order determined by the order in which the rules appear in the rule set, as modified by the use of priority settings.  Each rule is evaluated, and its ‘then’ or optional ‘else’ actions are ‘fired’ as required.    WF then continues at the next rule unless it detects that the current rule has changed values on which former rules depend.  In this case, it re-evaluates the relevant previous rules as part of a forward chaining cycle.  The forward chaining can be switched off for a given rule set if required.
I’ve now undertaken some initial performance testing to compare WF and MS BRE, and decided to publish the results.   As always, there is danger associated with publishing performance comparisons, and I need to make a few points clear.  First, I do not claim that my tests constitute a comprehensive, sophisticated or real-world investigation of different performance characteristics.  Far from it.  As you will see, I have adopted very simplistic approaches. My tests do not reflect real-world scenarios with any exactitude.  Despite the obvious drawbacks, I think that the approach still has merit.  These types of contrived tests can usefully illuminate the basic performance issues and behaviour of the engines.  However, they tend to exaggerate optimisations and problems alike, and don’t necessarily take all factors that affect performance into account.  As long as the reader remembers that the real world is always murkier and more complex than the simplistic picture I paint here, they should be able to get some useful pointers in regard to the likely performance of the two engines in different scenarios.
Another point I want to make very clear is that my figures compare two specific rules engines, both of which happen to come from the same software company.   For my sins, I live and breathe the Microsoft world, so these two engines are currently highest on my personal radar.   I mainly write blog articles for the BizTalk community, and again, these two engines are widely known in that community.   I strongly caution anyone against trying to extrapolate more general lessons from my results.  As I noted above, it is clear that there are many different models of ‘sequential’ engine.    WF Rules cannot be taken as some sort of ‘reference’ implementation that represents a whole class of ‘sequential’ engines.  I would fully expect other sequential engines to exhibit quite different performance characteristics.  Again, there are many different implementations of set-based Rete engines, each with their own subtle differences.  I do not consider it safe or reasonable to make wide-ranging deductions about entire classes of rule engine from the results I publish here, although I will make some general comments about Rete engines as I go along.
Rationale
Based on anecdotal observation over the last few years, I suggest that, in the specific context of automated business process and workflow, most rule processing involves the evaluation of simple rules in order to make straightforward decisions.  These decisions are typically about data validity, data derivation and the governance of process flow.   I would further suggest that, in this same context, the majority of rule sets are relatively small.  I can offer no formal evidence to back these claims – only the observation that this accords with the experience we have at the company I work for, having undertaken very many process- and workflow-orientated projects on behalf of a wide range of both public and private sector clients with greatly varying requirements.
Taking this as a starting point, I am chiefly interested in exploring the performance characteristics of the two engines when applied to small rule sets containing simple decisioning rules, and, importantly, where the majority of rules tend to follow the same general pattern. Â Â Â This is the most common scenario I have come across in the BizTalk world, and my main concern is to find out which is the better engine to use.
I chiefly design and implement BizTalk Server solutions.  For BizTalk Server developers, it is natural to use the MS BRE which ships with BizTalk Server as a companion product.  There is some lightweight integration between MS BRE and BizTalk Server orchestrations, although I generally favour invoking the MS BRE through my own custom code.   WF Rules can be invoked in a similar fashion without requiring any WF workflow context.  Hence, WF Rules can be used as a ‘stand-alone’ sequential engine supplied as part of the .NET 3.0 platform and invoked from within BizTalk Server.  Which engine is the better choice in terms of performance?
I should state here that any definition of ‘small’ in terms of rule set size is really quite arbitrary.  Different people have a different sense of scale when describing a rule set as ‘small’ or ‘large’, depending on their experience of creating rule sets in real-world scenarios.  For some, a rule set with 50 rules may appear fairly large.  For others, a rule set of 10,000 rules may appear tame in comparison with the requirements they have encountered.
This article is only concerned with assessing the two engines in terms of performance when executing rules.  In the real world, there are many other important considerations to bear in mind when selecting an appropriate rule processing strategy or technology.  These include available skill sets, tooling, expressivity, semantic mapping, support, etc.   I will, however, cheerfully ignore all these other concerns within this article.
Approach
Tests were carried out using WF Rules in .NET 3.0 (WF Rules is implemented as part of the WF libraries which form part of the .NET framework) and MS BRE 3.0.1.0.  This version of MS BRE ships with BizTalk Server 2006, and was written to target .NET 2.0.  Please note that, like everything else, blog articles age with time, and performance characteristics may change in later releases!
All tests have been conducted outside of the context of any orchestration or workflow.  In doing this, I am making an assumption that for both engines, it is entirely possible to invoke multiple instances of executable rule sets on different threads within the same process without incurring unexpected overhead through blocking, etc..  This is certainly the case with the MS BRE which works fine in the context of multiple simultaneous orchestration instances.   I am not aware of any problems with WF Rules in this respect.   Both engines have built-in support for caching rule engine instances, and MS BRE makes extensive use of object pooling.  The built-in WF caching facilities are only invoked within the context of a workflow, and not designed to be used when invoking WF Rules directly, although it is fairly simple to create your own rule engine cache.
For WF Rules, I used the current beta (2.2) of the Microsoft External RuleSet Toolkit.   This toolkit is a first attempt by Microsoft to plug one of the most noticeable gaps in WF Rules – the lack of any rule repository, deployment framework or tooling for WF Rules.  The toolkit provides facilities for storing rule sets as XML documents within a SQL Server database, and contains a WF activity (PolicyFromService) that can be used in place of the Policy activity to invoke specific versions, or the latest version, of a rule set.  The current beta of the toolkit does not yet support policy caching, but this may be included in a future release.
For MS BRE, I used the PolicyTester class to invoke rules.  This is as close as I can get to the same functionality offered by the External RuleSet Toolkit.  The PolicyTester class (which in my opinion is poorly named, as it has many uses beyond testing) is similar to the more familiar Policy class, but extracts rule sets directly from a store using an appropriate rule set provider.  By default, it uses Microsoft’s SQL Server repository provider.  Like the External RuleSet Toolkit, this repository stores versioned rule sets as XML documents.
I have purposely limited the scope of what I am testing in order to investigate only the more fundamental aspects of performance.  Where appropriate, I have used only simple equality tests that compare integers.    I have limited rule actions to the simple task of setting a Boolean value.    I have also used the greater-than operator in order to fire multiple rules for a single value.  I made the point earlier that the tests are not designed to emulate real-world scenarios.   In the real world, the performance of a given engine, executing a given rule set, is influenced by a combination of many factors.  Indeed, there are so many factors that true comparative benchmarking can perhaps only reasonably be undertaken by implementing a wide range of different rule-sets based on real-world examples.   I have not attempted to do this.  Instead, I have tried to reduce the ‘noise’ level to a bare minimum in order to draw a few broad conclusions that I hope will prove to be useful indicators to general performance characteristics.
There are certain assumptions implicit in the approach I have taken.  First, I am assuming that, when using the built-in support for common predicates (e.g., equality tests, relational operators, etc), there is not a great deal of difference in terms of the performance of the built-in code.  I have not undertaken detailed testing for a wide range of built-in predicates.  This is reasonable, given that an inspection of the code shows a very marked degree of similarity in terms of the implementation of predicates.  Indeed, I suspect the WF Rules developers used MS BRE code as a starting point.  As just one trivial example, compare the following two code extracts (via Reflector), both taken from classes named ‘StringLiteral’ derived from base classes named ‘Literal’:
WF Rules:
internal override bool GreaterThanOrEqual(string rhs)
{
   return (0 <= string.Compare(this.m_value, rhs, false, CultureInfo.CurrentCulture));
}
MS BRE:
internal override bool GreaterThanOrEqual(string rhs)
{
   return (0 <= string.Compare(this.m_value, rhs, false, CultureInfo.CurrentCulture));
}
I am also assuming that in the real world, most rule sets created for WF would generally be defined in a near identical fashion in MS BRE.  MS BRE offers a superset of expressivity in comparison to WF.  Anything expressed in WF Rules can be expressed in MS BRE, often in much the same way.  It also reflects that fact that both engines are designed to apply rule processing directly on .NET objects (POCOs – ‘Plain Old CLR Objects’).  MS BRE has additional support, via .NET typed fact wrapper classes, for XML and ADO.NET.
‘First Hit’ and Rule Engine Caching
For each test, I have measured performance in two ways.  The first investigates ‘first hit’ performance.  This is the performance of an engine when it loads and executes a rule set for the first time within a process.  Understanding ‘first hit’ performance is important for scenarios where a rule set is executed just once during a session.  It is less important in scenarios where the same rule set is executed continuously within a single process over the life time of a session.
For ‘first hit’ performance, I measure three distinct activities.  The first is ‘load’ time.  This is the elapsed time taken to retrieve a rule set from a repository and de-serialise it in memory.  The second activity is ‘initialisation’.  This is the elapsed time taken to initialise a rule engine instance with an executable representation of the rule set.  For MS BRE, it is chiefly the time taken to translate a rule set into a node network (Rete).  For WF Rules, initialisation is mainly about analysing rule sets in order to support chaining behaviour.  The third activity is ‘execution’.  This is the time taken to execute the rule set for the first time.
In most BizTalk Server scenarios, and many WF Rule scenarios, the same rule set will be executed many times within the same process.  In this case, the cost of the first hit is not generally a significant concern.  The performance of second and subsequent rule set executions is a far more important consideration.  For both engines, ongoing performance can be maximised by adopting good caching strategies.   WF Rules and MS BRE both provide RuleEngine classes. Rule engine instances are initialised with specific rule sets and execute rules using executors.   If a RuleEngine instance is cached, it can be re-used repeatedly to execute a given rule set, avoiding the cost of re-intialisation.
MS BRE provides built-in rule engine caching within its Policy class.  This is not supported within PolicyTester, but by caching PolicyTester instances, you can effectively cache RuleEngine instances.  Similarly, WF provides RuleEngine caching via its implementation of the Policy activity.  The closest equivalent to PolicyTester is the PolicyFromService activity which is part of the Externalised RuleSet Toolkit.  This does not support caching in the current beta version, and I did not use it because I wasn’t executing rules in the context of a WF workflow.  Hence, for WF, I wrote my own cache code.
A common mistake when using WF Rules outside the context of WF workflows is to cache RuleSets, rather than RuleEngine instances.  The RuleSet class does not contain any functionality for caching RuleEngine instances, although it does have an overloaded, but internal, version of the Execute() method designed to work in conjunction with built-in caching facilities in the WF Policy activity.  Because it is internal, this method cannot be invoked directly in custom code, except via reflection.
Because RuleSet instances do not cache RuleEngine instances, caching RuleSets provides only small performance improvements.  I did test this (though I haven’t recorded by results here), and found that there was some benefit.  However, each time a cached RuleSet is executed, it creates a new RuleEngine instance which then performs pre-processing.  If you are using WF Rules in custom code, do not rely on RuleSet caching to optimise performance.  Instead, you need to create an instance of RuleEngine, and then assign your RuleSet to it.  Cache your RuleEngine and then call Execute() on the RuleEngine rather than the RuleSet.  Here is a very basic cache class I wrote for testing purposes.  It uses the Externalised RuleSet Toolkit to obtain and initialise a RuleSet from an external repository.  Note that it is missing functionality such as exception handling, automatic selection of the latest version a rule set, or clearing the cache:
internal class RuleEngineCache
{
   // Cache of rule engine instances
   private static Dictionary<string, RuleEngine> m_ruleEngineCache = new Dictionary<string, RuleEngine>();
Â
   // Return an instance of a rule engine for a given rule set instance and object type
   public RuleEngine Allocate(string ruleSetName
                            , int ruleSetMajorVersion
                            , int ruleSetMinorVersion
                            , Type thisObjType)
   {
       // Create a unique identifier for the version of the rule set
       string ruleSetVerId = ruleSetName + "_"
                           + ruleSetMajorVersion.ToString()
                           + "." + ruleSetMinorVersion.ToString() + "_["
                           + thisObjType.FullName + "]";
Â
       RuleEngine ruleEngine;
Â
       if (!m_ruleEngineCache.TryGetValue(ruleSetVerId, out ruleEngine))
       {
           // Get RuleSet from repository using the Externalised RuleSet Toolkit (ERST)
           RuleSet ruleSet = new RuleSetService().GetRuleSet(
                                                    new RuleSetInfo(ruleSetName
                                                  , ruleSetMajorVersion
                                                  , ruleSetMinorVersion));
           if (null == ruleSet)
           {
               return null;
           }
Â
           // Create and cache a new RuleEngine over the rule set
           ruleEngine = new RuleEngine(ruleSet, new RuleValidation(thisObjType, null));
           m_ruleEngineCache.Add(ruleSetVerId, ruleEngine);
       }
Â
       return ruleEngine;
   }
}
Here is code that executes the rule set using the cache:
// Get the cache
RuleEngineCache rEngCache = new RuleEngineCache();
Â
// Execute the rule set over an object
RuleEngine rEng = rEngCache.Allocate("MyRuleSet", 1, 0, typeof(MyTestClass));
rEng.Execute(myTestObject);
As I stated previously, I used the PolicyTester class for MS BRE, and cached a PolicyTester instance in order to investigate the performance of the engine when using RuleEngine caching.    PolicyTester is instantiated over a RuleSet, and uses this to initialise a RuleEngine instance which it retains across multiple executions.   When you execute the PolicyTester, it calls Execute() on the RuleEngine object, asserting a list of ‘fact’ objects, executing the Rete network and then retracting the facts, leaving the initialised RuleEngine ready for the next execution.
When conducting tests with rule engine caching, I was careful to execute the rule sets a few times before starting my measurements.  This is important in order to eliminate any ‘first hit’ overhead.   Tests with caching are designed to indicate the performance characteristics of fully ‘warmed-up’ rule engine instances.
Load and Initialisation times for ‘First Hit’ tests
The ‘first hit’ tests tell us something useful about rule engine performance when used in certain scenarios. For example, you might want to exploit WF Rules in a desktop application at start-up.  The ‘first-hit’ performance becomes a component of application start-up performance, and could potentially have an impact on user experience.
It makes little sense to measure ‘first hit’ performance only in terms of execution time.  We also need to consider load and initialisation times.  In most scenarios, all three activities will be done together, and the user will not be aware of the distinction.  Load time, however, is problematic.  First, we must decide where to load rule sets from.  Both engines represent rule sets using XML, and provide mechanisms for de-serialising XML into runtime RuleSet object graphs.   In a WF context, rule sets are normally compiled into a workflow assembly as managed resources and loaded via reflection.  MS BRE does not provide equivalent functionality out of the box, but it is a trivial matter to create a RuleStoreProvider class to retrieve rule sets from managed resources. Â
The MS BRE Policy class works with the local Rule Engine Update Service to retrieve a rule set from an external repository.  The Rule Engine Update Service is a Windows service that supports a pub/sub mechanism.  Repositories can publish notifications each time a new version of rule set is deployed, and local instance of the service can subscribe to these.  The Rule Engine Update Service is generally used with the out-of-the-box SQL Server repository, and the subscription component supplied by Microsoft actually polls the database.  When a Policy needs a rule set, it makes a request to the Rule Engine Update Service which, if necessary, uses cached notification information to download the required rule set version from a given repository.  The rule set is cached by the service, so future requests from Policy instances do not result in a round trip to SQL Server.  Furthermore, the Policy uses an internal cache class to cache initialised RuleEngine instances.  This is done at process level using a static collection.
Instead of Policy, I used the PolicyTester class which directly downloads rule sets from a repository without making requests to the Rule Engine Update Service.  You wouldn’t usually use this class in a BizTalk Server production environment.  In using PolicyTester, I am making the assumption (not validated) that load times are fairy similar to the load times incurred when using the Rule Engine Update Service to load a rule set from the repository for the first time.  This is not unreasonable assumption since in both cases the loading is done using the same provider functionality.
For WF, I have used beta 2.2 of the Externalised RuleSet Toolkit.  This seems reasonable, given that I am interesting in exploring the relative performance merits of each engine from the perspective of a BizTalk developer.  Using ERST would be the natural choice when exploiting WF Rules in conjunction with BizTalk Server.
// Load
hpt.Start();
RuleSetServicersService = new RuleSetService();
RuleSet ruleSet = rsService.GetRuleSet(new RuleSetInfo("RuleSetName ", ver, 0));
hpt.Stop();
Â
sw.Write("Load time, " + hpt.Duration.ToString() + ", s");
Â
// Initialise
hpt.Start();
RuleEngine ruleEngine = new RuleEngine(ruleSet, new RuleValidation(typeof(TestValue), null));
hpt.Stop();
Â
sw.Write("Init time, " + hpt.Duration.ToString() + ", s");
Â
// Execute
hpt.Start();
ruleEngine.Execute(testObj);
hpt.Stop();
Â
sw.WriteLine("Run time, " + hpt.Duration.ToString() + ", s");
Here is the equivalent code used for MS BRE.
// Load
hpt.Start();
SqlRuleStore sqlRuleStore = new SqlRuleStore(connectionString);
RuleSet ruleSet = sqlRuleStore.GetRuleSet(new RuleSetInfo("RuleSetName", ver, 0));
hpt.Stop();
Â
sw.Write("Load time, " + hpt.Duration.ToString() + ", s");
Â
// Initialise
hpt.Start();
PolicyTester pt = new PolicyTester(ruleSet);
hpt.Stop();
Â
sw.Write("Init time, " + hpt.Duration.ToString() + ", s");
Â
// Execute
hpt.Start();
pt.Execute(facts);
hpt.Stop();
sw.Write("Run time, " + hpt.Duration.ToString() + ", s");
For WF Rules, you execute rule sets over a single object.  For MS BRE, you assert ‘fact’ objects to the engine before execution.    You can assert many facts to the engine in order to represent sets of data.  Most of the tests compare near identical rule sets on both engines, and hence I generally asserted a single fact to MS BRE.  As mentioned before, MS BRE performs rule processing directly over POCOs and also over ‘typed’ wrapper classes representing XML, ADO.NET tables and rows or live database connections.  WF Rules does not provide support for these additional data representation types, and hence I have only used POCOs in MS BRE.  It is important to note that MS BRE performance is likely to be different when using ‘typed’ wrappers.
I performed all tests on an HP nw9440 2.33 GHz dual core notebook computer with 4GB RAM.  Load time is largely I/O-bound, rather than CPU bound, and the use of a notebook computer is a poor choice for measuring I/O-bound performance.  My results for load time should therefore be treated with considerable care when considering likely performance in real-world scenarios.
WF Rules Chaining Behaviour
WF Rules supports forward chaining.  When a rule is evaluated, and its actions are executed, those actions may change data used during the evaluation of previous rules.  In this case, the engine can detect which previously evaluated rules are affected and loop back to re-evaluate those rules, firing their actions as required a second time.  This mechanism allows the design of more sophisticated rule patters involving dependencies between different rules.
When a WF RuleEngine object is instantiated, it performs pre-processing of the rule set in its constructor.  The first part of pre-processing involves creation of a sorted list of RuleState objects.  Each RuleState is created over an individual rule.  The pre-processor then enters an analysis phase.  However, analysis is only undertaken if chaining behaviour is set to ‘forward chaining’ or ‘update only’.  ‘Update only’ is a special version of forward chaining is which chaining will only occur if the Update action is used explicitly.   The engine generally performs more analysis work for ‘forward chaining’ mode than for ‘update only’ mode.  This analysis is important in order to initialise the engine so that at run-time it can detect which previously evaluated rules are affected by later changes to the data.  WF Rules refers to this as ‘side effects’, but uses this term quite differently to MS BRE.  In MS BRE, side effects are controlled at the level of data facts (via caching of data values), rather than rules, and do not automatically lead to chaining.
The additional pre-processing work associated with forward chaining behaviour is an important consideration when measuring ‘first hit’ performance.  Hence, tests were conducted using different chaining behaviours.  I also tested performance of cached rule engines using different chaining settings to find out what effect, if any, this had on repeated execution.
MS BRE Side Effects
MS BRE regards the development of rule sets as a form of functional (and therefore declarative) programming.  Theoretically, a ‘pure’ function should execute without causing ‘side effects’ within other functions.  However, there is an impedance mismatch between this ideal and the imperative nature of object-orientated .NET code.  This is a problem when executing rules directly over POCOs.  MS BRE supports a ‘side effects’ flag that indicates if side effects should be allowed when using .NET methods or properties as predicates or arguments in rule conditions.
The ‘side effects’ flag is used to control internal value caching within the engine.  It is set on individual predicates and arguments within a rule, and therefore offers fine-grained control over caching.   When set to true, the engine switches off the cache, and may repeatedly invoke the .NET member.  When ‘false’, the .NET member will be invoked once per rule set invocation, and its value will be cached.  By default, MS BRE switches caching off when using POCOs and on when using typed wrappers.
The use of the side effects flag can have a significant effect on performance.  The engine will generally perform better when caching is used.   Unfortunately, although the side effects flag is documented, it is not supported within the Microsoft Rule Composer UI, and can only be set by editing the rule set XML (Business Rule Language) directly.  When using the SQL repository, this involves exporting the rule set to file, editing the XML and then importing the amended version.
I tested MS BRE for both settings of the side effects flag in order to compare the performance differences.
Test 1
Test 1 is as simplistic as you could imagine in terms of its design.  Each rule in the rule set has the following form:
IF
   TestObject.Value == <literal integer value>
THEN
   TestObject.Valid = true
All rules have equal priority.  I wrote code to generate rule sets, and incremented the value of the literal integer by one for each rule, starting at 1.  I generated rule sets of varying sizes in the range of 1 – 10,000 rules.  Even though I am chiefly interested in investigating the performance of the two engines when applied to small rule sets, it was useful to test larger rule sets in order to get a better picture of scalability characteristics.
As I mentioned above, WF Rules has no concept of individual ‘facts’.  You execute a rule set over a single object, referred to as the ‘this’ object.  In the context of a WF workflow, this is generally the WF workflow itself, but in Test 1, I created a single instance of a custom class.  I took a similar approach in MS BRE, asserting a single fact to the engine.  The Value property was set to 1.  By not using Halt in WF, I ensured that WF evaluated the object against each rule in turn.  I will discuss MS BRE’s behaviour in this respect after we have looked at the results.  The point here is that the test object matches just one single rule.  The resulting action is kept very simple.  The test is designed to investigate scaling characteristics of the two engines in relation only to the size of the rule set.
Test 1: ‘First Hit’ results
The numeric results are published in the appendix at the end of this document.  Here are the results in graphic form.   The first graph shows the overall results.  The second graph shows the results for smaller rule sets only.  Each graph shows the combined time for execution, initialisation and load. Â
Â
We can see from these results that WF Rules scales quite differently to MS BRE, and that the use of forward chaining imposes a considerable additional cost for large rule sets, although it is of little consideration for smaller rule sets.  In MS BRE, the side effects flag is not a significant consideration for ‘first hit’ performance.
The second graph shows that WF Rules out-performed MS BRE on the first hit for small rule sets.  This is a significant finding, given that I am most interested in exploring the behaviour of these two engines when applied to small rule sets.  A more detailed investigation of the results shows that WF Rules outperforms MS BRE for all three measures (load, initialise and execute).  The most critical factor here (not surprisingly) is load time.  However, as the rule set size increases, the situation is reversed on all three measures.
Test 1: Cached Rule Engine results
Here are the results obtained when using rule engine caching. Again, the first graph shows overall results and the second graph focuses on smaller rule sets.  In this case, we measure execution time only, and only begin measurement after ‘warming’ the engine up by executing the rule sets a few times.

Not surprisingly, the results show a huge performance increase across the board.  For WF Rules, we can see that the chaining behaviour makes no significant difference, even for large rule sets.  The additional cost of using forward chaining appears to be confined to ‘first hit’ scenarios.
When using rule engine caching, MS BRE outperformed WF Rules for small, as well as large rule sets.  It also displayed significantly better scaling characteristics.  Both engines display near-linear scaling characteristics across the rule set size range.  For MS BRE, there was a very significant improvement in performance when side effects were avoided (i.e., caching was switched on).
MS BRE Optimisation Issues
The higher performance of MS BRE when using rule engine caching is a natural consequence of the use of the Rete algorithm.  In this test, the single asserted ‘fact’ passes through a discrimination network of in-memory nodes.  The object passes from a root node to a ‘class node’ which tests its type (rather redundantly in this case).  The class node passes the object on to a set of descendant ‘select’ nodes.   In this test, the number of ‘select’ nodes is equal to the number of rules.  However, this is co-incidental due to each rule having a single unique conditional test.  Each select node does not represent a rule, per se.
The important point here is that each select node has only the single concern.  It evaluates the fact according to a given condition.  If the fact matches the condition, the node passes the fact on to a subsequent node.  In this case, the subsequent node is a ‘terminal’ node.  Terminal nodes represent individual rules.  Our fact will only match a single condition.  Hence, it is passed to one terminal node only.   The engine does not activate any of the remaining terminal nodes.
The Rete algorithm helps to reduce the work undertaken by the engine during rule matching to a reasonable minimum.   Even so, there is some overhead.   In Test 1, for 10,000 rules, there are 10,000 evaluations and one rule ‘firing’.  On my machine, I can execute simple custom code that will evaluate an integer 10,000 times on a single thread in about 0.01 ms.  It takes about 10 ms to evaluate a public integer property 10,000 times using reflection. MS BRE introduces additional overhead as it iterates over node sets, but this is fairly minimal.  The overhead for WF Rules is also small, but significantly higher than for MS BRE.
Any readers who are Rete experts may have realised that the current version of MS BRE does not support an optimisation technique used in some modern engines.  It is possible to increase the performance of this part of the node network further by implementing hash-based propagation from ‘class’ nodes to ‘select’ nodes that evaluate literal values.  For any given fact type, select nodes can register their associated attribute/literal value pairs with the relevant class node.  This allows the class node to inspect the relevant attributes and propagate facts only to those nodes designed to match a given literal value.  This would have reduced the match effort, regardless of the rule set size, to a single evaluation in the class node together with a hash table look-up which would return a single select node.  The resulting execution graph line would exhibit O(1) time complexity .  In English, this means that engines with this optimisation will always take a constant amount of time to complete Test 1, regardless of the size of the rule set.  Of course, for really large rule sets, other factors would eventually set in, such as high memory usage.   One way to think about this optimisation is that it is rather like performing an indexed lookup in a relational database table, although it is rule conditions that are ‘indexed’, rather than the data asserted to the engine.
To demonstrate this optimisation, here are the results for Test1 with the equivalent of RuleEngine caching for a well-know Java open source Rete rules engine (JBoss Rules 4.0.0) which implements this technique.

Please note that this does not constitute a comparative test between MS BRE and JBoss Rules.   My purpose here is simply to demonstrate the optimisation I have described.
In many real-world scenarios, I would expect the decay in performance in MS BRE for larger rule sets to be more pronounced due to the overhead of different combinations of data types and predicates.  Test 1 shows the basic performance characteristics of MS BRE in the best possible light.
Test 2
Having explored the performance of the two engines when applied to the simplest of rule sets, we will now look at how this performance is affected by various changes.  In Test 2, the original test is modified to add additional conditions to each rule.  Here is the rule pattern:
IF
   AND
       TestObject.Value1 == <literal integer value>
       TestObject.Value2 == <literal integer value>
       TestObject.Value3 == <literal integer value>
THEN
   TestObject.Valid = true
As for test 1, a generator was used to generate rule sets between 1 and 10,000 rules in size.   The same literal value is used in each of the three conditions in any one rule, and this value is set to 1 for the first rule and then incremented by one for each subsequent rule.  The properties of the test object were each initialised to 1 so that the test object matched just one rule.  The test was designed to be identical in all ways to Test 1 except that each rule has three conditions, rather than 1.
Test 2: ‘First Hit’ results
Â

Not surprisingly, the results display similar characteristics to test 1, but overall performance is slower.  One reason is that because each rule now contains three conditions, the rule set is correspondingly larger, and takes longer to download and process.   The effect is worse for WF Rules than for MS BRE, and consequently, MS BRE starts out-performing WF Rules at slightly smaller rule set sizes.
Test 2: Cached Rule Engine results
Â

These results are virtually identical to those obtained for Test1.  The reason is that each rule represents the conjunction (‘ANDing’) of three conditions.  If the first condition evaluates to false (which it does for all but one rule), both engines effectively short-circuit the evaluation of the other two conditions.  Hence, the amount of evaluation work done by each engine is virtually identical to that for Test 1.
For Rete engines, the short-circuiting of conjoined conditions is a natural consequence of the use of a node-based discrimination network.  If a fact or token is evaluated by a node, and the result is ‘false’, the node does not pass the fact or token on to the next node for evaluation.  In the case of Test 2, MS BRE further optimises this by assigning the multiple conditions in a rule to a single select node.  Each select node ‘short-circuits’ internally.
This short-circuiting in MS BRE highlights one of the fundamental problems faced by developers when using set-based ‘production’ systems (the term ‘production’ refers here to the kind of rule processed by MS BRE).  In a previous article (http://geekswithblogs.net/cyoung/articles/79500.aspx), I wrote about the lack of short-circuiting when using disjunctions (‘ORing’).  Most developers are more used to mainstream imperative, procedural programming languages, rather than declarative, set-based pattern matching systems.  They often expect short-circuiting to be used for both conjunctions and disjunctions.  This is the case in WF Rules.  However, MS BRE does not obey this familiar pattern.  Sometimes it short-circuits.  Sometimes it doesn’t.  This behaviour may appear to be inconsistent.  However, it is really a logical consequence of applying rule processing to data sets.   The issue is obscure, and the behaviour of the engine is opaque, making it harder for rule developers to identify, analyse and fix problems.
Test 3
Test 3 is similar to the previous two tests.  It explores performance when all rules match.    Like Test 2, each rule contains three conjoined conditions.     Here is the rule pattern:
IF
   AND
       TestObject.Value1 > <literal integer value>
       TestObject.Value2 > <literal integer value>
       TestObject.Value3 > <literal integer value>
THEN
   TestObject.Valid = true
Rule sets were generated between 1 and 10,000 rules in size.   The same literal value is used in each of the three conditions in any one rule, and this value is set to 1 for the first rule and then incremented by one for each subsequent rule.  The properties of the test object were each initialised to 10001 so that the test object matched every rule.
Test 3: ‘First Hit’ results
Â

These results are virtually identical to those obtained for Test 2. Â