WF Rules and MS BRE – Comparing Performance

IntroductionI’ve been asked a few times how the performance of WF (Windows Workflow Foundation) Rules compares with that of the Microsoft Business Rules Engine (MS BRE). Having done no testing, I could only guess at the answer. A similar subject arose in a different context a few weeks ago when Mark Proctor posted on the subject of sequential rule engines. Mark is one of the main people behind the open source JBoss Rules engine, formerly known as Drools. JBoss Rules uses the same generally approach as MS BRE. You can read his post, and the long list of comments, at

http://blog.athico.com/2007/06/what-is-sequential-rule-engine.html. As you will see, the comment trail soon became a debate about the relative performance merits of different types of rule engine.The earlier part of the comment trail indicated to me that the various participants each had rather different ideas about what constitutes a ‘sequential’ rule engine. WF processes rules sequentially. It creates a collection of ‘Rule States’ internally, each one representing an individual rule in a rule set, and then works through them in a sequential order determined by the order in which the rules appear in the rule set, as modified by the use of priority settings. Each rule is evaluated, and its ‘then’ or optional ‘else’ actions are ‘fired’ as required. WF then continues at the next rule unless it detects that the current rule has changed values on which former rules depend. In this case, it re-evaluates the relevant previous rules as part of a forward chaining cycle. The forward chaining can be switched off for a given rule set if required.I’ve now undertaken some initial performance testing to compare WF and MS BRE, and decided to publish the results. As always, there is danger associated with publishing performance comparisons, and I need to make a few points clear. First, I do not claim that my tests constitute a comprehensive, sophisticated or real-world investigation of different performance characteristics. Far from it. As you will see, I have adopted very simplistic approaches. My tests do not reflect real-world scenarios with any exactitude. Despite the obvious drawbacks, I think that the approach still has merit. These types of contrived tests can usefully illuminate the basic performance issues and behaviour of the engines. However, they tend to exaggerate optimisations and problems alike, and don’t necessarily take all factors that affect performance into account. As long as the reader remembers that the real world is always murkier and more complex than the simplistic picture I paint here, they should be able to get some useful pointers in regard to the likely performance of the two engines in different scenarios.Another point I want to make very clear is that my figures compare two specific rules engines, both of which happen to come from the same software company. For my sins, I live and breathe the Microsoft world, so these two engines are currently highest on my personal radar. I mainly write blog articles for the BizTalk community, and again, these two engines are widely known in that community. I strongly caution anyone against trying to extrapolate more general lessons from my results. As I noted above, it is clear that there are many different models of ‘sequential’ engine. WF Rules cannot be taken as some sort of ‘reference’ implementation that represents a whole class of ‘sequential’ engines. I would fully expect other sequential engines to exhibit quite different performance characteristics. Again, there are many different implementations of set-based Rete engines, each with their own subtle differences. I do not consider it safe or reasonable to make wide-ranging deductions about entire classes of rule engine from the results I publish here, although I will make some general comments about Rete engines as I go along.

RationaleBased on anecdotal observation over the last few years, I suggest that, in the specific context of automated business process and workflow, most rule processing involves the evaluation of simple rules in order to make straightforward decisions. These decisions are typically about data validity, data derivation and the governance of process flow. I would further suggest that, in this same context, the majority of rule sets are relatively small. I can offer no formal evidence to back these claims – only the observation that this accords with the experience we have at the company I work for, having undertaken very many process- and workflow-orientated projects on behalf of a wide range of both public and private sector clients with greatly varying requirements.Taking this as a starting point, I am chiefly interested in exploring the performance characteristics of the two engines when applied to small rule sets containing simple decisioning rules, and, importantly, where the majority of rules tend to follow the same general pattern. This is the most common scenario I have come across in the BizTalk world, and my main concern is to find out which is the better engine to use.I chiefly design and implement BizTalk Server solutions. For BizTalk Server developers, it is natural to use the MS BRE which ships with BizTalk Server as a companion product. There is some lightweight integration between MS BRE and BizTalk Server orchestrations, although I generally favour invoking the MS BRE through my own custom code. WF Rules can be invoked in a similar fashion without requiring any WF workflow context. Hence, WF Rules can be used as a ‘stand-alone’ sequential engine supplied as part of the .NET 3.0 platform and invoked from within BizTalk Server. Which engine is the better choice in terms of performance?I should state here that any definition of ‘small’ in terms of rule set size is really quite arbitrary. Different people have a different sense of scale when describing a rule set as ‘small’ or ‘large’, depending on their experience of creating rule sets in real-world scenarios. For some, a rule set with 50 rules may appear fairly large. For others, a rule set of 10,000 rules may appear tame in comparison with the requirements they have encountered.This article is only concerned with assessing the two engines in terms of performance when executing rules. In the real world, there are many other important considerations to bear in mind when selecting an appropriate rule processing strategy or technology. These include available skill sets, tooling, expressivity, semantic mapping, support, etc. I will, however, cheerfully ignore all these other concerns within this article.

ApproachTests were carried out using WF Rules in .NET 3.0 (WF Rules is implemented as part of the WF libraries which form part of the .NET framework) and MS BRE 3.0.1.0. This version of MS BRE ships with BizTalk Server 2006, and was written to target .NET 2.0. Please note that, like everything else, blog articles age with time, and performance characteristics may change in later releases!All tests have been conducted outside of the context of any orchestration or workflow. In doing this, I am making an assumption that for both engines, it is entirely possible to invoke multiple instances of executable rule sets on different threads within the same process without incurring unexpected overhead through blocking, etc.. This is certainly the case with the MS BRE which works fine in the context of multiple simultaneous orchestration instances. I am not aware of any problems with WF Rules in this respect. Both engines have built-in support for caching rule engine instances, and MS BRE makes extensive use of object pooling. The built-in WF caching facilities are only invoked within the context of a workflow, and not designed to be used when invoking WF Rules directly, although it is fairly simple to create your own rule engine cache.For WF Rules, I used the current beta (2.2) of the Microsoft External RuleSet Toolkit. This toolkit is a first attempt by Microsoft to plug one of the most noticeable gaps in WF Rules – the lack of any rule repository, deployment framework or tooling for WF Rules. The toolkit provides facilities for storing rule sets as XML documents within a SQL Server database, and contains a WF activity (PolicyFromService) that can be used in place of the Policy activity to invoke specific versions, or the latest version, of a rule set. The current beta of the toolkit does not yet support policy caching, but this may be included in a future release.For MS BRE, I used the PolicyTester class to invoke rules. This is as close as I can get to the same functionality offered by the External RuleSet Toolkit. The PolicyTester class (which in my opinion is poorly named, as it has many uses beyond testing) is similar to the more familiar Policy class, but extracts rule sets directly from a store using an appropriate rule set provider. By default, it uses Microsoft’s SQL Server repository provider. Like the External RuleSet Toolkit, this repository stores versioned rule sets as XML documents.I have purposely limited the scope of what I am testing in order to investigate only the more fundamental aspects of performance. Where appropriate, I have used only simple equality tests that compare integers. I have limited rule actions to the simple task of setting a Boolean value. I have also used the greater-than operator in order to fire multiple rules for a single value. I made the point earlier that the tests are not designed to emulate real-world scenarios. In the real world, the performance of a given engine, executing a given rule set, is influenced by a combination of many factors. Indeed, there are so many factors that true comparative benchmarking can perhaps only reasonably be undertaken by implementing a wide range of different rule-sets based on real-world examples. I have not attempted to do this. Instead, I have tried to reduce the ‘noise’ level to a bare minimum in order to draw a few broad conclusions that I hope will prove to be useful indicators to general performance characteristics.There are certain assumptions implicit in the approach I have taken. First, I am assuming that, when using the built-in support for common predicates (e.g., equality tests, relational operators, etc), there is not a great deal of difference in terms of the performance of the built-in code. I have not undertaken detailed testing for a wide range of built-in predicates. This is reasonable, given that an inspection of the code shows a very marked degree of similarity in terms of the implementation of predicates. Indeed, I suspect the WF Rules developers used MS BRE code as a starting point. As just one trivial example, compare the following two code extracts (via Reflector), both taken from classes named ‘StringLiteral’ derived from base classes named ‘Literal’:

WF Rules:

internal override bool GreaterThanOrEqual(string rhs)
{
    return (0 <= string.Compare(this.m_value, rhs, false, CultureInfo.CurrentCulture));
}

MS BRE:

internal override bool GreaterThanOrEqual(string rhs)
{
    return (0 <= string.Compare(this.m_value, rhs, false, CultureInfo.CurrentCulture));
}

I am also assuming that in the real world, most rule sets created for WF would generally be defined in a near identical fashion in MS BRE. MS BRE offers a superset of expressivity in comparison to WF. Anything expressed in WF Rules can be expressed in MS BRE, often in much the same way. It also reflects that fact that both engines are designed to apply rule processing directly on .NET objects (POCOs – ‘Plain Old CLR Objects’). MS BRE has additional support, via .NET typed fact wrapper classes, for XML and ADO.NET.

‘First Hit’ and Rule Engine CachingFor each test, I have measured performance in two ways. The first investigates ‘first hit’ performance. This is the performance of an engine when it loads and executes a rule set for the first time within a process. Understanding ‘first hit’ performance is important for scenarios where a rule set is executed just once during a session. It is less important in scenarios where the same rule set is executed continuously within a single process over the life time of a session.For ‘first hit’ performance, I measure three distinct activities. The first is ‘load’ time. This is the elapsed time taken to retrieve a rule set from a repository and de-serialise it in memory. The second activity is ‘initialisation’. This is the elapsed time taken to initialise a rule engine instance with an executable representation of the rule set. For MS BRE, it is chiefly the time taken to translate a rule set into a node network (Rete). For WF Rules, initialisation is mainly about analysing rule sets in order to support chaining behaviour. The third activity is ‘execution’. This is the time taken to execute the rule set for the first time.In most BizTalk Server scenarios, and many WF Rule scenarios, the same rule set will be executed many times within the same process. In this case, the cost of the first hit is not generally a significant concern. The performance of second and subsequent rule set executions is a far more important consideration. For both engines, ongoing performance can be maximised by adopting good caching strategies. WF Rules and MS BRE both provide RuleEngine classes. Rule engine instances are initialised with specific rule sets and execute rules using executors. If a RuleEngine instance is cached, it can be re-used repeatedly to execute a given rule set, avoiding the cost of re-intialisation.MS BRE provides built-in rule engine caching within its Policy class. This is not supported within PolicyTester, but by caching PolicyTester instances, you can effectively cache RuleEngine instances. Similarly, WF provides RuleEngine caching via its implementation of the Policy activity. The closest equivalent to PolicyTester is the PolicyFromService activity which is part of the Externalised RuleSet Toolkit. This does not support caching in the current beta version, and I did not use it because I wasn’t executing rules in the context of a WF workflow. Hence, for WF, I wrote my own cache code.A common mistake when using WF Rules outside the context of WF workflows is to cache RuleSets, rather than RuleEngine instances. The RuleSet class does not contain any functionality for caching RuleEngine instances, although it does have an overloaded, but internal, version of the Execute() method designed to work in conjunction with built-in caching facilities in the WF Policy activity. Because it is internal, this method cannot be invoked directly in custom code, except via reflection.Because RuleSet instances do not cache RuleEngine instances, caching RuleSets provides only small performance improvements. I did test this (though I haven’t recorded by results here), and found that there was some benefit. However, each time a cached RuleSet is executed, it creates a new RuleEngine instance which then performs pre-processing. If you are using WF Rules in custom code, do not rely on RuleSet caching to optimise performance. Instead, you need to create an instance of RuleEngine, and then assign your RuleSet to it. Cache your RuleEngine and then call Execute() on the RuleEngine rather than the RuleSet. Here is a very basic cache class I wrote for testing purposes. It uses the Externalised RuleSet Toolkit to obtain and initialise a RuleSet from an external repository. Note that it is missing functionality such as exception handling, automatic selection of the latest version a rule set, or clearing the cache:

internal class RuleEngineCache {
  // Cache of rule engine instances
  private static Dictionary<string, RuleEngine> m_ruleEngineCache =
      new Dictionary<string, RuleEngine>();

  // Return an instance of a rule engine for a given rule set instance and
  // object type
  public RuleEngine Allocate(string ruleSetName, int ruleSetMajorVersion,
                             int ruleSetMinorVersion, Type thisObjType) {
    // Create a unique identifier for the version of the rule set
    string ruleSetVerId = ruleSetName + "_" + ruleSetMajorVersion.ToString() +
                          "." + ruleSetMinorVersion.ToString() + "_[" +
                          thisObjType.FullName + "]";

    RuleEngine ruleEngine;

    if (!m_ruleEngineCache.TryGetValue(ruleSetVerId, out ruleEngine)) {
      // Get RuleSet from repository using the Externalised RuleSet Toolkit
      // (ERST)
      RuleSet ruleSet = new RuleSetService().GetRuleSet(new RuleSetInfo(
          ruleSetName, ruleSetMajorVersion, ruleSetMinorVersion));
      if (null == ruleSet) {
        return null;
      }

      // Create and cache a new RuleEngine over the rule set
      ruleEngine =
          new RuleEngine(ruleSet, new RuleValidation(thisObjType, null));
      m_ruleEngineCache.Add(ruleSetVerId, ruleEngine);
    }

    return ruleEngine;
  }
}

Here is code that executes the rule set using the cache:

// Get the cache
RuleEngineCache rEngCache = new RuleEngineCache();
 
// Execute the rule set over an object
RuleEngine rEng = rEngCache.Allocate("MyRuleSet", 1, 0, typeof(MyTestClass));
rEng.Execute(myTestObject);

As I stated previously, I used the PolicyTester class for MS BRE, and cached a PolicyTester instance in order to investigate the performance of the engine when using RuleEngine caching. PolicyTester is instantiated over a RuleSet, and uses this to initialise a RuleEngine instance which it retains across multiple executions. When you execute the PolicyTester, it calls Execute() on the RuleEngine object, asserting a list of ‘fact’ objects, executing the Rete network and then retracting the facts, leaving the initialised RuleEngine ready for the next execution.When conducting tests with rule engine caching, I was careful to execute the rule sets a few times before starting my measurements. This is important in order to eliminate any ‘first hit’ overhead. Tests with caching are designed to indicate the performance characteristics of fully ‘warmed-up’ rule engine instances.

Load and Initialisation times for ‘First Hit’ testsThe ‘first hit’ tests tell us something useful about rule engine performance when used in certain scenarios. For example, you might want to exploit WF Rules in a desktop application at start-up. The ‘first-hit’ performance becomes a component of application start-up performance, and could potentially have an impact on user experience.It makes little sense to measure ‘first hit’ performance only in terms of execution time. We also need to consider load and initialisation times. In most scenarios, all three activities will be done together, and the user will not be aware of the distinction. Load time, however, is problematic. First, we must decide where to load rule sets from. Both engines represent rule sets using XML, and provide mechanisms for de-serialising XML into runtime RuleSet object graphs. In a WF context, rule sets are normally compiled into a workflow assembly as managed resources and loaded via reflection. MS BRE does not provide equivalent functionality out of the box, but it is a trivial matter to create a RuleStoreProvider class to retrieve rule sets from managed resources. The MS BRE Policy class works with the local Rule Engine Update Service to retrieve a rule set from an external repository. The Rule Engine Update Service is a Windows service that supports a pub/sub mechanism. Repositories can publish notifications each time a new version of rule set is deployed, and local instance of the service can subscribe to these. The Rule Engine Update Service is generally used with the out-of-the-box SQL Server repository, and the subscription component supplied by Microsoft actually polls the database. When a Policy needs a rule set, it makes a request to the Rule Engine Update Service which, if necessary, uses cached notification information to download the required rule set version from a given repository. The rule set is cached by the service, so future requests from Policy instances do not result in a round trip to SQL Server. Furthermore, the Policy uses an internal cache class to cache initialised RuleEngine instances. This is done at process level using a static collection.Instead of Policy, I used the PolicyTester class which directly downloads rule sets from a repository without making requests to the Rule Engine Update Service. You wouldn’t usually use this class in a BizTalk Server production environment. In using PolicyTester, I am making the assumption (not validated) that load times are fairy similar to the load times incurred when using the Rule Engine Update Service to load a rule set from the repository for the first time. This is not unreasonable assumption since in both cases the loading is done using the same provider functionality.For WF, I have used beta 2.2 of the Externalised RuleSet Toolkit. This seems reasonable, given that I am interesting in exploring the relative performance merits of each engine from the perspective of a BizTalk developer. Using ERST would be the natural choice when exploiting WF Rules in conjunction with BizTalk Server.Here is the code used to load, initialise and execute the WF Rules rule sets using the ERST. Note that the variable ‘hpt’ contains an instance of a helper class called HiPerfTimer (see

http://www.codeproject.com/csharp/highperformancetimercshar.asp ) which I used to measure elapsed time.

// Load
hpt.Start();
RuleSetServicersService = new RuleSetService();
RuleSet ruleSet = rsService.GetRuleSet(new RuleSetInfo("RuleSetName ", ver, 0));
hpt.Stop();

sw.Write("Load time, " + hpt.Duration.ToString() + ", s");

// Initialise
hpt.Start();
RuleEngine ruleEngine =
    new RuleEngine(ruleSet, new RuleValidation(typeof(TestValue), null));
hpt.Stop();

sw.Write("Init time, " + hpt.Duration.ToString() + ", s");

// Execute
hpt.Start();
ruleEngine.Execute(testObj);
hpt.Stop();

sw.WriteLine("Run time, " + hpt.Duration.ToString() + ", s");

Here is the equivalent code used for MS BRE.

// Load
hpt.Start();
SqlRuleStore sqlRuleStore = new SqlRuleStore(connectionString);
RuleSet ruleSet = sqlRuleStore.GetRuleSet(new RuleSetInfo("RuleSetName", ver, 0));
hpt.Stop();
 
sw.Write("Load time, " + hpt.Duration.ToString() + ", s");
 
// Initialise
hpt.Start();
PolicyTester pt = new PolicyTester(ruleSet);
hpt.Stop();
 
sw.Write("Init time, " + hpt.Duration.ToString() + ", s");
 
// Execute
hpt.Start();
pt.Execute(facts);
hpt.Stop();

sw.Write("Run time, " + hpt.Duration.ToString() + ", s");

For WF Rules, you execute rule sets over a single object. For MS BRE, you assert ‘fact’ objects to the engine before execution. You can assert many facts to the engine in order to represent sets of data. Most of the tests compare near identical rule sets on both engines, and hence I generally asserted a single fact to MS BRE. As mentioned before, MS BRE performs rule processing directly over POCOs and also over ‘typed’ wrapper classes representing XML, ADO.NET tables and rows or live database connections. WF Rules does not provide support for these additional data representation types, and hence I have only used POCOs in MS BRE. It is important to note that MS BRE performance is likely to be different when using ‘typed’ wrappers.I performed all tests on an HP nw9440 2.33 GHz dual core notebook computer with 4GB RAM. Load time is largely I/O-bound, rather than CPU bound, and the use of a notebook computer is a poor choice for measuring I/O-bound performance. My results for load time should therefore be treated with considerable care when considering likely performance in real-world scenarios.

WF Rules Chaining Behaviour

WF Rules supports forward chaining. When a rule is evaluated, and its actions are executed, those actions may change data used during the evaluation of previous rules. In this case, the engine can detect which previously evaluated rules are affected and loop back to re-evaluate those rules, firing their actions as required a second time. This mechanism allows the design of more sophisticated rule patters involving dependencies between different rules.

When a WF RuleEngine object is instantiated, it performs pre-processing of the rule set in its constructor. The first part of pre-processing involves creation of a sorted list of RuleState objects. Each RuleState is created over an individual rule. The pre-processor then enters an analysis phase. However, analysis is only undertaken if chaining behaviour is set to ‘forward chaining’ or ‘update only’. ‘Update only’ is a special version of forward chaining is which chaining will only occur if the Update action is used explicitly. The engine generally performs more analysis work for ‘forward chaining’ mode than for ‘update only’ mode. This analysis is important in order to initialise the engine so that at run-time it can detect which previously evaluated rules are affected by later changes to the data. WF Rules refers to this as ‘side effects’, but uses this term quite differently to MS BRE. In MS BRE, side effects are controlled at the level of data facts (via caching of data values), rather than rules, and do not automatically lead to chaining.

The additional pre-processing work associated with forward chaining behaviour is an important consideration when measuring ‘first hit’ performance. Hence, tests were conducted using different chaining behaviours. I also tested performance of cached rule engines using different chaining settings to find out what effect, if any, this had on repeated execution.

MS BRE Side Effects

MS BRE regards the development of rule sets as a form of functional (and therefore declarative) programming. Theoretically, a ‘pure’ function should execute without causing ‘side effects’ within other functions. However, there is an impedance mismatch between this ideal and the imperative nature of object-orientated .NET code. This is a problem when executing rules directly over POCOs. MS BRE supports a ‘side effects’ flag that indicates if side effects should be allowed when using .NET methods or properties as predicates or arguments in rule conditions.

The ‘side effects’ flag is used to control internal value caching within the engine. It is set on individual predicates and arguments within a rule, and therefore offers fine-grained control over caching. When set to true, the engine switches off the cache, and may repeatedly invoke the .NET member. When ‘false’, the .NET member will be invoked once per rule set invocation, and its value will be cached. By default, MS BRE switches caching off when using POCOs and on when using typed wrappers.

The use of the side effects flag can have a significant effect on performance. The engine will generally perform better when caching is used. Unfortunately, although the side effects flag is documented, it is not supported within the Microsoft Rule Composer UI, and can only be set by editing the rule set XML (Business Rule Language) directly. When using the SQL repository, this involves exporting the rule set to file, editing the XML and then importing the amended version.

I tested MS BRE for both settings of the side effects flag in order to compare the performance differences.

Test 1Test 1 is as simplistic as you could imagine in terms of its design. Each rule in the rule set has the following form:

IF
    TestObject.Value == <literal integer value>
THEN
    TestObject.Valid = true

All rules have equal priority. I wrote code to generate rule sets, and incremented the value of the literal integer by one for each rule, starting at 1. I generated rule sets of varying sizes in the range of 1 – 10,000 rules. Even though I am chiefly interested in investigating the performance of the two engines when applied to small rule sets, it was useful to test larger rule sets in order to get a better picture of scalability characteristics.

As I mentioned above, WF Rules has no concept of individual ‘facts’. You execute a rule set over a single object, referred to as the ‘this’ object. In the context of a WF workflow, this is generally the WF workflow itself, but in Test 1, I created a single instance of a custom class. I took a similar approach in MS BRE, asserting a single fact to the engine. The Value property was set to 1. By not using Halt in WF, I ensured that WF evaluated the object against each rule in turn. I will discuss MS BRE’s behaviour in this respect after we have looked at the results. The point here is that the test object matches just one single rule. The resulting action is kept very simple. The test is designed to investigate scaling characteristics of the two engines in relation only to the size of the rule set.

Test 1: ‘First Hit’ resultsThe numeric results are published in the appendix at the end of this document. Here are the results in graphic form. The first graph shows the overall results. The second graph shows the results for smaller rule sets only. Each graph shows the combined time for execution, initialisation and load.

We can see from these results that WF Rules scales quite differently to MS BRE, and that the use of forward chaining imposes a considerable additional cost for large rule sets, although it is of little consideration for smaller rule sets. In MS BRE, the side effects flag is not a significant consideration for ‘first hit’ performance.The second graph shows that WF Rules out-performed MS BRE on the first hit for small rule sets. This is a significant finding, given that I am most interested in exploring the behaviour of these two engines when applied to small rule sets. A more detailed investigation of the results shows that WF Rules outperforms MS BRE for all three measures (load, initialise and execute). The most critical factor here (not surprisingly) is load time. However, as the rule set size increases, the situation is reversed on all three measures.

Test 1: Cached Rule Engine resultsHere are the results obtained when using rule engine caching. Again, the first graph shows overall results and the second graph focuses on smaller rule sets. In this case, we measure execution time only, and only begin measurement after ‘warming’ the engine up by executing the rule sets a few times.

Not surprisingly, the results show a huge performance increase across the board. For WF Rules, we can see that the chaining behaviour makes no significant difference, even for large rule sets. The additional cost of using forward chaining appears to be confined to ‘first hit’ scenarios.When using rule engine caching, MS BRE outperformed WF Rules for small, as well as large rule sets. It also displayed significantly better scaling characteristics. Both engines display near-linear scaling characteristics across the rule set size range. For MS BRE, there was a very significant improvement in performance when side effects were avoided (i.e., caching was switched on).

MS BRE Optimisation Issues

The higher performance of MS BRE when using rule engine caching is a natural consequence of the use of the Rete algorithm. In this test, the single asserted ‘fact’ passes through a discrimination network of in-memory nodes. The object passes from a root node to a ‘class node’ which tests its type (rather redundantly in this case). The class node passes the object on to a set of descendant ‘select’ nodes. In this test, the number of ‘select’ nodes is equal to the number of rules. However, this is co-incidental due to each rule having a single unique conditional test. Each select node does not represent a rule, per se.

The important point here is that each select node has only the single concern. It evaluates the fact according to a given condition. If the fact matches the condition, the node passes the fact on to a subsequent node. In this case, the subsequent node is a ‘terminal’ node. Terminal nodes represent individual rules. Our fact will only match a single condition. Hence, it is passed to one terminal node only. The engine does not activate any of the remaining terminal nodes.

The Rete algorithm helps to reduce the work undertaken by the engine during rule matching to a reasonable minimum. Even so, there is some overhead. In Test 1, for 10,000 rules, there are 10,000 evaluations and one rule ‘firing’. On my machine, I can execute simple custom code that will evaluate an integer 10,000 times on a single thread in about 0.01 ms. It takes about 10 ms to evaluate a public integer property 10,000 times using reflection. MS BRE introduces additional overhead as it iterates over node sets, but this is fairly minimal. The overhead for WF Rules is also small, but significantly higher than for MS BRE.

Any readers who are Rete experts may have realised that the current version of MS BRE does not support an optimisation technique used in some modern engines. It is possible to increase the performance of this part of the node network further by implementing hash-based propagation from ‘class’ nodes to ‘select’ nodes that evaluate literal values. For any given fact type, select nodes can register their associated attribute/literal value pairs with the relevant class node. This allows the class node to inspect the relevant attributes and propagate facts only to those nodes designed to match a given literal value. This would have reduced the match effort, regardless of the rule set size, to a single evaluation in the class node together with a hash table look-up which would return a single select node. The resulting execution graph line would exhibit O(1) time complexity . In English, this means that engines with this optimisation will always take a constant amount of time to complete Test 1, regardless of the size of the rule set. Of course, for really large rule sets, other factors would eventually set in, such as high memory usage. One way to think about this optimisation is that it is rather like performing an indexed lookup in a relational database table, although it is rule conditions that are ‘indexed’, rather than the data asserted to the engine.

To demonstrate this optimisation, here are the results for Test1 with the equivalent of RuleEngine caching for a well-know Java open source Rete rules engine (JBoss Rules 4.0.0) which implements this technique.

Please note that this does not constitute a comparative test between MS BRE and JBoss Rules. My purpose here is simply to demonstrate the optimisation I have described.In many real-world scenarios, I would expect the decay in performance in MS BRE for larger rule sets to be more pronounced due to the overhead of different combinations of data types and predicates. Test 1 shows the basic performance characteristics of MS BRE in the best possible light.

Test 2

Having explored the performance of the two engines when applied to the simplest of rule sets, we will now look at how this performance is affected by various changes. In Test 2, the original test is modified to add additional conditions to each rule. Here is the rule pattern:

IF
    AND
        TestObject.Value1 == <literal integer value>
        TestObject.Value2 == <literal integer value>
        TestObject.Value3 == <literal integer value>
THEN
    TestObject.Valid = true

As for test 1, a generator was used to generate rule sets between 1 and 10,000 rules in size. The same literal value is used in each of the three conditions in any one rule, and this value is set to 1 for the first rule and then incremented by one for each subsequent rule. The properties of the test object were each initialised to 1 so that the test object matched just one rule. The test was designed to be identical in all ways to Test 1 except that each rule has three conditions, rather than 1.

Test 2: ‘First Hit’ results

Not surprisingly, the results display similar characteristics to test 1, but overall performance is slower. One reason is that because each rule now contains three conditions, the rule set is correspondingly larger, and takes longer to download and process. The effect is worse for WF Rules than for MS BRE, and consequently, MS BRE starts out-performing WF Rules at slightly smaller rule set sizes.

Test 2: Cached Rule Engine results

These results are virtually identical to those obtained for Test1. The reason is that each rule represents the conjunction (‘ANDing’) of three conditions. If the first condition evaluates to false (which it does for all but one rule), both engines effectively short-circuit the evaluation of the other two conditions. Hence, the amount of evaluation work done by each engine is virtually identical to that for Test 1.

For Rete engines, the short-circuiting of conjoined conditions is a natural consequence of the use of a node-based discrimination network. If a fact or token is evaluated by a node, and the result is ‘false’, the node does not pass the fact or token on to the next node for evaluation. In the case of Test 2, MS BRE further optimises this by assigning the multiple conditions in a rule to a single select node. Each select node ‘short-circuits’ internally.

This short-circuiting in MS BRE highlights one of the fundamental problems faced by developers when using set-based ‘production’ systems (the term ‘production’ refers here to the kind of rule processed by MS BRE). In a previous article (http://geekswithblogs.net/cyoung/articles/79500.aspx), I wrote about the lack of short-circuiting when using disjunctions (‘ORing’). Most developers are more used to mainstream imperative, procedural programming languages, rather than declarative, set-based pattern matching systems. They often expect short-circuiting to be used for both conjunctions and disjunctions. This is the case in WF Rules. However, MS BRE does not obey this familiar pattern. Sometimes it short-circuits. Sometimes it doesn’t. This behaviour may appear to be inconsistent. However, it is really a logical consequence of applying rule processing to data sets. The issue is obscure, and the behaviour of the engine is opaque, making it harder for rule developers to identify, analyse and fix problems.

Test 3Test 3 is similar to the previous two tests. It explores performance when all rules match. Like Test 2, each rule contains three conjoined conditions. Here is the rule pattern:

IF
    AND
        TestObject.Value1 > <literal integer value>
        TestObject.Value2 > <literal integer value>
        TestObject.Value3 > <literal integer value>
THEN
    TestObject.Valid = true

Rule sets were generated between 1 and 10,000 rules in size. The same literal value is used in each of the three conditions in any one rule, and this value is set to 1 for the first rule and then incremented by one for each subsequent rule. The properties of the test object were each initialised to 10001 so that the test object matched every rule.

Test 3: ‘First Hit’ results

These results are virtually identical to those obtained for Test 2. This is hardly surprising given that in each test, each rule has three conditions and a single action. The size and complexity of the rule sets are essentially the same.

Test 3: Cached Rule Engine results

The scaling characteristics remain much the same as for the previous two tests. However, for both engines, the decrease in performance is in line with the fact that a) all three conditions are now being evaluated, and b) each rule is firing once.

Test 4

Test 4 introduces additional overhead into the evaluation of each rule be means of a custom predicate (a method that returns a Boolean value). The method takes 10 ms to complete its work. This is implemented by calling Thread.Sleep(). The predicate is used in the first condition in every rule, and this condition is identical in each rule. A second condition tests an integer value, as for Test 1. The pattern for each rule is:

IF
    AND
        TestObject.BusyPredicate()
        TestObject.Value = <literal integer value>
THEN
    TestObject.Valid = true

As for Test 1, the literal integer value is incremented by one for each rule, and the value on the test object is set to 1 so that it matches just one rule.

This test was designed to demonstrate an expected optimisation in MS BRE. Because the first condition in each rule is identical, I expected MS BRE to create a single node to perform the call to BusyPredicate just once. BusyPredicate returns ‘true’ after 10 ms, and I expected to see the total cost to be little more than 10 ms, even for the largest rule set sizes. I did not expect to see this optimisation in WF Rules.

Because this test is all about execution time, I have shown the results for RuleSet caching only.

Test 4: Results

As you can see, the results for MS BRE with side effects were not as I expected. In this case, MS BRE’s performance was almost identical to WF Rules. When caching was switched on, the results were in line with expectations across much of the range, but began to increase at large rule sets sizes.

Using a custom debug tracker that reflects on the inner state of the MS BRE engine, I established that MS BRE does not produce a single node for the common condition in each rule. Instead, it creates a Rete similar to previous tests containing the same number of select nodes as there are rules, and assigning two conditional tests to each node. The BusyPredicate is invoked as the first condition in each node. Hence, when caching is not being used, it is invoked once for each rule, rather than just one time. When caching is used, it is invoked once by the first evaluated select node, and the results are cached. For each subsequent evaluation, the cached value is returned. Hence, the results exhibit additional overhead for retrieving the same value repeatedly from cache.

This was a surprising result. Optimisation via node sharing is one of the most basic and best understood optimisations within a Rete engine, and the use of an identical condition in every rule would generally cause this optimisation to be exploited. I did some further ad hoc testing using built-in predicates and literal values, and discovered that MS BRE seems only to use node sharing for its select nodes (i.e., its ‘alpha’ network) when all conditional tests for any one given object type are identical in two or more rules. In this test, the second condition in each rule tests the Value property against different integer values, and therefore the engine does not use node sharing for the first condition.

To demonstrate node sharing, I amended the test to make the second condition identical across all rules. In this highly contrived test, every rule is is identical in terms of its conditions (and actions). The resulting Rete contains just one Select node to perform the two conditional tests, regardless of the rule set size. Here are the results:

When using value caching in the original version of Test 4, the overall performance is close to what might be expected if node sharing was supported. The performance benefits associated with the side effects flag are clearly significant, which makes it all the more surprising that this flag is not exposed through the Rule Composer UI.

Test 5

One of the most fundamental differences between the two engines is that MS BRE is designed to process rules over multiple ‘facts’, whereas WF Rules operates on a single so-called ‘this’ object. It is often the case that we may want to evaluate rules against a set of data. For example, consider a rule that derives the discount that should be offered against purchase orders. At runtime, we may have a set of purchase orders, and may need to run the rules against each purchase order in turn.

This is supported directly in MS BRE. We simply assert each purchase order as a ‘fact’ to the engine. We can pass the purchase orders as an array of .NET objects, a single XML document containing, say, multiple <PurchaseOrder> elements or an ADO.NET table containing many purchase order rows. We can even execute our rule set against an external database table containing purchase order rows. MS BRE will evaluate the rule set against each ‘fact’ automatically.

In WF, our options are more constrained. We can broadly take one of two approaches. The first is to execute the rule set multiple times using each purchase order in turn as the ‘this’ object. In this case, we need a single rule as follows:

Rule1
IF
    this.Amount > 1000
THEN
    this.DiscountPerc = 10
ELSE
    this.DiscountPerc = 5

In this approach, we depend on code outside of the rule to iterate through the set of purchase orders and invoke the rule engine repeatedly.

The second approach is to exploit the forward chaining facilities of WF Rules to iterate over a collection of Purchase Order objects maintained by a single ‘this’ object. In this case, we need only execute the rule set a single time. We have effectively moved iteration code from external code into the rule set. We could introduce two rules as follows:

Rule1: priority 1
IF
    !this.PurchaseOrderEnum.MoveNext()
THEN
    this.ResetEnum()
    Halt
Rule2: priority 0
IF
    this.PurchaseOrderEnum.Current.Amount > 1000
THEN
    this.PurchaseOrderEnum.Current.DiscountPerc = 10
    Update(this/)
ELSE
    this.PurchaseOrderEnum.Current.DiscountPerc = 5
    Update(this/)

The PurchaseOrderEnum property returns an IEnumerator for the Purchase Order collection. The ‘this’ object caches the enumerator internally so that it can be used repeatedly. The ResetEnum method resets this enumerator so that we can repeat the test. The second rule sets a discount of 10% on PO amounts greater than 10,000, and 5% otherwise.

Test 5 tested both the bove rule sets for WF Rules. Tests were run over varying numbers of purchase order objects in the range of 1 to 10,000 purchase orders.

MS BRE rules are set-based but do not support ‘else’. In MS BRE, we can create the same logic with two rules:

Rule1
IF
    PurchaseOrder.Amount > 1000
THEN
    PurchaseOrder.DiscountPerc = 10
Rule2
IF
    PurchaseOrder.Amount <= 1000
THEN
    PurchaseOrder.DiscountPerc = 5

Because MS BRE is set-based, we can execute the engine just one time. There is no need to use forward chaining to iterate over a collection, no need to update or re-assert purchase orders and no need to specify rule priority to guarantee the order in which rules fire.

This rule set is more declarative and simpler than the forward-chaining WF rule set above, even though it does not use external iteration code. This is because there is no need to use forward chaining to iterate over a data set. Also, we do not need to implement any equivalent of the ‘this’ object. Instead, as for the first of the WF rule sets, the rules operate directly over PurchaseOrder instances.

The results below compare the performance of the two rule sets above. I have tested execution using RuleEngine caching.

Test 5: Results

The results show that MS BRE out-performed WF Rules across the range of fact set sizes. Both engines exhibit near-linear scaling as the number of facts is increased. Using value caching in MS BRE provided only a small performance increase. Value caching is of most benefit when re-evaluating the same members of the same objects many times during rule execution. In this case, the two rules both evaluate the Amount property of each purchase order instance. Value caching effectively cuts in half the number of ‘Amount’ evaluations done in total.

For WF Rules, the first rule set, executed in sequential mode over the single rule, provided higher performance than the second rule set executed with forward chaining over two rules. This is not surprising. For the second rule set we have effectively moved the iteration code into the rule set. The total functionality is much the same as for the first rule set, but with the additional overhead of executing rules to use the enumerator.

Test 6

Set-based processing is a pre-requisite for performing joins. This is a common requirement is rule processing. For example, consider an extended version of Test 5 where we assign discount rates based on purchase order amounts and also on some rating attribute of the customers who submit purchase orders. In English, we might express a business policy as:

“If the purchase order total amount is greater than 10000, and the customer is a ‘gold’ customer, offer a discount of 15%”

“If the purchase order total amount is greater than 10000, and the customer is a ‘silver’ customer, offer a discount of 10%”

“If the purchase order total amount is greater than 10000, and the customer is a ‘bronze’ customer, offer a discount of 5%”

“If the purchase order total amount is less than or equal to than 10000, and the customer is a ‘gold’ customer, offer a discount of 8%”

“If the purchase order total amount is less than or equal to than 10000, and the customer is a ‘silver’ customer, offer a discount of 5%”

“If the purchase order total amount is less than or equal to than 10000, and the customer is a ‘bronze’ customer, offer a discount of 0%”

In a batch-processing scenario, we might have batches containing purchase orders for many different customers. Each customer might have many purchase orders. We would need to perform a join on customer and purchase order entities as part of processing the business policy. There are many ways we might do this. One way is to use the built-in set processing capabilities of MS BRE.

In all the previous tests, we have exercised only the so-called ‘alpha’ network within MS BRE Retia. The alpha network is the left-most part of a Rete, and is concerned with evaluating specific facts. If joins are required, it passes data tokens on to ‘beta’ network of nodes. In the beta network, each node takes two separate inputs. Each input comes from a ‘memory’ which holds a collection of data tokens referencing one or more facts. The most common activity within a beta node is to perform a join over the two inputs. MS BRE only supports join nodes in its beta network. It does not support node types for negation, etc.

WF Rules offers no built-in support for processing sets, and therefore does not provide any equivalent functionality for performing joins. As for test 5, we are left with two broad choices. One is to perform joins outside of the engine, and execute the engine over each resulting data tuple. There are many ways we might achieve this. If the data is stored in an external database, we could perform the join through SQL and then invoke our rule set over each data row in the results. Alternatively, we could write custom code to perform joins on multiple collections of objects. We could use technologies such as ADO.NET, or the LINQ features built into the next generation of Microsoft’s .NET compilers. The second approach would be to push the ‘join’ functionality into the WF rule set, using forward chaining to loop repeatedly over multiple collections.

We saw in Test 5 that for WF Rules, it is generally more optimal to iterate over collections outside of the rule engine. This also has the benefit of allowing our run-time rules to remain more strongly and cleanly aligned to our business rule statements. For this test, I have only implemented external approaches to performing joins in WF Rules, and have not used forward chaining techniques.

I have created two versions of the WF rule set. Here is the first version:

Rule 1:
IF
    this.Customer.Id == this.PurchaseOrder.CustomerId &&
        this.Customer.Rating == “Gold” && this.PurchaseOrder.Total > 1000
THEN
   this.PurchaseOrder.DiscountPerc = 15
   Halt
ELSE
   this.PurchaseOrder.DiscountPerc = 8
   Halt
Rule 2:
IF
    this.Customer.Id == this.PurchaseOrder.CustomerId &&
        this.Customer.Rating == “Silver” && this.PurchaseOrder.Total > 1000
THEN
   this.PurchaseOrder.DiscountPerc = 10
   Halt
ELSE
   this.PurchaseOrder.DiscountPerc = 5
   Halt
Rule 3:
IF
    this.Customer.Id == this.PurchaseOrder.CustomerId &&
        this.Customer.Rating == “Bronze” && this.PurchaseOrder.Total > 1000
THEN
   this.PurchaseOrder.DiscountPerc = 5
   Halt
ELSE
   this.PurchaseOrder.DiscountPerc = 0
   Halt

In order to execute this rule set, I enclose the execution code in two nested foreach loops:

foreach (Customer customer in cuFacts)
{
    foreach (PurchaseOrder purchaseOrder in poFacts)
    {
        custPoThis.Customer = customer;
        custPoThis.PurchaseOrder = purchaseOrder;
 
        rEng.Execute(custPoThis);
    }
}

This is deliberately naïve code. It creates a Cartesian product of the two sets of facts, and invokes the engine for each resulting pair.

The second version of the rule set exploits an obvious optimisation. We can move the join condition (this.Customer.Id == this.PurchaseOrder.CustomerId) out of each rule, and perform it as part of the external code.

foreach (Customer customer in cuFacts)
{
    foreach (PurchaseOrder purchaseOrder in poFacts)
    {
        if (customer.Id == purchaseOrder.CustomerId)
        {
            custPoThis.Customer = customer;
            custPoThis.PurchaseOrder = purchaseOrder;
 
            rEng.Execute(custPoThis);
        }
    }
}

This is likely to significantly reduce the number of times we invoke the engine, and performing the evaluation in ‘raw’ code rather than using reflective techniques within the engine is bound to improve performance. In real life, we could probably afford to adopt this approach because the ‘Id’ value is a key attribute of customer and a foreign key of PurchaseOrder. It represents a fundamental relationship between the two entities that is highly unlikely to change over time, and therefore we can afford to hard-wire the join condition into custom code.

Note that I have optimised the rule set further by adding ‘Halt’ to each action list. This immediately halts the engine, and prevents it from evaluating any further rules.

For MS BRE, we can assert entire sets of customers and purchase orders to the engine and let it perform the join. Here is the equivalent rule set for the first WF rule set version above:

Rule 1:
IF
    AND
        Customer.Id == PurchaseOrder.CustomerId
        Customer.Rating == “Gold”
        PurchaseOrder.Total > 1000
THEN
   PurchaseOrder.DiscountPerc = 15
Rule 2:
IF
    AND
        Customer.Id == PurchaseOrder.CustomerId
        Customer.Rating == “Gold”
        PurchaseOrder.Total <= 1000
THEN
   PurchaseOrder.DiscountPerc = 8
Rule 3:
IF
    AND
        Customer.Id == PurchaseOrder.CustomerId
        Customer.Rating == “Silver”
        PurchaseOrder.Total > 1000
THEN
   PurchaseOrder.DiscountPerc = 10
Rule 4:
IF
    AND
        Customer.Id == PurchaseOrder.CustomerId
        Customer.Rating == “Silver”
        PurchaseOrder.Total <= 1000
THEN
   PurchaseOrder.DiscountPerc = 5
Rule 5:
IF
    AND
        Customer.Id == PurchaseOrder.CustomerId
        Customer.Rating == “Bronze”
        PurchaseOrder.Total > 1000
THEN
   PurchaseOrder.DiscountPerc = 5
Rule 6:
IF
    AND
        Customer.Id == PurchaseOrder.CustomerId
        Customer.Rating == “Bronze”
        PurchaseOrder.Total <= 1000
THEN
   PurchaseOrder.DiscountPerc = 0

Each ‘Customer.Id == PurchaseOrder.CustomerId’ condition in each rule defines the join. These conditions will be evaluated in join nodes in the beta network. No node sharing is possible (or beneficial) here because each join node must feed tokens to a different terminal node. The other conditions will be evaluated in the alpha network. The engine will filter each customer and each purchase order through the alpha network, and then only perform joins appropriately on those facts which make it to the beta network. For example, if a customer does not have a rating, the Customer fact will not be passed to the beta network, and will therefore not be joined on PurchaseOrder facts.

For MS BRE, we don’t use ‘Halt’. Halt has different effects in the two engines. In MS BRE, it does not halt condition evaluation. This is because the engine performs all evaluations against all facts in the Rete and then creates a prioritised queue of matching rule instances (called ‘activations’) on its ‘agenda’. It then ‘fires’ each activation in turn. If Halt is called, this prevents any further activations from being fired. However, it is too late to prevent condition evaluation.

It is important to remember that we are not comparing like-for-like in this test. To the extent that we test the performance of the MS BRE engine when performing joins, we test the performance of custom code when executing WF rules sets. There are ways in which the performance of this code could be improved. For example, we could build some kind of indexing, possibly based on dictionaries or hash tables. Also, as I suggested above, we could exploit the query processor in a relational database in order to perform the join, in which case we would get indexing for ‘free’.

The test was conducted by generating sets of customer and purchase order objects. For each test run, I created exactly eight customer objects, each with a unique ‘Id’. Two customers were created for each ‘Rating’, and two more had no rating. For each customer I created a given number of matching purchase orders. I tested for the range of 1 to 10,000 purchase orders per customer. Half the purchase order had a ‘Total’ value over 1000.

Test 6: Results

The performance of the first WF rule set (‘unfiltered’) is very poor. Creating a Cartesian product and invoking the engine over each resulting pair is an inefficient strategy. Filtering the Cartesian product externally in order to reduce the number of executions had a huge effect on performance, and gave the best results.

The results for MS BRE were not as good as for the ‘filtered’ version of the WF rule set, but provided reasonable performance, especially when value caching was used.

Looking at the resulting graph, the impression is that WF Rules is capable of out-performing MS BRE when performing joins. However, this is a false conclusion. WF Rules does not have any built-in functionality to perform joins, and the results primarily reflect the performance of external code. There is nothing stopping us employing the same approach with MS BRE. Just because MS BRE can perform joins internally does not mean that we have to use this feature. I amended the MS BRE test to make it equivalent to the ‘filtered’ WF Rules test, and re-ran it to see what happened. Here are the results:

The ‘filtered’ version of the MS BRE test significantly outperformed the ‘filtered’ WF Rules test across the full range of purchase orders per customer. I did, however, have to optimise my custom code carefully. Specifically, instead of asserting separate customer and purchase order pairs to the engine, I wrote the code to assert a single ‘tuple’ object which referenced a customer/purchase order pair and exposed the required attributes. This is similar to the WF Rules approach in which the rules are executed over a single ‘this’ object. Using this approach, I was able to eliminate beta nodes from the Rete, and optimise the rule set to the greatest degree. The results were obtained using value caching, but in this case, caching made very little overall difference to the results.

The test does not clearly illustrate an important optimisation. MS BRE, in common with most modern Rete production systems, optimises joins by using indexing to retrieve data tokens from memories. MS BRE specifically indexes alpha memories. Because the engine allows indexed alpha memories to directly perform left activations on join nodes at the top of the beta network (rather than use special ‘adapter’ nodes), it effectively supports the equivalent of beta memory indexing as well, but only where right activations are performed at the interface between alpha and beta networks. Lower down the network, right activations perform sequential scans on beta memories, although left activations still get the benefit of indexing. Indexes are composed of sorted sets of hash tables which store equivalence classes. An equivalence class is a subset of data tokens where the fact referenced by each token has a given attribute value. When a join node joins a token over an indexed memory, the enumerator is ‘seeded’ with the appropriate attribute value, and uses this to enumerate only those tokens which match that value according to the evaluation test. This approach is only used for the basic relational predicates (==, !=, <, >, <= and >=).

Memory indexing generally makes a significant difference when performing joins in Rete systems. In the case of our very simple test, it has a somewhat similar effect to performing pre-filtering outside the engine, so it is not, perhaps, surprising that the results for MS BRE are fairly close to the ‘filtered’ WF test. MS BRE does a reasonable job of indexing memories, although there would be merit in extending indexing to beta memories in a future version. Another consideration is that MS BRE uses non-generic .NET collection classes. In some other engines, it has proved beneficial to create specialised high-performance hash table classes, rather than use existing general purpose classes. There may also be some benefit in using generics. .NET generic types use type replacement, rather than type erasure, and hence, unlike Java, are able to eliminate run-time casts.

Forward Chaining

For WF Rules, we have used forward chaining as one approach to handling rule processing over sets of data. We did not need to use forward chaining for the same purpose in MS BRE. Another very typical use of forward chaining is to implement sequential flow patterns within rules. In this case, the two engines belong to opposite camps. For WF Rules, we start with a built-in foundation of purely linear sequential rule evaluation and execution. The problem is how to implement more sophisticated patterns such as loops and branches on top of the basic mechanism. For MS BRE, the issue is rather different. MS BRE is a pattern matching engine that has no equivalent concept of linear, sequential rule evaluation. We have to use forward chaining to implement the most basic of linear patterns before we are in a position to move onto loops, etc. This is both good and bad. While both engines support forms of declarative programming, MS BRE supports a more purely declarative approach. This can help to maintain a cleaner alignment between executable rule sets and the underlying business rules they represent. However, many problems can only be solved by implementing a linear sequence of rule processing steps. There may be many rules associated with any one step. MS BRE is ultimately more powerful and efficient, but the rule developer has to do more work to implement some control mechanism to support linear patterns. In turn, this can make rule sets harder to understand, debug and troubleshoot.

In one respect, forward chaining in MS BRE is significantly more powerful than the equivalent feature in WF Rules. While forward chaining can be used to iterate over data sets in WF Rules, we can apply forward chaining directly over data sets in MS BRE. A close analogy would be a database application which runs a number of queries over a group of tables, then processes the result sets. If, while processing the result sets, the application has reason to change any of the data in the database, the result sets may no longer be valid. We could immediately re-evaluate our queries and re-create our result sets before continuing. This is effectively what forward chaining does in MS BRE.

Constantly re-running a group of SELECT statements and re-creating result sets could quickly become highly inefficient. One of the best known optimisations of RETE, which we have not had reason to use in any of the tests here, is to reduce the amount of re-evaluation work during forward chaining to a bare minimum. Unlike relational query processors, a Rete system has the ability to store and remember partial query results (they are stored in the beta memories). This allows the engine to avoid complete re-evaluation of rules against all facts.

Forward chaining in a Rete system allows certain types of problem to be tackled through highly declarative rules in a way that could not be easily imagined in the WF Rules world. This is partly because of the set-based nature of Rete, and partly because of its natural efficiency when handling certain types of problem. Problems that involve ‘heavy-duty’ inference of new data based on the combination of multiple data sets are very difficult to handle in a sequential fashion, but much easier to solve elegantly and efficiently with an engine like MS BRE.

Upload large rule sets

I should report an issue I had uploading rules to repositories. I experienced considerable difficulties in importing larger rule sets into the WF Rules repository. Each time I attempted to use the RuleSetTool to save a large rule set, I got an OutOfMemoryException. I eventually abandoned using the current beta of the Externalised RuleSet Toolkit for this purpose, and wrote my own code to upload rule sets.

Whatever the reason for this problem, I suspect the root cause is the size of the XML strings that are being imported. WF Rules represents rule sets use a highly verbose and bulky XML representation. For tests 2 and 3, the import file for 10,000 rules was 40 MB, compared to 12 MB for the equivalent MS BRE files. That is over 4K per rule.

Conclusions

A fairly clear picture emerged during testing. When using rule engine caching, MS BRE provides consistently better performance than WF Rules. It also scales more effectively when executing large rule sets.

WF Rules offers a performance advantage over MS BRE in respect to ‘first hit’ execution for small rule sets. It is rare for this to be a major consideration in BizTalk development, but may indicate that WF Rules is the better choice, in terms of performance, for certain applications.

It is impossible to embark on this kind of testing without some prejudice. Although I have invested considerable effort in the last couple of years to understanding Rete engines, I’d like to record here that my expectation was that I would conclude by recommending the use of WF Rules in BizTalk development for a large range of real-world scenarios. I suspected that WF Rules would handle small, simple rule sets more efficiently than MS BRE. I was quite wrong, and so my recommendation, purely from the perspective of performance, is that BizTalk developers should continue to use MS BRE.

There are many other valid perspectives! Performance is just one of several considerations. The central value proposition of a rules engine is its ability to externalise executable rule-based policies that are strongly and cleanly aligned to the underlying business rules. In this respect, rule expressivity and tooling are vital considerations, and one reason why the Externalised RuleSet Toolkit is such a welcome addition for WF Rules. Currently, I suggest that MS BRE has the better tooling. Also, because it is set-based, it is more expressive. However, both engines frankly leave a lot to be desired in these areas.

As far as performance is concerned, it is important to maintain some perspective with regard to the results. Many rules engine enthusiasts (I suppose I must consider myself one) arguably have a tendency to over-emphasise performance above almost all other considerations, and I freely admit I am guilty of this simply by writing this article. Astute BizTalk developers may well ask themselves whether they really care about sub-microsecond differences in performance for small, simple rule sets. Of course, in certain very high-throughput scenarios, every sub-millisecond counts. However, in the majority of BizTalk scenarios, the performance overhead of the rules engine will not be a cause for concern. One of the interesting conclusions to be drawn from this study is that rules engines do not necessarily introduce significant additional overhead into a solution. When I undertook some BizTalk Server performance testing a year ago, I was surprised to find that adding rule processing into orchestrations that took several hundred milliseconds to complete their work appeared to make no measurable difference. I now understand why.

With this in mind, I continue to believe that the inclusion of WF Rules as a ‘built-in’ feature of WF is a very welcome move by Microsoft, and I look forward to the day that BizTalk orchestration will get the direct benefit of this. It is good, though, to understand the limitations of the technology. I would be concerned if, at some future date, Microsoft dropped their support for the Rete approach. There are scenarios where the performance benefits, and the greater expressivity of MS BRE rules, are of tangible and significant benefit, and it would a backward step if we lost this.

I have to conclude that MS BRE is not as well optimised as it could (and should) be. Although it consistently out-performs WF rules (with rule engine caching), and scales more effectively than WF Rules when executing large rule sets, it cannot compete with the very best that Rete engines have to offer. This is surely an opportunity for improvement. I can’t see any good reason why the engine cannot be better optimised in a future release. The claim has been made in a couple of places that the performance benefits of MS BRE over WF Rules are chiefly realised when it can optimise shared conditions across many rules. In reality, MS BRE does not always exploit node sharing where it should, but still performs better than WF Rules. There are also improvements to be made by using node hashing in the alpha network and improving indexing over beta memories. To all of this, Microsoft could also add greater expressivity by introducing support for quantification, including basic existential negation which has been a common feature of other Rete engines for the last twenty five years. Quantifications are set-based features which could not be implemented in WF Rules.

One final observation. Several months ago, Microsoft announced its ‘Business Process Alliance’. The alliance was born of an acceptance that Microsoft’s own product catalogue does not currently support the full spectrum of business process management needs. In its own press releases, Microsoft identifies business process modelling and analysis, business rules management, human-centric workflow, process simulation and service-oriented architecture life-cycle management as areas where the alliance can help extend the cost-effective reach of their platform to a wide range of customers, including SMEs and large enterprises. Of the ten companies that make up the alliance, no less than three focus on business rule management. These are Fair Isaac, InRule and RuleBurst . I mention this, not because I feel any need to provide a free ‘plug’ for these vendors, but because some readers may want to follow Microsoft’s lead and give serious attention to considering the purchase of full-blown rule management and decisioning systems from other vendors as an alternative to Microsoft’s own rule engine offerings. In the spirit of vendor neutrality, do please also consider other vendors such as ILog, Haley, Corticon, PegaSystems etc.

APPENDIX

Detailed Results All times are given in milliseconds.

Test 1

WF Rules			No Rule Engine Caching				Rule Engine Caching
			Fwd Chain		Sequential		Fwd Chain	Sequential
Rule set Size	XML Size	Load	Init	Execute	Init	Execute	Execute	Execute
1	2,402	105.89	7.81	0.36	7.61	0.32	0.02	0.02
5	11,010	115.58	7.72	0.45	11.78	0.36	0.05	0.05
10	21,772	133.83	8.07	0.44	7.60	0.44	0.08	0.09
50	107,932	341.06	10.12	0.94	7.91	0.71	0.41	0.41
100	215,634	561.06	14.50	1.37	9.28	1.02	0.73	0.75
250	539,034	1098.50	31.86	2.60	12.88	2.36	1.90	1.90
500	1,078,034	1996.09	83.59	4.46	15.69	4.74	3.75	3.74
1000	2,156,036	3972.08	269.31	8.75	24.78	8.08	7.42	7.82
2500	5,393,036	9688.36	1594.03	20.47	51.48	19.41	20.40	19.51
5000	10,788,036	19429.09	5929.27	41.04	100.46	54.43	40.44	39.20
10000	21,578,038	39052.10	25574.20	84.54	204.86	94.96	85.68	87.24

MS BRE			No Rule Engine Caching				Rule Engine Caching
			Side Effects		Value Caching		Side Effects	Value Caching
Rule set Size	XML Size	Load	Init	Side effects	Init	Value caching	Side effects	Value caching
1	1,272	213.58	68.17	17.85	81.22	19.15	0.02	0.02
5	3,620	213.85	70.11	17.56	75.33	19.93	0.02	0.02
10	13,359	217.66	69.93	18.25	70.23	18.23	0.04	0.02
50	64,199	215.62	67.88	17.76	69.80	17.88	0.12	0.03
100	127,753	229.53	68.93	17.91	72.40	17.79	0.23	0.04
250	319,003	262.54	72.73	18.22	74.39	17.79	0.64	0.07
500	295,968	305.27	78.93	19.26	81.80	19.01	1.14	0.14
1000	591,470	411.87	93.20	20.29	103.82	18.40	2.35	0.24
2500	1,480,970	689.49	137.90	24.54	140.54	19.80	6.65	0.81
5000	2,963,470	1194.16	245.57	31.66	247.60	21.97	13.74	3.40
10000	5,928,472	2198.39	458.05	46.54	450.06	30.19	27.02	6.55

Test 2

WF Rules			No Rule Engine Caching				Rule Engine Caching
			Fwd Chain		Sequential		Fwd Chain	Sequential
Rule set Size	XML Size	Load	Init	Execute	Init	Execute	Execute	Execute
1	4,424	104.57	7.28	0.48	8.45	0.45	0.04	0.03
5	21,116	126.80	7.83	0.57	7.35	0.43	0.07	0.06
10	41,985	165.30	8.38	0.55	7.48	0.47	0.11	0.10
50	209,065	448.62	11.66	0.98	8.95	0.82	0.42	0.42
100	417,919	785.10	17.45	1.51	10.57	1.25	0.80	0.79
250	1,045,069	1740.35	46.54	2.79	15.67	2.58	1.98	2.09
500	2,090,319	3381.03	135.30	4.75	23.90	4.34	4.31	3.97
1000	4,180,823	6778.99	466.34	9.49	45.40	8.85	8.78	7.86
2500	10,458,323	16943.73	2569.92	22.43	105.52	23.02	22.02	20.38
5000	20,920,823	33455.10	10163.18	43.12	200.67	43.89	43.88	42.42
10000	41,845,827	68310.73	45099.25	93.09	436.44	86.68	88.99	85.29

MS BRE			No Rule Engine Caching				Rule Engine Caching
			Side Effects		Value Caching		Side Effects	Value Caching
Rule set Size	XML Size	Load	Init	Side effects	Init	Value caching	Side effects	Value caching
1.00	1,932	207.19	69.08	18.64	66.68	17.67	0.02	0.02
5.00	6,916	212.86	70.15	21.63	67.54	20.77	0.03	0.02
10.00	13,149	219.46	68.54	17.82	70.83	17.74	0.04	0.02
50.00	63,149	234.62	70.01	17.83	70.13	17.72	0.13	0.03
100.00	125,653	260.29	72.55	17.92	72.43	17.88	0.24	0.04
250.00	313,753	330.58	87.90	19.13	86.84	18.19	0.58	0.08
500.00	627,253	438.70	101.88	19.37	96.04	18.03	1.14	0.14
1000.00	1,254,257	672.79	131.11	21.47	120.90	21.60	2.21	0.26
2500.00	3,141,257	1362.24	231.66	25.56	225.48	20.12	5.85	0.83
5000.00	6,286,257	2455.38	369.57	34.25	368.62	22.03	13.59	3.28
10000.00	12,576,261	4728.92	764.67	49.67	735.53	26.38	27.61	7.48

Test 3

WF Rules			No Rule Engine Caching				Rule Engine Caching
			Fwd Chain		Sequential		Fwd Chain	Sequential
Rule set Size	XML Size	Load	Init	Execute	Init	Execute	Execute	Execute
1	4,418	103.06	7.20	0.45	6.67	0.37	0.03	0.03
5	21,086	126.46	7.85	0.60	7.20	0.56	0.18	0.15
10	41,925	160.46	8.10	0.78	7.55	0.68	0.32	0.30
50	208,765	416.48	10.96	2.03	8.87	1.95	1.79	1.54
100	417,319	738.50	15.56	3.68	9.96	3.47	3.46	3.18
250	1,043,569	1694.63	43.31	8.76	14.57	8.14	8.39	7.66
500	2,087,319	3313.06	131.50	16.98	23.55	15.87	16.67	15.57
1000	4,174,823	6427.13	435.30	32.31	44.25	31.35	34.07	32.01
2500	10,443,323	15847.19	2312.98	80.95	86.64	97.85	84.82	79.79
5000	20,890,823	31914.42	9425.66	162.53	176.32	183.99	173.31	161.12
10000	41,785,827	63386.70	39859.37	325.40	386.17	363.34	347.86	336.16

MS BRE			No Rule Engine Caching				Rule Engine Caching
			Side Effects		Value Caching		Side Effects	Value Caching
Rule set Size	XML Size	Load	Init	Execute	Init	Execute	Execute	Execute
1.00	1,953	204.67	67.75	17.96	67.20	17.68	0.02	0.02
5.00	7,021	206.02	75.63	21.25	70.58	24.60	0.06	0.04
10.00	13,359	211.93	67.73	18.14	68.23	17.69	0.12	0.07
50.00	64,199	232.36	69.35	18.25	70.92	18.17	0.55	0.23
100.00	127,753	254.97	71.94	18.96	75.19	18.99	1.11	0.46
250.00	319,003	314.20	83.75	20.64	84.64	18.94	2.77	1.11
500.00	637,753	420.66	95.63	24.30	95.74	20.91	5.61	2.38
1000.00	1,275,257	623.22	119.05	34.10	121.82	25.36	12.42	5.14
2500.00	3,193,757	1261.39	229.47	50.27	234.66	35.75	31.56	15.79
5000.00	6,391,257	2339.63	364.09	129.91	380.21	104.14	64.20	31.73
10000.00	12,786,261	4594.90	768.03	191.57	785.00	138.45	130.83	64.63

Test 4

WF Rules			No Rule Engine Caching				Rule Engine Caching
			Fwd Chain		Sequential		Fwd Chain	Sequential
Rule set Size	XML Size	Load	Init	Execute	Init	Execute	Execute	Execute
1	3,020	115.49	11.35	10.01	11.61	10.14	9.94	9.95
5	14,132	120.34	11.87	50.10	11.59	49.58	50.40	49.99
10	28,024	147.94	14.06	99.85	13.35	99.75	99.98	99.99
50	139,224	355.73	23.87	500.06	15.51	500.13	499.96	500.27
100	278,226	582.82	20.76	1000.14	13.65	1000.32	1000.27	1000.26
250	695,526	1287.95	42.10	2502.99	17.13	2500.54	2500.36	2501.14
500	1,391,026	2515.17	113.48	5004.97	23.66	5001.33	5001.90	5001.50
1000	2,782,028	4884.49	349.79	10014.67	42.01	10011.31	10014.97	10003.37
2500	6,958,028	12269.53	1876.18	25042.29	91.85	25051.19	25031.44	25013.04
5000	13,918,028	24201.65	7114.99	50072.73	157.65	50081.85	50349.69	50024.06
10000	27,838,030	47255.89	29558.89	100075.27	383.70	100212.91	100403.67	100030.32

MS BRE			No Rule Engine Caching				Rule Engine Caching
			Side Effects		Value Caching		Side Effects	Value Caching
Rule set Size	XML Size	Load	Init	Execute	Init	Execute	Execute	Execute
1.00	1,405	209.23	68.44	27.71	68.10	27.85	9.99	9.99
5.00	4,317	208.09	67.62	66.93	68.11	27.66	50.00	9.99
10.00	7,958	214.85	67.86	117.27	68.36	27.27	100.00	9.99
50.00	37,158	224.89	69.68	517.74	70.44	27.34	500.01	9.98
100.00	73,660	239.21	71.82	1017.75	72.14	27.47	1000.26	10.01
250.00	183,460	279.62	76.84	2519.82	76.92	28.07	2501.74	10.02
500.00	366,460	340.79	86.30	5022.03	89.27	28.25	5044.74	10.01
1000.00	732,462	486.51	107.89	10025.32	107.61	28.39	10046.16	10.00
2500.00	1,833,462	847.46	165.12	25051.66	168.53	30.00	25046.43	11.50
5000.00	3,668,462	1506.61	297.42	50090.70	301.11	32.72	50028.86	13.67
10000.00	7,338,464	2893.36	651.13	100687.84	615.94	38.58	100095.52	19.46

	MS BRE
	Node Sharing
Rule set Size	Execute
1.00	9.80
5.00	9.86
10.00	9.97
50.00	9.96
100.00	9.91
250.00	9.91
500.00	9.94
1000.00	9.87
2500.00	9.98
5000.00	9.92
10000.00	9.95

Test 5

Number of POs	WF Rules		MS BRE
	Fwd Chain	Sequential	Side Effects	Value Caching
	Execute	Execute	Execute	Execute
1	0.01	0.03	0.01	0.01
5	0.06	0.11	0.05	0.04
10	0.12	0.22	0.09	0.08
50	0.57	1.12	0.43	0.38
100	1.17	2.17	0.84	0.73
250	3.12	5.28	2.16	1.83
500	5.87	10.67	4.28	3.60
1000	11.52	21.28	10.74	7.39
2500	29.65	53.84	23.94	18.78
5000	60.22	105.99	47.41	38.59
10000	119.76	213.01	95.26	82.78

Test 6

POs per Customer	WF Rules		MS BRE
	Unfiltered	Filtered	Side Effects	Value Caching	Filtered
	Execute	Execute	Execute	Execute
1	1.15	0.09	0.31	0.21	0.03
2.5	3.53	0.27	0.67	0.43	0.09
5	5.64	0.45	1.07	0.64	0.14
10	11.66	0.97	1.96	1.19	0.28
25	29.16	2.53	4.79	2.88	0.68
50	58.02	4.49	8.94	5.54	1.38
100	115.42	9.69	17.83	12.26	2.86
250	289.60	22.92	49.16	29.34	8.03
500	578.62	46.39	93.34	54.76	13.99
1000	1160.77	101.07	187.54	114.65	28.41
2500	2857.82	228.07	487.51	320.50	71.81
5000	5743.46	451.38	1018.21	680.41	141.07
10000	11466.91	906.55	2131.43	1434.98	283.75

This article is part of the GWB Archives. Original Author: Charles Young

Replatforming Guide: Pros, Cons, and Impact

Deciding to replatform is no small feat; it’s like setting sails for new horizons with your digital presence. Weighing the

Cypress vs Selenium: Why Cypress is Better!

Navigating the competitive landscape of web testing tools, Cypress emerges as a noteworthy contender, outshining Selenium with its cutting-edge advantages.

WF Rules and MS BRE – Comparing Performance

Related Posts

Replatforming Guide: Pros, Cons, and Impact

Cypress vs Selenium: Why Cypress is Better!