Charles Young

  Home  |   Contact  |   Syndication    |   Login
  109 Posts | 43 Stories | 217 Comments | 406 Trackbacks

News

MVP - Microsoft Most Valuable Professional

Article Categories

Archives

Post Categories

Image Galleries

Alternative Feeds

BizTalk Bloggers

BizTalk Sites

CEP Bloggers

CMS Bloggers

Fun

Other Bloggers

Rules Bloggers

SharePoint Bloggers

Utilities

WF Bloggers

In May 2004, Microsoft published a whitepaper entitled ‘Microsoft BizTalk Server 2004 Performance Characteristics’ (MBTS-PC04)[1]. On pages 70-73 of this paper, Microsoft included the results from tests conducted on the Microsoft Business Rule Engine (MS-BRE). The first two test results are provided to indicate the scalability characteristics of the engine for large rulesets containing between 1,000 and 10,000 rules. The first result is for facts asserted as .NET objects. The second is for facts asserted as XML DOM documents.

In both cases, the graphs show a simple linear correspondence between the ruleset size and the time taken to complete the tests. This result has been publicly challenged by Peter Lin who suggested that they indicate either that Microsoft has not implemented the Rete algorithm, or that they have implemented it incorrectly. Scott Woodgate, a programme manager working on the Microsoft BizTalk Server product team, confirmed in early 2005 that the Microsoft Business Rule Framework (MS-BRF) product team had implemented the Rete algorithm in the MS-BRE. The Rete algorithm is the best-known and most widely used matching algorithm in modern forward-chaining inferencing rules engines. Peter’s responses can be read at the following locations.

http://www.theserverside.net/news/thread.tss?thread_id=30449

http://www.theserverside.net/tss?service=direct/0/PostNewsReply/postReply&sp=l36112&sp=F&sp=l184119

A brief quote will summarise Peter’s position:

“...I think Microsoft is making claims that are clearly untrue. If the biztalk (sic) rule engine really implemented RETE algorithm as Dr. Forgy described in his two papers, the performance characteristics would not show a dramatic decrease in performance as the number of rules increase…RETE when implemented correctly will not show a 2x decrease in performance as the rule count increases 2x…None of the existing RETE engines I know of show such poor performance with respect to ruleset size.”

Here is the graph for the .NET object results, as published in MBTS-PC04. They indeed show the engine taking twice as long to complete execution of a policy (an executable ruleset) each time the number of rules are doubled.

While this graph appears to show a classic linear scaling result, Peter is entirely correct in saying that the Rete algorithm is capable, in most true-life scenarios, of providing a much better performance characteristic as the rule set size is scaled. There are, however, several factors to consider when predicting the performance of Rete. The algorithm uses approaches such as node sharing and partial match retention to optimize performance, and these, in turn, depend on the details of the rules contained in a rule set.

The Microsoft Test
Microsoft does not provide a full description of the test they carried out, but there is enough information to adequately re-construct its main details. Here is their description of the test:

The policy used in this test contains rules like the following:
      If ( ClassN.Attribute >= constant )
           Then {Do something}

Note   Different .NET classes were used for the rules in the policy.
The application used to run the test executes the policy from four different threads.

The first point to note is that the rules are very simple. They contain a single condition which tests a value against a constant. Because the rule is so simple, it does not exploit all the features of the Rete. Specifically, a rule set which contains only rules which follow the above pattern will result in the Rete containing a collection of ‘alpha’ (1-input) nodes, but no beta (2-input) nodes. Hence, the partial match retention optimizations (associated with beta nodes) of the Rete will not apply.

Microsoft does not inform us of the action they take in the action list. Given the obvious simplicity of the rule pattern, it is reasonable to speculate that the rules are not performing any assertions, retractions or updates. If this is correct, there is no forward-chaining, and hence alpha-memory optimizations of the Rete are not exploited.

Microsoft provides some other relevant pieces of information about their test. They tell us that “different .NET classes were used for the rules in the policy” which probably means that they asserted facts of more than one type. This appears to be supported by a careful study of the example code provided on page 70 showing how the policy was executed.

The Policy object model was used in all test scenarios in this section. The policy execution was initiated from multiple threads, each thread using a unique Policy object instance; for example:

Thread n
Policy PolObj = new Policy(REFacts);
for (int i=0; i < NIterations; i++)
PolObj.Execute (REFacts);
PolObj.Dispose();

This example code was clearly not copied from the real test code as it contains obvious errors. The identifier ‘REFacts’, which presumably is the list of facts asserted initially to the engine, is used incorrectly to represent the name of policy in the Policy constructor, and again, correctly, as the argument to the Execute method. The Execute method can take either a single object or an array of objects representing one or more initial facts. Microsoft has used a plural form in the identifier (‘REFacts’) which may indicate that they asserted an array of multiple facts, presumably of different types, to the engine.

The code contains a ‘for’ statement with no body. It is likely that Microsoft meant to indicate that they executed the same policy a number of times as part of the test run. In MS-BRE, the Policy class represents an executable rule set where the rules have been translated into a run-time representation (in this case, a Rete (network of in-memory ‘nodes’). The Policy object caches this state. You can then execute the same policy as many times as you wish, asserting a new set of initial facts each time. In this case, the Rete is maintained, but the working memory is automatically re-initialised for each iteration of the Execute method.

One other point to note is that the test was run in a multi-threaded fashion. It appears that each thread maintained its own Rete via a thread-specific instance of the Policy object.

Explaining the results
As has been noted, and given that the assumptions about the test are broadly correct, the extreme simplicity of the rules used by Microsoft meant that their test appears to have made little or no use of the various optimizations offered by the Rete algorithm. Because no joins are performed, the Rete does not contain beta nodes and therefore does not retain partial matches at join nodes. Assuming that no assertions, retractions or updates were performed by any rule, the benefits of alpha memory in a forward-chaining scenario were not exploited. We cannot tell if there was any node sharing amongst the alpha nodes, but it is quite reasonable to speculate that Microsoft used a different constant value in each rule, in which case the Rete would have contain one alpha node for each rule. In this situation, the algorithm would perform in exactly the fashion described in Microsoft’s results.

Testing in other engines
In order to show that the performance results published by Microsoft are to be expected, it is necessary to perform similar tests against other Rete-based rules engines. Two such engines have been selected. Both engines are Java-based. One (Jess) is generally accepted as implementing the Rete algorithm correctly. The other (Drools) implements an algorithm called Rete-OO which is based on Rete. Peter Lin has claimed elsewhere that Drools does not implement the Rete algorithm entirely correctly. Specifically, I understand that its implementation of alpha memory is deemed incorrect, and that Peter has donated an alternative implementation to the Drools project which is currently being integrated into the engine. However, because Microsoft’s test does not exploit optimisation benefits of alpha memory, it is reasonable to speculate that an equivalent test conducted using Drools is likely to provide similar scalability characteristics to the other engines.

Further information on all three rules engines can be found at the following sites:

http://www.drools.org/

http://herzberg.ca.sandia.gov/jess/index.shtml

http://msdn.microsoft.com/biztalk/

The approach taken was to emulate the Microsoft test in its main details, in according to the above description and assumptions. The original test was emulated in all aspects except the multiple-threaded nature of the Microsoft test. My main justification for not implementing a multi-threaded approach is that it is entirely reasonable to expect the use of threading, as described by Microsoft, to possibly decrease the overall elapsed time of the test, but not in any way to change the scaling characteristics as the number of rules is increased. I am not interested in performing any kind of general speed comparison between the engines, and, in any case, there are far too many unknowns in the Microsoft test to allow this kind of comparison. We do not know how many facts were asserted, how many fact types were used or how many matches were achieved. We are not told how many iterations of the policy execution were undertaken. We don’t know if the results are the total elapsed time over these iterations or average elapsed time. We do not know what action each rule undertook and, lastly, we do not know if Microsoft optimized the engine settings in any way.

Each engine provides an API for executing rule sets. Microsoft provides very similar functionality at the API level to the other engines, but also provides an additional higher-level abstraction based on the concept of business ‘policies’. A policy is an executable rule set, and contains an instance of a rule engine. In addition, the Policy class is designed to work with the Ruleset Update service which is part of Microsoft’s Business Rules framework, and which decouples engines (‘executors’) from rule repositories whilst managing the publication and updating of rule sets. Microsoft’s original tests appear to have used this higher-level abstraction. In order to ensure the closest correspondence between the use of each engine, the Microsoft BRE test described here used the RuleEngine class directly. However, in order to emulate the original Microsoft tests reasonably closely, an ‘execute’ method was written for each engine, and implemented to assert the initial facts, execute the algorithm over those facts and then clear the working memory. Elapsed times were calculated within the ‘execute’ methods. The inclusion of elapsed time for fact assertions is vital to the test, as it is at this point that the facts are evaluated via the Rete. The inclusion within the elapsed time of the code that resets the working memory is more controversial, and is done simply to emulate the original Microsoft test more closely (the Policy class does this internally, although it uses Retract() rather than .Clear()). I conducted additional checks to ensure that including the resetting of working memory did not significantly distort or alter the scalability results for any of the engines. Here are the three ‘execute’ methods.

  • Jess (Java)

public void execute( ) throws JessException
{
  // Start measuring elasped time
  long start = System.currentTimeMillis( );

  // Assert the initial facts
  for ( Iterator i = this.inputObjects.iterator( ); i.hasNext( );
  {
    rete.assertFact( (Fact)i.next() );
  }

  // Run the test
  rete.run();

  // Retract facts from working memory
  for ( Iterator i = this.inputObjects.iterator( ); i.hasNext( );)
  {
    rete.retract( (Fact)i.next() );
  }

  // Stop measuring elapsed time
  timeTotal += (System.currentTimeMillis( ) - start);
}

  • Drools (Java)

public void execute( ) throws FactException
{
  List factHandles = new ArrayList();

  // Start measuring elasped time
  long start = System.currentTimeMillis( );

  // Assert the initial facts
  for ( Iterator i = this.inputObjects.iterator( ); i.hasNext( );
  {
    factHandles.add ( this.workingMemory.assertObject( i.next( ) ) );
  }

  // Run the test
  this.workingMemory.fireAllRules( );

  // Retract facts from working memory
  for ( Iterator i = factHandles.iterator( ); i.hasNext( ); )
  {
    this.workingMemory.retractObject( (FactHandle)i.next( ) );
  }

  // Stop measuring elapsed time
  timeTotal += (System.currentTimeMillis( ) - start);
}

  • Microsoft BRE (C#)

public void Execute( )
{
  // Start measuring elasped time
  long start = DateTime.Now.Ticks;

  // Assert the initial facts
  foreach ( object fact in this.inputObjects)
  {
    ruleEngine.Assert( fact );
  }

  // Run the test
  ruleEngine.Execute();

  // Retract the facts from working memory
  foreach ( object fact in this.inputObjects)
  {
    ruleEngine.Retract( fact );
  }

  // Stop measuring elapsed time
  long end = DateTime.Now.Ticks;

  // Convert ticks to milliseconds
  timeTotal += Convert.ToInt64(
             new System.TimeSpan(end).TotalMilliseconds
           - new System.TimeSpan(start).TotalMilliseconds);
}

Note that the Microsoft BRE code contains additional logic to convert ‘ticks’ (100-nanosecond intervals) into milliseconds (yes, I know I could have simply divided by 10,000, but this is the ‘approved’ approach).

For any Microsoft Rule Engine users reading this, it may be worth recording that the approach taken here proved about 15% more optimal than using the Execute method of the Policy class.

Each test run for each engine tested a rule set of a given size. The code performed an initial four executions in order to ensure the engine was ‘warmed up’. This was probably unnecessary, but is generally good practice in this kind of testing. Thereafter, a further ten executions were performed against the single rule set, and the average elapsed time was measured and recorded. This was repeated for 10 different rule sets ranging from 1,000 to 10,000 rules. The entire test was then repeated ten times, and the average result for each rule set was calculated to provide the final result. Hence, each rule set was executed a total of 100 times

Rules and Facts

The rule sets were generated to contain a given number of simple tests. Each rule followed the same pattern, as follows:

  • Jess

(defrule rule0
    (theFact1 ( value 0))
    =>
    (call solidsoft.jess.example.simplerule.TestFact1 logResult 0)
)

  • Drools

<rule name='is0'>
    <parameter identifier='theFact1'><class>TestFact1</class></parameter>
    <java:condition>theFact1.getValue() == 0</java:condition>
    <java:consequence>
        theFact1.logResult( 0 );
    </java:consequence>
</rule>

  • Microsoft BRE

<rule name="rule0" priority="0" active="true">
  <if>
    <compare operator="equal">
      <lhs>
        <function>
          <classmember classref="theFact1" member="get_Value" 
                       sideeffects="false" />
        </function>
      </lhs>
      <rhs>
        <constant>
          <int>0</int>
        </constant>
      </rhs>
    </compare>
  </if>
  <then>
    <function>
      <classmember classref="theFact1" member="LogResult" 
                   sideeffects="true">
        <argument>
          <constant>
            <int>0</int>
          </constant>
        </argument>
      </classmember>
    </function>
  </then>
</rule>

The rule shown above can be expressed as follows:

If ( theFact1.getValue == 0 ) Then theFact1.logResult( 0 );

…where theFact1 is an instance of TestFact1

The constant value used in the condition was incremented by one for each generated rule.

Note that for the Microsoft BRE, the sideeffects flag as been set to ‘false’ within the condition. This was done in order to ensure that the three tests more closely corresponded. The side effects feature has been the subject of some discussion between Peter Lin and myself, and Peter makes some good points with regards to its lack of safety. Interestingly, switching side effects off for facts expressed as .NET objects proved to be a significant optimisation.

Five fact classes were created (TestFact1, TestFact2, etc). The generated rule set selected one of the five classes in each rule, with each class used in a round-robin fashion across the rule set (i.e., rule 0 used TestFact1, rule 1 used TestFact2, etc, with rule 5 using TestFact1 and starting a new ‘round’).

The logResult() method did nothing. Originally it logged a message to a text file. This additional work made no difference to the overall scalability comparisons, but constituted additional work which I removed in order to simplify the test as much as I could.

A .dat file was created to define 100 facts for initial assertion to the engine. An extract is shown below

(make theFact1 ^value 0)
(make theFact2 ^value 101)
(make theFact3 ^value 202)
(make theFact4 ^value 303)
(make theFact5 ^value 404)
(make theFact1 ^value 505)
(make theFact2 ^value 606)
(make theFact3 ^value 707)
(make theFact4 ^value 808)
(make theFact5 ^value 909)

The same .dat file was used for each test. The values were deliberately set in order to maximize the number of match ‘hits’. Note that although the same identifier (e.g., ‘theFact1’) appears many times in the .dat file, this is not problem. The code creates and asserts multiple instances of the same class (e.g., TestFact1).

Results
The results of the test are shown below. Note that I took the liberty of adjusting the figures in order to show ‘time units’ in the y-axis, rather than actual milliseconds. This is because the purpose of this test is not to compare raw performance between the three engines, but to investigate the scaling characteristics of each engine when performing a similar test to the one described by Microsoft in their whitepaper. In order to emphasis this, I used the simple approach of calculating the average time increment for each engine as the rule set was increased in size by 1,000 rules, and then dividing the millisecond value by this average. The results allow the scaling characteristics to be more directly compared. Note that I have included the true millisecond values in appendix E, together with some brief comments on performance.

It can be seen immediately that the graph lines for each of the engines exhibit the same linear correspondence as seen in Microsoft’s results. The graph provides direct and compelling evidence that Rete-based engines, including engines known to implement the algorithm correctly, will exhibit the results Microsoft published when used to evaluate the extremely simple rules used within Microsoft’s test.

I hope, with the greatest respect to Peter, that these results will be accepted as compelling evidence that his assertions with regards to Microsoft’s test results are groundless. It is important to note that these figures in no way prove the correctness of Microsoft’s implementation of the Rete algorithm. They do, however, show that Microsoft’s published results provide no evidence whatsoever of an incorrect implementation of the algorithm.

Further Evidence
MBTS-PC04 provides some evidence to support the assertion that the Rete algorithm is being used with the MS-BRE. On pages 72 and 73, Microsoft reports the result of one of their tests under the title “Increasing the Number of Conditions”. This specific test is one of a series of tests investigating performance in relation to the use of facts asserted as ADO.NET datasets where a connection to the underlying data source is exploited. Microsoft explains the test as follows:

This test used a policy that contained 100 rules, and changed the number of unique conditions in the rules.

Rule N

If (Table1.Column1 = (N Mod NumberofUniqueConditions)
Then Class1.Property1 = Table1.Column1

The purpose of the test was to show how the data connection binding becomes much more efficient when the rule set uses shared conditions/queries. The following graph shows the results of the test.

This graph shows a very significant increase in performance as the number of ‘unique queries’ contained in the conditions of the rules is reduced to 1. This is clear evidence to support the claim that the engine is using node sharing optimisations. Whilst this result does not prove a correct implementation of the Rete algorithm, it is what would be expected if Rete is correctly implemented.

Conclusion
In contrast to Peter’s assertions, the performance results published by Microsoft do not provide any evidence that their implementation of the Rete algorithm is incorrect, but do provide evidence to support the assertion that features such as node sharing are working as expected. It is, perhaps, to be regretted that Microsoft’s published figures do not provide better evidence of the benefits of using the Rete algorithm in the match phase of the match-resolve-act cycle.


Appendix A

Source code - Jess

The TestSimpleRulePerformance class performs the test. Pass the name of the .CLP file as the first command line parameter. You can also create rule sets of a given size by providing an integer as the first command line parameter.

/**
* A simple example of using 'store' and 'fetch'
*/

package solidsoft.jess.example.simplerule;
import jess.*;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

public class TestSimpleRulePerformance
{
  /**
  * The DAT file defines the facts that will be initially asserted to the engine
  */
  private static final String DAT_FILE = "TestSimpleRulePerformance_100.dat";

  /**
  * The LOG file captures the average elapsed times durign a test run.
  */
  private static final String LOG_FILE = "TestSimpleRulePerformance.log";

  /**
  * The DRL file contains the ruleset. Set to the default ruleset.
  */
  private static String CLP_FILE = "TestSimpleRulePerformance.clp";

  /**
  * List of SolidSoft.jess.example.simplerule.* objects
  * read from the provided *.dat data file and to be asserted into the
  * WorkingMemory.
  */
  private List inputObjects;

  /**
  * The total elapsed time
  */
  private static long timeTotal = 0;

  /**
  * Executes the built Rete network
  */
  private static Rete rete = null;

  /**
  * Holdes the size of the current ruleset
  */
  static long ruleSetSize = 0;

  public static void main(String[] args)
      throws IOException,
      JessException
  {
    String datFile = DAT_FILE;
    int ruleCount = 1000;
    boolean doCreateRules = false;

    if ( args.length > 0 )
    {
      try
      {
        ruleCount = Integer.parseInt( args[0] );
        doCreateRules = true;
      }
      catch(Exception ex)
      {
        // not an integer, so we assume it is the name of the .clp file;
        CLP_FILE = args[0];
        ruleSetSize = Integer.parseInt( args[1] );
      }
    }

    if (doCreateRules)
    {
      createRules(ruleCount);
    }
    else
    {
      TestSimpleRulePerformance testPerf = new TestSimpleRulePerformance( datFile );

      // Warm up - execute the ruleset a few times
      for (int wuIdx = 0; wuIdx < 4; wuIdx++)
      {
        testPerf.execute( );
      }

      // Execute the ruleset several times more and take the average time
      int exCount = 10;

      timeTotal = 0;

      for (int exIdx = 0; exIdx < exCount; exIdx++)
      {
        testPerf.execute( );
      }

      String elapsedTimeMsg = "" + ( timeTotal / exCount );
      System.out.println( elapsedTimeMsg );

      String commaSep = ", ";
      File logFile = new File( LOG_FILE );
      FileOutputStream fsOut = new FileOutputStream( logFile, true );
      fsOut.write( elapsedTimeMsg.getBytes( ) );
      if (ruleSetSize == 10000)
      {
        fsOut.write( System.getProperty( "line.separator" ).getBytes( ) );
      }
      else
      {
        fsOut.write( commaSep.getBytes( ) );
      }
      fsOut.close();
    }
  }

  /**
  * Read the CLP and DAT files
  *
  * @param datFile the input data file
  */
  public TestSimpleRulePerformance( String datFile )
        throws IOException,
        JessException
  {
    System.out.println( "Loading CLP: " + CLP_FILE + "..." );

    rete = new Rete();

    // Read in the rules
    rete.executeCommand("(batch " + CLP_FILE + ")");
    rete.executeCommand("(reset)");

    System.out.println( "Reading DAT: " + datFile + "..." );
    this.inputObjects = getInputObjects( this.getClass( ).getResourceAsStream( datFile ) );

  }

  /**
  * Attempts to closely mimic Microsoft's Execute method on an MS-BRE Policy object by
  * 'reseting' the working memory, and then asserting the initial objects and firing
  * all rules.
  */
  public void execute( )
  throws JessException
  {
    // Start measuring elasped time
    long start = System.currentTimeMillis( );

    // Assert the initial facts
    for ( Iterator i = this.inputObjects.iterator( ); i.hasNext( );)
    {
      rete.assertFact( (Fact)i.next() );
    }

    // Run the test
    rete.run();

    // Retract facts from working memory
    for ( Iterator i = this.inputObjects.iterator( ); i.hasNext( );)
    {
      rete.retract( (Fact)i.next() );
    }

    // Stop measuring elapsed time
    timeTotal += (System.currentTimeMillis( ) - start);
  }

  /**
  * Creates a rule set with a given number of simple rules
  */
  private static void createRules(int ruleCount)
        throws FileNotFoundException,
        IOException
  {
    final String LINE_SEPARATOR = System.getProperty( "line.separator" );

    StringWriter writer = new StringWriter( );

    writer.write( "(defclass theFact1 SolidSoft.jess.example.simplerule.TestFact1)" + LINE_SEPARATOR );
    writer.write( "(defclass theFact2 SolidSoft.jess.example.simplerule.TestFact2)" + LINE_SEPARATOR );
    writer.write( "(defclass theFact3 SolidSoft.jess.example.simplerule.TestFact3)" + LINE_SEPARATOR );
    writer.write( "(defclass theFact4 SolidSoft.jess.example.simplerule.TestFact4)" + LINE_SEPARATOR );
    writer.write( "(defclass theFact5 SolidSoft.jess.example.simplerule.TestFact5)" + LINE_SEPARATOR + LINE_SEPARATOR );

    int factTypeNo = 0;

    // Output the rules
    for ( int idx = 0; idx < ruleCount; idx++ )
    {
      if (factTypeNo == 5) factTypeNo = 0;
      factTypeNo++;

      writer.write( "(defrule rule" + idx + LINE_SEPARATOR );
      writer.write( "(theFact" + factTypeNo + " ( value " + idx + "))" + LINE_SEPARATOR );
      writer.write( "=>" + LINE_SEPARATOR );
      writer.write( "(call SolidSoft.jess.example.simplerule.TestFact1 logResult " + idx + ")" + LINE_SEPARATOR );
      writer.write( ")" + LIN