Rules: A response to James McGovern's questions

Charles Young

Recently read that the RETE algorithm developed by Charles Forgy is in its third release. Curious to know if RETE III is open source?

 

No.  It's proprietary.  I guess it’s how Charles Forgy makes his money.  The current owner is Fair Isaac.

 

Curious to understand in general why even the very large analyst firms don't have deep coverage of the rules space? There are at least thirty vendors in this segment, and the rules-based approach has the potential to make a huge difference in how enterprises achieve agility.

 

The difficulties have always been in working out how to unlock this potential. Progress is being made, but the industry has a long way to go. Capturing and defining business rules in a meaningful, correct and rigorous fashion still tends to be a specialist discipline poorly understood by many analysts, and often with little buy-in from IT managers who have yet to be convinced that it will yield the benefits they are looking for. There are lots of reasons why this is the case, including the lack of adequate analyst skills in this area, the lack of standard (or even adequate) formal mechanisms for representing and exchanging business semantics and constraints, and the impedance mismatch between the analytical formalisation of business rules and their technical implementation in production systems and other types of rule engine.

 

In the industry, there are what are known as Rating Engines. Some believe these are distinct from rules engines, while others simply think of them as specializations. Who is correct, and why? It would be wonderful if a rules vendor would create a "reference architecture" for using a rules engine for rating and make it publicly available.

 

I can't see the distinction in a general sense.   Certainly in the last couple of years I've specifically had involvement in the implementation of rating engines that were built around Rete-based production engines.   However, I'm not convinced that pattern-matching inferencing engines are always the best approach for these kinds of systems.  I worked on one system a couple of years ago which we built using process automation software, but which should probably have been built using a Rete-based production engine.   The application did a lot of combinatorial processing in order to infer a set of detailed transactions based on higher-level input.   A little while later, I had some input into another project.   They were running into some practical problems in trying to use a pattern-matching production system for what, essentially, was a big decision tree.

 

Can't seem to locate industry analyst coverage on Drools. For that matter, I can't even seem to find any public benchmarks on any of the rules engines. I heard of a benchmark named Waltz, but no one seems to talk about it.

 

All the OPS5 benchmarks, including Waltz and Manners, can be downloaded from:

 

  • http://www.cs.utexas.edu/ftp/pub/ops5-benchmark-suite/
  • http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/expert/bench/bench/
  • ftp://ftp.cs.utexas.edu/pub/ops5-benchmark-suite/

 

Beware these benchmarks!   They were designed specifically for the OPS5 community.   OPS5 is written in LISP, and can therefore run on a variety of different interpreters, or compiled using different LISP compilers.   Manners is perhaps the most (ab)used of these benchmarks in performing apparently ‘comparative’ tests between different engines.   The benchmark is designed to stress the ‘beta’ (join) network in a small and simple Rete.   It chiefly stresses a couple of specialised join nodes representing the last two conditions in one of its eight rules.   The 128-guest run ends up causing the engine to perform several hundred million evaluations, the bulk of which cluster around that part of the network.

 

The problems with using the benchmark for comparative testing are many and varied.   It focuses on a narrow set of characteristics and mainly tests a specialised feature that is supported on most, but not all engines.   It therefore doesn’t provide a truly comprehensive or balanced picture of the overall performance characteristics of an engine (you really can’t sum up performance characteristics of a Rete engine using a single figure or graph).  

 

Manners is not explicit enough (for comparative testing purposes) in determining the approach to finding a solution, and the results of the benchmark, as originally written, often depend very significantly on undocumented features. For example, on CLIPS, the test only performs (reasonably) well because of the order in which one type of 'fact' (seatings) is evaluated, which in turn is a coincidental side effect of the implementation. This order is meant to be treated as 'arbitrary', but Miss Manners relies, for reasonable performance, on the fact that there is a certain order.

 

Another problem is that Miss Manners, which is a highly contrived test, can far too easily be ‘cheated’.   This problem has occurred a few times in the industry.  For example, you can add one additional condition to one of the rules, at exactly the right place in the condition list, and cause the engine to reach a correct conclusion in perhaps less than a second for a test that would normally take anywhere from about 35-40 seconds to several minutes. 
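
To make that concrete, here is a minimal, purely illustrative sketch in Java. The fact counts are invented and are not taken from the real Manners rule set; the point is simply that one extra, well-placed condition can collapse the number of partial matches the join network has to evaluate.

    // Purely illustrative: hypothetical fact counts, not the real Manners data.
    public class JoinOrderSketch {
        public static void main(String[] args) {
            long guests = 128;        // hypothetical working-memory counts
            long seatings = 2_000;

            // Unconstrained join order: every guest is paired with every seating,
            // and then with guests again, before the final tests prune anything.
            long unconstrained = guests * seatings * guests;

            // With one extra, highly selective condition placed early, most guests
            // are eliminated before the combinatorial joins are attempted.
            long survivors = 4;       // hypothetical survivors of the extra condition
            long constrained = survivors * seatings;

            System.out.println("join evaluations, original rule:        " + unconstrained);
            System.out.println("join evaluations, with extra condition: " + constrained);
        }
    }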

 

For enterprises that have adopted the notion of an Enterprise Service Bus, it seems as if rules-based routing is starting to become popular. The extremely scalable ESB ServiceMix (which also happens to be open source) uses rules-based routing. If the ESB product I spent lots of money on didn't come with this functionality, what is the best way to integrate it?

 

A fascinating sub-topic in its own right.  You are quite right to draw the association between routing rules and other business rule sets.   Indeed, routing rules are generally an important sub-category of business rules.  Routing rules are typically used for simple decision logic ("if this message is of type X, then deliver it to endpoint A").   Many existing products use a fairly static approach to establish and maintain rules in some kind of routing table.  Some products layer a more dynamic approach on top of this by creating lots of alternative static rules, and then selecting the appropriate one dynamically at runtime.   I've recently been looking at an emergent, lightweight ESB-like technology that uses an entirely dynamic approach to routing in which each participant service can potentially change existing routing tables within an existing session or spin up new sessions with a new set of routing rules.

 

You could certainly use production engines based on Rete to provide routing rules in an ESB environment. I've actually done this a couple of times in more centralised hub-and-spoke environments. However, I've yet to see a real-life scenario where a pattern-matching inferencing engine was really needed for this purpose. A simple decision logic system is all that is typically required. I don't know ServiceMix, but looking at the web site it appears to be using Drools for this purpose.
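
For illustration, here is a minimal sketch of the kind of table-driven decision logic that is usually sufficient for routing. The message types and endpoint URLs are invented for the example and are not taken from any particular ESB product.

    import java.util.Map;

    // Minimal sketch of static, table-driven routing:
    // "if this message is of type X, then deliver it to endpoint A".
    public class RoutingTableSketch {
        private static final Map<String, String> ROUTES = Map.of(
            "PurchaseOrder", "http://erp.example.com/orders",
            "Invoice",       "http://finance.example.com/invoices"
        );

        static String route(String messageType) {
            // Fall back to a dead-letter endpoint when no rule matches.
            return ROUTES.getOrDefault(messageType, "http://esb.example.com/dead-letter");
        }

        public static void main(String[] args) {
            System.out.println(route("Invoice"));        // finance endpoint
            System.out.println(route("ShippingNotice")); // dead-letter endpoint
        }
    }

A Rete engine only earns its keep when the routing decision depends on matching patterns across several facts at once; for a lookup like this it is overkill.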

 

I would love to hear from industry analysts about the books that publishers such as Springer Verlag and others should be getting their acquisitions editors focused on. As an author myself, I have been for the most part disappointed with pretty much all of the books in the rules space (Tony Morgan's book is the sole exception). The vast majority of the rules books are methodology books in disguise and are very general in nature. Maybe folks are eager to have a book that talks about rules architecture, but what should it contain?

 

Not an analyst, so I will pass on this one.

 

Curious to know why vendors in other spaces, whether it is J2EE servers, Enterprise Service Buses, or Portals, have assembled to work on interoperability exercises and continue to bring useful standards to the table for us, yet we haven't seen any of this in the rules space. I know it is not because they are small, as other vendors who are even smaller in other categories have figured out how to participate. Is it that they don't think interoperability is important to us customers?

 

From my perspective, I'd say there is a fair amount of work going on in this regard, chiefly under the auspices of standards organisations. I have mentioned a couple of endeavours below. However, I agree that vendors have not yet achieved much in terms of interoperability. It's only in the last decade or so that some really solid understanding of business rule representation has begun to emerge. Rule processing using production systems has, over this same time, begun to move from a chiefly academic exercise in AI to a more mainstream place in enterprise systems. However, production systems aren't that mainstream yet! There are all kinds of reasons why, including the major problems surrounding the capture and definition of business rules and the translation of those rules into a technical representation that can be consumed by production engines.

 

If the RETE algorithm went from version II to version III, could someone tell me in terms of "design patterns" what types of applications would improve by using the newer algorithm? What types of applications would see a decrease in performance?

 

Because these 'improved' algorithms are proprietary, I can't really answer your question. I will make an observation, though. There is over 20 years of academic literature out there on ways in which the Rete algorithm can be optimised, and there are also well-known alternatives (TREAT and LEAPS, for example, both developed with the involvement of Daniel Miranker who, incidentally, was also one of the people responsible for the Waltz and Manners benchmarks).

 

Many optimisations have only specialist applications. Some are more general purpose. Add to this the observation that rule engine performance, which is surprisingly difficult to quantify in a truly meaningful fashion, can be significantly affected by apparently secondary matters such as the amount of runtime casting an engine does, or the exact way it tests for equality. I suspect (though I don't know) that Rete II and Rete III don't represent radical changes to the original algorithm, but implement a range of optimisations which allow them to perform well for certain types of problem. Certainly the sales pitch for Rete III seems to indicate that its advantage is chiefly to do with improving scalability when processing large data sets.

 

Is RDF rich enough to express and create a portable rules syntax?

 

No.   Nowhere near.

 

Portability is a surprisingly ambitious goal.   The W3C currently have a work track that is attempting to create a portable rule language, but it remains to be seen if they will produce anything of use.   The OMG is working on a standard 'production rule representation' (PRR), and this appears to have some momentum, though it is not yet published.   This is aligned to MDA.   RuleML specifically addresses issues concerning the semantic web, and will doubtless have an impact on any future W3C portable rule language.

 

The problems surrounding portability are many and varied.   First, there are some fairly fundamental differences between different rule processing styles.   For example, sequential rules are processed very differently to declarative pattern-matching rules of the sort a Rete engine consumes. Backward chaining is significantly different to forward chaining, etc., etc.  
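
As a rough illustration of the forward-chaining, data-driven style (ignoring Rete optimisation entirely), here is a naive sketch that fires propositional rules until no new facts can be derived. The facts and rules are invented for the example.

    import java.util.*;

    // Naive forward chaining to a fixpoint; purely illustrative.
    public class ForwardChainingSketch {
        record Rule(Set<String> premises, String conclusion) {}

        public static void main(String[] args) {
            Set<String> facts = new HashSet<>(List.of("order-received", "customer-verified"));
            List<Rule> rules = List.of(
                new Rule(Set.of("order-received", "customer-verified"), "order-approved"),
                new Rule(Set.of("order-approved"), "invoice-raised")
            );

            // Keep firing rules until no new facts can be derived.
            boolean changed = true;
            while (changed) {
                changed = false;
                for (Rule r : rules) {
                    if (facts.containsAll(r.premises()) && facts.add(r.conclusion())) {
                        changed = true;
                    }
                }
            }
            System.out.println(facts); // includes order-approved and invoice-raised
        }
    }

A backward chainer would instead start from a goal such as "invoice-raised" and work recursively back through the rules to see whether its premises can be established, which is one reason the two styles do not translate mechanically into one another.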

 

Another issue is that rules are used for a wide variety of purposes.   In business systems, they are typically used for decision logic and workflow management.   Expert systems use them to control reasoning and logic.   The semantic web uses them for implementing constraints on meaning.  A good example of the problems that can occur when trying to unify these concerns is to study the issues surrounding 'negation-by-failure' (aka. 'weak negation' or 'the closed-world assumption').   This is currently receiving a lot of attention.   Most Rete engines implement a form of weak negation, but cannot cope well with strong negation.   The semantic web needs to handle strong negation.   What is this all about?   Well, a now 'classic' example is to consider a database of US presidents that doesn't contain a record for Ronald Reagan.  What does that mean?   To a production engine, it is typically treated as meaning that Ronald Reagan was not a US president!  This is 'weak negation' where we assume the database is a closed world containing all relevant facts.   You cannot really afford to make that assumption in the semantic web.
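
A minimal sketch of the closed-world reading, assuming nothing about any particular engine's API; the fact base below is deliberately incomplete.

    import java.util.Set;

    // Hypothetical, incomplete fact base standing in for an engine's working memory.
    public class ClosedWorldSketch {
        static final Set<String> PRESIDENTS = Set.of("Washington", "Lincoln", "Kennedy");

        // Negation-by-failure ("weak negation"): anything not asserted is treated as false.
        static boolean wasPresident(String name) {
            return PRESIDENTS.contains(name);
        }

        public static void main(String[] args) {
            // Prints "false" for Reagan under the closed-world assumption,
            // even though he was in fact a US president. An open-world system
            // (as the semantic web requires) would have to answer "unknown" instead.
            System.out.println(wasPresident("Reagan"));
        }
    }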

 

A specific portability problem that affects Rete engines is that the outcome of rule execution can be significantly affected by adoption of different conflict resolution strategies.   Even 'depth-first', which many engines offer as a default strategy, can be implemented in a variety of different ways, leading potentially to different results.   Hence, a rule set that is executed in one fashion on one engine can be executed rather differently on another, leading in some cases to different outcomes!   This issue alone makes portability very difficult to achieve at the technical level.   One paper submitted to a W3C workshop on rules last year is entitled "Can we exchange rules without exchanging rule engines?"   The paper concludes that the answer is "no"!  http://www.w3.org/2004/12/rules-ws/paper/65/.
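
The following toy sketch, which does not model any real engine's agenda or API, shows how two perfectly reasonable interpretations of 'depth-first' can pick different activations to fire first. If the rule fired first retracts the fact the other rule matched on, the two runs end in genuinely different final states.

    import java.util.*;

    // Purely illustrative agenda with two competing activations.
    public class ConflictResolutionSketch {
        record Activation(String rule, int recency) {}

        public static void main(String[] args) {
            List<Activation> agenda = List.of(
                new Activation("archive-order", 1),  // matched against an older fact
                new Activation("cancel-order", 2)    // matched against a newer fact
            );

            // Strategy A: fire the activation triggered by the most recent fact.
            Activation byRecency = agenda.stream()
                .max(Comparator.comparingInt(Activation::recency)).orElseThrow();

            // Strategy B: fire activations in the order the rules were loaded.
            Activation byLoadOrder = agenda.get(0);

            System.out.println("recency strategy fires first:    " + byRecency.rule());
            System.out.println("load-order strategy fires first: " + byLoadOrder.rule());
        }
    }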

 

Are any universities teaching rules-based approaches as part of their computer science programs? If not, what can those of us in the enterprise do to change this?

 

It's a long time since I was in academia, but the answer is certainly yes, though I couldn't say how widespread this is. Rules processing has strong roots in AI research, and you can find a good deal of academic literature on the web, including course material, tutorials, exercises, etc.

Posted on Sunday, April 30, 2006 4:54 PM


Comments on this post: Rules: A response to James McGovern's questions

# re: Rules: A response to James McGovern's questions
Greetings:

This blog chain seems to be over a year or so old, but, just in case anyone is still reading it:

Rete 2 was developed for CLIPS/R2, a proprietary algorithm for an OPS / CLIPS rulebase owned by Production Systems Technology, Dr. Forgy's company.

Rete 2 was licensed to RulesPower, which was subsequently bought by FICO so that they could get Rete 2 into Blaze Advisor. While that was being done, Dr. Forgy wrote some simple extensions so that it would interface with the FICO data analytics tools and renamed it Rete 3 - fairly valid, but Rete 2 and Rete 3 have the same performance, even though Rete 3, with its extensions, is just a tad slower on some benchmarks.

Rete NT is the final version and it is covered in InfoWorld. Also, the only company currently using Rete NT is Sparkling Logic in the SMARTS product line. Sparkling Logic is headed by some of the guys from FICO, mainly Carole-Ann Matignon (former VP of Product Management at FICO), Carlos Serrano-Morales (inventor of Advisor), Johan Majoor (former Chief Engineer for FICO), and Colleen McClintock (Product Manager at ILOG), as well as Dr. Forgy himself. They have also integrated the rules, decision tables, decision trees and analytic tools in a really easy-to-use format.

jco
Left by jco on May 08, 2014 9:57 PM



Copyright © Charles Young