Geeks With Blogs

Charles Young

First up today was Gary Riley talking about how he introduced much greater performance into CLIPS 6.3.   The talk was in more detail than last year’s. Gary talked about speed was not the chief priority for earlier versions of CLIPS, and how, when he first designed the engine, he was uncertain of how beneficial certain features such as hash tables would be.    Over the years, though, CLIPS had started to fall significantly behind the performance of Java engines.   The improvements made in 6.3 have broadly allowed it to leapfrog most Java engines.   Gary discussed his belief that Java code is unlikely to offer the same performance possibilities as C code, but also argues that you can’t directly compare a C code base with OO Java code.   Now, I am not sure I totally agree with his position on this, but that is OK.   Gary made it clear that the speed improvements have come about mainly through adoption of prior art rather than some fundamental change in the algorithm.

The biggest improvement came from hashing the alpha memories.   He spent some time fine-tuning the hash table size against a number of benchmarks (Waltz and manners). Gary also hashed the beta memories.   He heard the suggestion that hashing the beta memories does not yield much performance improvement, but felt that there was a case for implementing this approach. Gary then went on to discuss salience groups on the agenda.   This feature reduces the amount of agenda traversals.   He then described the implementation of ‘Not’ nodes that uses a lazy evaluation approach using a kind of bookmarking to reduce the amount of work that needs to be done in determining existential quantification.   Gary described his approach to ‘asymmetric’ retraction that has been in CLIPS for some time.   In classic Rete, retraction is symmetrical to assertion, but he has implemented an approach in which he uses additional links to retract partial matches from the bottom of the network back up to the point of change.   Gary went on to explain a number of additional areas where he tidied up and improved CLIPS, including Exists, joins from the right for handling Nots with embedded Ands, etc.   He finished off with a discussion of benchmarks and benchmark results.

And so on to the next speaker of the day who was...well me actually!   My talk was entitled “A Survey of Complex-Event Processing Models” and was the third of the CEP talks given during the conference.   I was able to gloss over the first few slides thanks to the previous two presentations from Paul and Edson, although I had a couple of points to make. In CEP, it is the events that are complex, not necessarily the processing (though it may be).   One important aspect of CEP is the notion of event abstraction hierarchies and the way they can be used to present views of complex events to different personas.   I also introduced the broad notion of a complex event-processing agent and the way it detects and reifies complex events, and to describe the role of agents within event processing networks and wider event driven architectures.

And so on to the heart of the talk.   The CEP market today is largely defined by event stream processing technologies.   These mainly use forms of dataflow. There is a lot of variation in implementation, but a useful perspective is to think in terms of stream-orientated processing (immediate forms of event-by-event processing with minimal event retention) and set-based processing of materialised views of event data. A typical engine may use aspects of both approaches, often with an emphasis towards one or the other. I described stream-orientated approaches in a bit more detail, together with the need to implement select & consume strategies based on contextual configurations.   I then compared this with Rete.   Rete uses a very specific type of dataflow involving two inner networks.   It combines stream-orientated and set based processing side-by-side.   It is clear that a Rete network can reasonably be used for complex event detection.  However, there are issues.   The ‘holistic’ nature of a rete network (constructed over many rules) supports strong forms of redundancy elimination, but may be harder to maintain in a continuous query scenario.   The degree of logical synchronisation required in a rete network is a barrier to scalability through parallelisation.   Event processing technologies tend to offer lees redundancy elimination, but keep individual queries more separate, making them easier to maintain and easier to handle in a parallelised fashion.   Event processing engines are the better bet for ultra-low latency, high-throughput time-critical event processing.   Other issues concern having to add additional semantic and temporal processing support to traditional Rete implementations.

The last part of the talk looked at Rete engines an CEP agents, and explored various ways in which such agents might be combined with event stream processing technologies in an EPN and models for building hybrid agents that combine aspects of Rete and event stream processing. Some engines have begun to implement aspects of certain models.   There are further avenues to explore.   Rete is a better bet for downstream event processing, further away from event sources, and offers industrial-strength inferencing, a natural technology for bridging between the EPN and other parts of the event-driven architecture and offers great features for computing event abstraction hierarchies.

Next up was Mark Proctor who claimed that Fair Isaac had tried to poison him (with alcohol I assume) the night before :-) His talk was on distribute agent-based computing.   He talked about Gregor Hohpe, responsible for EI patterns, and the work he is doing on stateless conversation patterns.   An agent is capable of acting in an environment, communicate with other agents, may be able to reproduce itself and whose behaviour tends towards satisfying its objectives.   In multi-Agent systems are considered to be autonomous.   Mark contrasted this to distributed systems which are designed as a set of independent subsystems which collaborate as a single system.   I think this distinction is problematic, but there is certainly a distinction to be made between agent-based processing and other forms of distributed processing.   A mobile agent can migrate from machine to machine across a heterogeneous network.   Agents don’t’ have to be mobile.   They may be stationary. Mobility can be used to create dynamically self-configuring topologies that reduce network bandwidth usage, latency, etc.   Mobile agents can provide a robust, fault-tolerant environment.

Mobility may be based on code mobility, code-on-demand and mobility of agents.   Instead of moving code, we may choose to move state between (more) static agents.    Agents need not only to be able to communicate with each other.   They need to interact with each other cooperatively.   Agent-to-agent communication can be enabled through shared memory (e.g. via a ‘blackboard’) and direct message passing.   Mark talked about BDI model and agent communication models and languages (KQML, FIPA-ACL) and various ontological standards.   He introduced the idea of ‘speech act’ theory involving locution, illocution, perlocution, etc.   There is some debate about the application of Speech Act theory to computing.   Mark talked in more details about various standards and specifications, and showed us examples of the conversational approach.   He talked about coordination using different models (e.g., direct supervision, standardisation, mutual adjustment, distributed search goals).   He also introduced the concept of contract nets involving recognition, bidding, awarding and expediting.  

And so on to the final presentation by Dr. Charles Forgy.   Charles developed the Rete algorithm back in the 1970’s and must surely be the best know name in this corner of IT.   He has continued to work on rules engine implementation over the last three decades and is responsible for the fastest implementation of inferencing engines currently available.   His talk was on “Making Parallelism Available to Rule Developers”. Processor speeds are not increasing as fast as they were.   The free ride has been over since about 2003/2004.    Attention is now on multi-core computing.   Rule-based systems will need to adapt to this new world.   A canonical forward-chaining rule-based Knowledge Source uses a recognise-act cycle.   The amount of parallelisation available to these systems is limited.   He described Amdahl’s Law. In early days of Rete, conflict resolution took perhaps 90% of the total execution system.   This is no longer true.   In OPSJ, Dr Forgy increased the rate of change of working memory and to make parallelism visible to the rule developer in a way that avoids error-prone forms or parallelisation.

One approach to improve parallelisation is to allow the recognise-act cycle to perform more changes on each cycle.   A second possibility is to allow multiple recognise-act cycles to act in parallel.   From the beginning, rules have been tuple-orientated.   Set-orientated rules allow conditions to match sets of objects.    There is no reason, however, why rule languages need to be tuple-orientated.    Indeed, most modern rule languages support set-orientated approaches allowing more work to be done in a single cycle.   Dr. Forgy talked about immediate rules. These rules cause multiple rules to fire concurrently. He described ways to hint to an engine when it is safe to use parallelisation.   He talked about the CONS keyword in OPSJ that is used to control fact consumption.   CONS provides fine-grained refraction control.   OPSJ also keeps a count of the number of rules that might match an object.   When the count goes to zero, the fact is retracted.   OPSJ can run multiple knowledge sources, but historically in a limited fashion.   This has been improved with the sendPacket API for use in parallelised situations.   The engine has been changed to allow knowledge sources to pause when the conflict set is empty.   The arrival of a packet restarts processing.   OPSJ has no pre-defined definition of packages.   This is in the gift of the developer.   OPSJ adds a Probe construct which is a special type of rule that sends messages to its owning knowledge source.   Dr. Forgy went on to describe Blackboard-style levels to working memory.   Levels can be specified for insert operations, sendPacket operations and conditions.  

Dr. Forgy then went on to describe how multiple threads might be used in a single Knowledge Source.   This is very problematic.   Is there room for improvement?   The issue, as always, is shared mutable data. He talked about the use of private working memory levels and other ideas.   He then moved on to a discussion of parallelism over networks.   There are possibilities regarding shared distributed memory.   He ended up talking about languages.   Java is not as fast as C/C++.   Modern processors are very complex, out of order engines that might have 100 in flight instructions.   Java, according to Dr. Forgy, does not take advantage of this properly and as a result, IPCs are low.   Sun’s Hotspot JiT compiler, for example, produces machine code that has too many sequential dependencies and fails to take advantage of modern architectures.   There is room for improvement.

A great conference.   A million thanks to James, Greg, Chelanie, etc., for all their hard work in making this happen.

Posted on Friday, October 30, 2009 11:58 AM | Back to top

Comments on this post: October Rules Fest: Day 5

# re: October Rules Fest: Day 5
Requesting Gravatar...
thanks again for the summary.
Left by Peter Lin on Oct 30, 2009 1:12 PM

# re: October Rules Fest: Day 5
Requesting Gravatar...
Charles, the point I was trying to convey is that one shouldn't use a valid comparison to support an unrelated specious conclusion. In regards to Jess performance, Jess in Action contains the following statements:

"Some people will tell you that Java is slow. They’re wrong. Modern Java virtual machines are extremely powerful and sophisticated. In many applications, Java is as fast as comparable C or C++ code. For Jess, being written in Java is not a liability."

"Independent benchmarks have shown that Jess is significantly faster than many rule engines written in the “faster” C language."

The valid comparison is that Jess ran the manners/waltz benchmarks many times faster that CLIPS 6.2. The implied specious conclusion is that Java is as fast as comparable C code. If Jess implemented an optimization that CLIPS 6.2 did not (in essence making the algorithms different in a significant way), can you accurately call the code comparable for the purpose of comparing the speed of the underlying implementation language? In my opinion you can't and therefore the correct conclusion that one should draw from this comparison is that rete implementations that don't support some type of alpha/beta memory hashing perform poorly on some benchmarks.

Now that the tables are reversed (, should I be making the following statement?

"Independent benchmarks have shown that CLIPS is significantly faster than most rule engines written in the more "modern" Java language?"

The conclusion that Java isn't really all that modern because it runs a benchmark slower than comparable code in C is specious. My argument is that it's the use of modern programming constructs in Java for which there is no analog in C and the use of lower level primitives in C for which there is no analog in Java that account for the difference in speed. So assuming that the rete implementations are essentially the same, you can draw some valid conclusions from these benchmarks about the speed of a C code base as compared to an OO Java code base, but not much else.

Of course, the real nail in the coffin for this issue is Dr. Forgy saying that Java is not as fast as C/C++.
Left by Gary Riley on Nov 05, 2009 1:45 PM

# re: October Rules Fest: Day 5
Requesting Gravatar...
Thanks for the clarification, Gary. I agree with you about the 'implied specious conclusion'. You have to be very careful when interpreting the results you get from running a test like Manners or WaltzDB. When I last did a comparison between CLIPS 6.2 and Jess 6, I noticed that CLIPS appeared to outperform Jess for very small numbers of guests in Manners. Given the highly contrived nature of Manners, I speculated that this may be a better indication of comparative performance for the 80% of rule sets which are small and simple, rather than the 20% which are big and complex. So, I never personally accepted that CLIPS 6.2 was slower than Jess 6 - just that it was slower when performing Manners over a certain guest size.

I remain a bit of a sceptic when general comments are made about performance of native code emitted by different compilers, for much the same reasons that I did not accept that CLIPS 6.2 is slower than Jess. The very, very bottom line is that native code is just native code, whether it is emitted by a JiT compiler or a static compiler. Each approach has a different range of available optimisations, of course. You were, I believe, talking about comparison of codebases written in different languages. Both you and Dr. Forgy were talking about Java. Dr. Forgy talked specifically about Sun's Hotspot technology in relation to modern processor architectures. I'm a .NET developer these days, and I can't really comment with any authority on Java compilers or runtimes, though I always resisted believing claims that one is inherently faster than the other. The better benchmarking I have seen for these two similar technologies paints a very mixed picture.

C developers work closer to the metal than Java or .NET developers. Also, Java and .NET provide huge class libraries and lots of general-purpose constructs which are very practical, but which won't necessary be super-optimal. Why roll your own hashtable when you already have one provided for you? Except that a custom hashtable offers more opportunity for optimisation, of course. Then, of course, there is the vexed question of automatic garbage collection. Surely malloc/free is faster? But maybe the more important issue is heap fragmentation which tends to be lower in GC environments (a massive generalisation, I agree). Also, modern GC environments minimise collections very effectively, so the GC overhead is sometimes virtually non-existent. And that's the point really. My C# compiler has umpteen different ways of compiling a humble If statement. It tries very hard to optimise the code depending on the number and nature of the conditions, etc. Sometimes it uses hashing techniques, sometimes it emits a switch, sometimes it uses direct jumps and often it creates a crazy mixture. And that's just at the intermediate level. Goodness knows what different JiT compilers do. Does your C compiler support a similar range of optimisations? Maybe for all I know, but I suspect not. I suspect the C compiler leaves it up to you to work out how to optimise your code. Is that better or worse? I don't think it is a matter of not being able to compare codebases. I think it is a matter of not being able to easily compare a whole range of compiler techniques, optimisations at different levels, memory management issues and the like, and the skills different developers have in squeezing the very best from the technologies they happen to be using. I'm not convinced that the case can be proved one way or another in relation to C vs. VMs.
Left by Charles Young on Nov 05, 2009 6:19 PM

# re: October Rules Fest: Day 5
Requesting Gravatar...
Here's my useless 2 bits on the issue. Assuming the implementation between two applications is the same and "equally" well written, I'd bet good money C/C++ will be faster. For me, the issue is how much experience and skill is needed to use a programming language. I suck horribly at C/C++ and it takes a lot more skills to write a great application in C/C++ than Java/C#. If I had to write Jamocha in C/C++, honestly it would have never happened. I just don't have that skill and experience. Acquiring that level of experience takes a long time. Only guys like Dr. Forgy, Gary, Ernest and Paul Haley have the chops to do it. Everyone else simply doesn't have what it takes. That's not false modesty either, it's the cold hard truth.

If I remember correctly, ART wrote their own VM to insure memory management isn't an issue. I've read old ART docs and it provides a decent explanation of their design. Paul Haley would be able to shed some light on ART's architecture.
Left by Peter Lin on Nov 06, 2009 9:57 AM

Your comment:
 (will show your gravatar)

Copyright © Charles Young | Powered by: