Monday, March 17, 2014

Manners and Waltz Benchmarks

The two most widely used benchmarks for rule-based systems are Manners and Waltz. Daniel Selman nicely summarizes the major issues with these benchmarks in The Good, The Bad, and the Ugly - Rule Engine Benchmarks.

CLIPS didn’t implement the optimizations needed to run these benchmarks efficiently until version 6.30, so it compared unfavorably to other engines when these were the only metrics used for comparing performance.

As the following graphs show, CLIPS 6.30 is orders of magnitude faster than CLIPS 6.24 on the larger data sets used by these benchmarks.

Two changes in particular account for the dramatic improvement in the benchmark results: hashing the memory nodes and optimizing the handling of large numbers of activations.
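
To make the effect of hashing concrete, here is a minimal Python sketch of a single join between two fact memories. It is not how CLIPS implements its memory nodes; the function names, the fact layout, and the "hobby" join key are all hypothetical and chosen only for illustration. The point is that a nested scan does work proportional to the product of the two memory sizes, while a hash index on the join key is probed once per fact.

```python
from collections import defaultdict

def join_scan(left, right, key):
    """Nested-loop join: compares every left fact against every right fact."""
    matches, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if l[key] == r[key]:
                matches.append((l, r))
    return matches, comparisons

def join_hashed(left, right, key):
    """Hashed join: index the right memory by key, then probe once per left fact."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    matches, probes = [], 0
    for l in left:
        probes += 1
        for r in index.get(l[key], ()):
            matches.append((l, r))
    return matches, probes

# Hypothetical data: 1000 guests, each with one of 50 hobbies.
guests = [{"name": f"g{i}", "hobby": i % 50} for i in range(1000)]
hobbies = [{"hobby": h, "label": f"h{h}"} for h in range(50)]

scan_matches, scan_cost = join_scan(guests, hobbies, "hobby")
hash_matches, hash_cost = join_hashed(guests, hobbies, "hobby")
print(scan_cost, hash_cost)  # 50000 1000
```

Both functions return the same matches; only the amount of work differs, and the gap widens as the memories grow, which is why the benefit shows up mainly on the larger benchmark data sets.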

How did these optimizations improve the performance of a real-world application? I benchmarked some of the larger sample data sets for a production system I developed that’s run hundreds of thousands of times a month. The system consists of hundreds of rules, and the amount of data processed can range up to ten thousand or more facts.

Each of the samples showed improvement, but not nearly as dramatic as Manners or Waltz:

A process is created and the rules loaded each time the system is run, so a more accurate picture of the total processing time would include the time to load the rules:

There’s nothing wrong with modest improvements, but if your expectations of performance were based on Manners and Waltz, you’d surely be disappointed.

That’s not to say the 6.30 optimizations had no performance benefits in real-world situations. Occasionally, I’d write one or more rules that were efficient for a small data set, but acceptance testing did not include a large data set. The system would run fine until a sufficiently large data set with the appropriate types of facts was submitted, at which point the process would display non-optimized Manners/Waltz behavior (i.e. it would appear to hang).

When using CLIPS 6.24, I rewrote the offending rules to be more efficient for large data sets. With 6.30, since the system is more tolerant of inefficient rules, these situations occur less frequently and are easier to correct.
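
The kinds of rewrites involved aren’t detailed here, but one common way a rule becomes inefficient on large data sets is pattern ordering: if the unrestrictive patterns come first, the engine builds a large number of partial matches before a later pattern prunes them. The Python sketch below is a hypothetical illustration of that effect, not CLIPS code. The `count_partial_matches` helper, the `item` and `pair` fact types, and the variable names are all assumptions made for the example.

```python
def count_partial_matches(patterns, facts):
    """Join patterns left to right; return (total partial matches built, final matches)."""
    partials = [{}]  # each partial match is a dict of variable bindings
    total = 0
    for kind, variables in patterns:
        extended = []
        for bindings in partials:
            for fact in facts[kind]:
                new = dict(bindings)
                ok = True
                for slot, var in variables.items():
                    # A variable bound by an earlier pattern must agree with this fact.
                    if var in new and new[var] != fact[slot]:
                        ok = False
                        break
                    new[var] = fact[slot]
                if ok:
                    extended.append(new)
        partials = extended
        total += len(partials)
    return total, len(partials)

# Hypothetical working memory: 200 item facts and a single pair fact.
facts = {
    "item": [{"id": i} for i in range(200)],
    "pair": [{"a": 3, "b": 7}],
}

# Same rule, two pattern orders: the restrictive "pair" pattern last vs. first.
slow = [("item", {"id": "x"}), ("item", {"id": "y"}), ("pair", {"a": "x", "b": "y"})]
fast = [("pair", {"a": "x", "b": "y"}), ("item", {"id": "x"}), ("item", {"id": "y"})]

print(count_partial_matches(slow, facts))  # (40201, 1)
print(count_partial_matches(fast, facts))  # (3, 1)
```

Both orderings produce the same single final match, but the first builds the full 200 × 200 cross product along the way. With a handful of facts the difference is invisible, which is consistent with such rules passing acceptance tests on small data sets.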

CLIPS versions of the Manners and Waltz benchmarks are available here and here.
