Thursday, July 31, 2014

Join Activity

I've been an expert system tool developer for the last twenty-nine years, but it's only been the last nine years that I've had a significant amount of ongoing year-to-year experience developing and maintaining programs from the other side as a user of expert system tools.

One of the pain points I encountered was trying to determine which rules were causing performance issues. A typical scenario would be that the rules would pass validation testing by the business users, but then weeks or months later an atypically large data file would be processed uncovering a performance issue (requiring hours for the data to be processed rather than seconds). Isolating the problem was accomplished by strategically deleting rules until the performance issue went away. The offending rule(s) were then inspected and analyzed with respect to the data to determine the cause of the problem. This was hardly an ideal approach and only worked best if there were a few rules causing the issue.

From a user perspective, It would be nice if you could profile rule performance much as you can profile the amount of time spent in functions/methods in languages such as C and Java. In fact, CLIPS does have profiling functions which allow you to track the amount of time spent in the actions of a rule. Unfortunately, the performance issues in rules are usually caused by the way the conditions have been written. Because rule conditions can be shared, it's difficult to assign time spent while pattern matching to a specific rule.

When I was developing performance improvements for CLIPS 6.3, I added code to track the number of operations performed on partial matches in the join network to gauge the benefit of using hash tables rather than linked lists for the join memories. It occurred to me that this could be a useful metric for identifying performance issues, and from this idea came the join-activity command in CLIPS 6.3.

This new command easily identifies the offending rule in the manners benchmark program:
CLIPS> (clear)
CLIPS> (unwatch all)
CLIPS> (load manners.clp)
:%%%%%%%********
TRUE
CLIPS> (reset)
CLIPS> (load-facts manners128.fct)
TRUE
CLIPS> (run)
CLIPS> 
(progn$ 
   (?r (get-defrule-list)) 
   (printout t ?r ": " (join-activity ?r terse) crlf))
assign_first_seat: (1314 1315 1315)
find_seating: (5847660 7067211 7067211)
make_path: (49549 16510 16510)
path_done: (127 254 254)
are_we_done: (128 255 255)
continue: (0 127 127)
print_results: (257 258 128)
all_done: (0 1 0)
CLIPS> 
When used in terse mode, the join-activity command returns a multifield value containing the number of comparisons made between the left and right memories, the number of partial matches added to the join memories, and the number of partial matches removed from the join memories. For this program, the find_seating rule has much more join activity than all the other rules combined.

Using the verbose mode allows you to examine the information for each join in the rule.
CLIPS> (join-activity find_seating verbose)
Activity for CE 1
   Compares:          0
   Adds:            127
   Deletes:         127
Activity for CE 2
   Compares:       8128
   Adds:           8128
   Deletes:        8128
Activity for CE 3
   Compares:      28255
   Adds:          28255
   Deletes:       28255
Activity for CE 4
   Compares:    2487321
   Adds:        1241614
   Deletes:     1241614
Activity for CE 5
   Compares:    2483228
   Adds:        2483228
   Deletes:     2483228
Activity for CE 6
   Compares:     823428
   Adds:        1660994
   Deletes:     1660994
Activity for CE 7
   Compares:      17300
   Adds:        1644865
   Deletes:     1644865
(5847660 7067211 7067211)
CLIPS> 
In conjunction with the sort function, you can create a ranked listing of all the rules. Here's an example using the waltz benchmark.
CLIPS> (clear)
CLIPS> (unwatch all)
CLIPS> (load waltz.clp)
%%%%:!!!!!!********************************
TRUE
CLIPS> (reset)
CLIPS> (load-facts waltz50.fct)
TRUE
CLIPS> (run)
CLIPS> 
(deffunction ja-lt (?r1 ?r2)
   (< (+ (expand$ (join-activity ?r1 terse))) 
      (+ (expand$ (join-activity ?r2 terse)))))
CLIPS>
(progn$ (?r (sort ja-lt (get-defrule-list)))
   (format t "%-35s %10d%n" 
             (str-cat ?r ":")
             (+ (expand$ (join-activity ?r terse)))))
initial_boundary_junction_L:           1165683
initial_boundary_junction_arrow:        511101
make_L:                                 163364
make-3_junction:                        128960
label_arrow-1B:                          59799
label_tee_B:                             55991
label_arrow-2B:                          39455
label_arrow-1A:                          35692
match_edge:                              32072
second_boundary_junction_L:              26984
label_tee_A:                             17095
label_arrow-2A:                          16217
label_L:                                 14926
label_arrow-4A:                          13811
label_arrow-4B:                          13811
label_fork-1:                            12727
reverse_edges:                           11618
plot_boundaries:                          9830
second_boundary_junction_arrow:           9641
done_reversing:                           3878
label_fork-2:                             2213
label_fork-3:                             2213
label_arrow-3A:                           1838
label_arrow-3B:                           1838
label_arrow-5A:                           1838
label_arrow-5B:                           1838
plot_remaining:                           1808
label_fork-4:                              614
done_detecting:                              7
done_plotting:                               5
done_labeling:                               2
begin:                                       0
""
CLIPS>
In this case I've added the compares, adds, and deletes together to create a single join activity value for each rule displayed in the output. Hopefully this new command will provide CLIPS developers with more options for analyzing the performance of their rules.

No comments:

Post a Comment