CANOPY: A Boolean Optimization Approach for Minimizing Ensemble Size

Hongfei Wang

Shawn Blanton
Machine learning is employed when the problem of concern is not easily modeled using first principles. Typical applications include speech recognition, internet search, computer vision, and medical diagnosis. Increasingly, learned classifiers are being used in various tasks related to integrated circuit design, manufacturing and test. In previous research here at Carnegie Mellon, ensembles of learned classifiers have been used to reduce the cost of testing micro-electromechanical accelerometers, SERDES, phase-locked loops, and RF transceivers.
Ensembles of decision trees, commonly known as decision forests, are among the most frequently used approaches in machine learning, and majority voting is typically employed to obtain the final classification from the forest. In this work, a new meta-learning and ensemble pruning method called CANOPY is described. CANOPY uses techniques from logic synthesis for digital circuits both to identify the particular base-level classifiers (decision trees) used in the classification function and to form the classification function itself. Overall classification decisions are made by evaluating a nonlinear Boolean expression of the decisions produced by the selected classifiers. The method is evaluated on a set of public benchmark data sets and one data set obtained from industry. The results demonstrate that CANOPY saves on average about 90% (over 98% in the best case) of storage space compared to the full decision forest. Moreover, a paired t-test at the 95% confidence level with 99 degrees of freedom, comparing majority voting and CANOPY, shows that the differences in their prediction accuracy are not statistically significant for most of the data sets considered. Therefore, CANOPY can be used in a variety of emerging applications with stringent power and storage requirements, while maintaining classification accuracy comparable to that of much larger ensembles that use majority voting.
Fig. 1 illustrates the process implemented by CANOPY. Decision trees in the forest are selected by synthesis (shown in dark green) as forest representatives. The ensemble prediction is obtained by evaluating the resulting Boolean expression (BE) using the predictions of the selected trees.
Fig. 1. Digital-circuit synthesis utilized in CANOPY for meta-learning.
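
To make the contrast with majority voting concrete, the following minimal Python sketch compares the two ways of combining binary tree predictions. It is only an illustration under stated assumptions: the selected tree indices in `SELECTED` and the Boolean expression in `boolean_ensemble` are hypothetical placeholders chosen by hand, not an expression produced by CANOPY's synthesis step, and the function names are not taken from the original work.

```python
from typing import Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Baseline: return 1 if more than half of the binary tree votes are 1."""
    return int(2 * sum(votes) > len(votes))


# Hypothetical: indices of the trees kept as forest representatives.
# In CANOPY this selection would come from logic synthesis, not from a constant.
SELECTED = (0, 3, 4)


def boolean_ensemble(votes: Sequence[int]) -> int:
    """Evaluate an illustrative nonlinear Boolean expression over the selected trees.

    The expression (a AND b) OR (NOT a AND c) is an assumption made for this
    sketch; it only shows that the decision need not be a simple vote count.
    """
    a, b, c = (bool(votes[i]) for i in SELECTED)
    return int((a and b) or ((not a) and c))


if __name__ == "__main__":
    # One sample: the binary predictions of a 7-tree forest.
    forest_votes = [1, 0, 1, 1, 0, 1, 0]
    print("majority vote:", majority_vote(forest_votes))
    print("Boolean expression over 3 trees:", boolean_ensemble(forest_votes))
```

In this sketch only three of the seven trees need to be stored and evaluated at prediction time, which is the kind of storage reduction the Boolean formulation is meant to enable.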