17-09-2012, 04:50 PM
MultiBoosting: A Technique for Combining Boosting and Wagging
Abstract.
MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision
committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It harnesses
AdaBoost's high bias and variance reduction together with wagging's superior variance reduction. Using C4.5 as the base
learning algorithm, MultiBoosting is demonstrated to produce decision committees with lower error than either
AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI
data sets. It offers the further advantage over AdaBoost of being well suited to parallel execution.
Introduction
Decision committee learning has demonstrated spectacular success in reducing classification
error from learned classifiers. These techniques develop a classifier in the form of a
committee of subsidiary classifiers. The committee members are applied to a classification
task and their individual outputs combined to create a single classification from the
committee as a whole. This combination of outputs is often performed by majority vote.
Examples of these techniques include classification ensembles formed by stochastic search
(Ali, Brunk, & Pazzani, 1994), bagging (Breiman, 1996a), AdaBoost (Freund & Schapire,
1995), Nock and Gascuel’s (1995) decision committees, averaged decision trees (Oliver &
Hand, 1995), and stacked generalization (Wolpert, 1992).
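The majority-vote combination described above can be sketched in a few lines. This is a generic illustration, not code from the paper; the `predict` method on each committee member is an assumed interface.

```python
from collections import Counter

def committee_predict(committee, x):
    """Combine member outputs by unweighted majority vote.

    `committee` is assumed to be a list of trained classifiers,
    each exposing a `predict(x)` method that returns a class label.
    """
    votes = [clf.predict(x) for clf in committee]
    # The most frequent label among the members is the committee's output.
    return Counter(votes).most_common(1)[0][0]
```

Weighted variants (as in AdaBoost) replace the simple count with a per-member vote weight.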
AdaBoost and bagging
Before describing the new MultiBoost algorithm, it is desirable to outline its antecedent
algorithms: AdaBoost, bagging, and wagging. All of these algorithms use a base learning
algorithm that forms a single classifier from a training set of examples. This base learning algorithm is
provided with a sequence of training sets that the committee learning algorithm synthesizes
from the original training set. The resulting classifiers become the constituent members of
the decision committee.
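The general scheme of synthesizing training sets and handing each to the base learner can be sketched as follows. This is a bagging-style sketch under assumed interfaces (`base_learner` is a function from a list of examples to a trained classifier); AdaBoost and wagging instead reweight or resample the data adaptively.

```python
import random

def build_committee(base_learner, training_set, t, seed=0):
    """Bagging-style committee construction (illustrative sketch).

    Each of the `t` members is trained on a bootstrap sample:
    len(training_set) examples drawn from the original training
    set uniformly at random with replacement.
    """
    rng = random.Random(seed)
    committee = []
    for _ in range(t):
        # Synthesize a new training set from the original one.
        sample = [rng.choice(training_set) for _ in training_set]
        committee.append(base_learner(sample))
    return committee
```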
Bias and variance
A number of recent investigations of decision committees have analyzed error performance
in terms of bias and variance. The decomposition of a learner’s error into bias and variance
terms originates from analyses of learning models with numeric outputs (Geman,
Bienenstock, & Doursat, 1992). The squared bias is a measure of the contribution to error
of the central tendency or most frequent classification of the learner when trained on
different training data. The variance is a measure of the contribution to error of deviations
from the central tendency. Bias and variance are evaluated with respect to a distribution of
training sets T , such as a distribution containing all possible training sets of a specified size
for a specified domain.
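The informal definitions above can be made concrete with a small estimation sketch. Several formal bias/variance decompositions exist for classification (e.g. Kohavi & Wolpert's, Kong & Dietterich's); the version below simply follows the prose: the central tendency is the most frequent classification over the training sets, bias is the error of that central tendency, and variance is the rate of deviation from it. All names here are illustrative, not from the paper.

```python
from collections import Counter

def bias_variance(train_sets, learner, test_x, test_y):
    """Estimate bias and variance terms under 0/1 loss, over a sample
    of training sets drawn from the distribution T.

    `learner` is assumed to map a training set to a classifier
    exposing a `predict(x)` method.
    """
    classifiers = [learner(ts) for ts in train_sets]
    bias_total = var_total = 0.0
    for x, y in zip(test_x, test_y):
        preds = [clf.predict(x) for clf in classifiers]
        # Central tendency: the most frequent classification of x.
        central = Counter(preds).most_common(1)[0][0]
        bias_total += (central != y)                      # central tendency wrong
        var_total += sum(p != central for p in preds) / len(preds)
    n = len(test_x)
    return bias_total / n, var_total / n
```

Averaging both terms over a test sample approximates their expectation with respect to the distribution of training sets.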