Support Vector Machines for Classification in Nonstandard Situations
Abstract
The majority of classification algorithms are developed for the standard situation, in which it is assumed that the examples in the training set come from the same distribution as the target population, and that the costs of misclassification into the different classes are equal. However, these assumptions are often violated in real-world settings. For some classification methods, the violations can be handled simply by a change of threshold; for others, additional effort is required. In this paper, we
explain why the standard support vector machine is not suitable for the nonstandard
situation, and introduce a simple procedure for adapting the support vector machine
methodology to the nonstandard situation. Theoretical justification for the procedure
is provided. A simulation study illustrates that the modified support vector machine
significantly improves upon the standard support vector machine in the nonstandard
situation. The computational load of the proposed procedure is the same as that of
the standard support vector machine. The procedure reduces to the standard support
vector machine in the standard situation.
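To make the threshold remark concrete: for a classifier that estimates posterior class probabilities, unequal misclassification costs can be absorbed into the decision threshold alone. The following is a minimal sketch of that idea; the logistic-regression model and the toy data are illustrative choices, not taken from the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: features X (n x d) and labels y in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

clf = LogisticRegression().fit(X, y)

# With cost c_fn for a false negative and c_fp for a false positive,
# the cost-minimizing rule classifies as positive when
# P(y = +1 | x) > c_fp / (c_fn + c_fp), rather than > 1/2.
c_fn, c_fp = 2.0, 1.0
threshold = c_fp / (c_fn + c_fp)        # 1/3 instead of 1/2
p_pos = clf.predict_proba(X)[:, 1]      # classes_ is [-1, 1]; column 1 is P(y = +1)
y_hat = np.where(p_pos > threshold, 1, -1)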
Introduction
In supervised learning, we are given a training data set of n examples, and for each example i,
i = 1, 2, ..., n, in the training set, we observe an input vector xi ∈ Rd, and a label yi indicating
one of several given classes to which the example belongs. We wish to learn a classification
rule from the training set, so that we can assign a class label to any new subjects we encounter
in the future. In this paper we confine ourselves to the binary classification problem: there
are only two classes. This is a special case, but also the most common classification problem. Without loss of generality, we assume the two class labels are −1 and 1, and call the associated classes the negative class and the positive class, respectively.
The standard support vector machines
The support vector machine methodology has its roots in Vapnik (1979) and has received increasing attention in recent years. For a tutorial on support vector machines for classification,
see Burges (1998). Here we give a brief summary of the standard support vector machines
for classification, starting from the simple linear support vector machines and moving on to
the nonlinear support vector machines.
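As a concrete point of reference, here is a hedged sketch of the two variants using scikit-learn's SVC; the toy data, kernel choices, and tuning parameters are placeholders for illustration only.

import numpy as np
from sklearn.svm import SVC

# Placeholder training data: X is n x d, labels y take values in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

# Linear SVM: finds the hyperplane w'x + b that maximizes the margin,
# with slack penalized at rate C (the soft-margin formulation).
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

# Nonlinear SVM: the kernel trick replaces inner products x_i'x_j with
# K(x_i, x_j); here a Gaussian (radial basis function) kernel.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

# Either machine classifies a new point by the sign of its decision function.
x_new = np.array([[0.5, -0.5]])
print(np.sign(linear_svm.decision_function(x_new)))
print(np.sign(rbf_svm.decision_function(x_new)))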
Simulation
We use a simple simulation to illustrate the effectiveness of the modified support vector
machine in the nonstandard situation. Consider a population consisting of two subpopulations.
The positive subpopulation follows a bivariate normal distribution with mean (0, 0)′ and
covariance matrix diag(1, 1), whereas the negative subpopulation follows a bivariate normal with mean (2, 2)′ and covariance matrix diag(2, 1). The population is unbalanced: the positive
and negative subpopulations account for 10% and 90% of the total population, respectively.
Assume the cost of a false negative is twice the cost of a false positive. Notice that in this simulation the Bayes rule minimizing the expected cost in the target population classifies x as positive whenever the cost-weighted positive density exceeds the cost-weighted negative density, that is, whenever 2 · 0.1 · f+(x) > 1 · 0.9 · f−(x); working out this inequality gives the rule sign[16 − 8x2 − (x1 + 2)² + 2 log(8/81)].
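One way to approximate this experiment is with scikit-learn, attaching the unequal costs to the SVM through its class_weight option, which multiplies the penalty C for errors on the designated class. This is a sketch in the spirit of the paper's modified support vector machine (which reweights the hinge loss in the same way), not the authors' exact code; the sample sizes, kernel, and tuning parameters below are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def sample(n):
    # Draw n points from the target population described above:
    # 10% positive ~ N((0,0)', diag(1,1)), 90% negative ~ N((2,2)', diag(2,1)).
    y = np.where(rng.random(n) < 0.10, 1, -1)
    x = np.empty((n, 2))
    pos = y == 1
    x[pos] = rng.normal([0.0, 0.0], [1.0, 1.0], size=(pos.sum(), 2))
    x[~pos] = rng.normal([2.0, 2.0], [np.sqrt(2.0), 1.0], size=((~pos).sum(), 2))
    return x, y

def expected_cost(y_true, y_pred):
    # A false negative costs 2, a false positive costs 1.
    fn = np.sum((y_true == 1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    return (2.0 * fn + 1.0 * fp) / len(y_true)

X_tr, y_tr = sample(1000)
X_te, y_te = sample(20000)

# Standard SVM: treats both kinds of error alike.
std = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X_tr, y_tr)

# Cost-weighted SVM: errors on the positive class are penalized twice as heavily.
mod = SVC(kernel="rbf", C=1.0, gamma=0.5,
          class_weight={1: 2.0, -1: 1.0}).fit(X_tr, y_tr)

# The Bayes rule derived above, for reference.
bayes = np.sign(16 - 8 * X_te[:, 1] - (X_te[:, 0] + 2) ** 2 + 2 * np.log(8 / 81))

for name, pred in [("standard SVM", std.predict(X_te)),
                   ("weighted SVM", mod.predict(X_te)),
                   ("Bayes rule", bayes)]:
    print(name, expected_cost(y_te, pred))

Because positives are rare and their higher cost is ignored, the standard SVM is pushed toward labeling nearly everything negative; the weighted version counteracts this, which is the effect the simulation is designed to exhibit.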