Support Vector Machines for Classification in Nonstandard Situations
Abstract
The majority of classification algorithms are developed for the standard situation, in which it is assumed that the examples in the training set come from the same distribution as the target population, and that the costs of misclassification into the different classes are equal. However, these assumptions are often violated in real-world settings. For some classification methods, the violations can be handled simply by a change of threshold; for others, additional effort is required. In this paper, we
explain why the standard support vector machine is not suitable for the nonstandard
situation, and introduce a simple procedure for adapting the support vector machine
methodology to the nonstandard situation. Theoretical justification for the procedure
is provided. A simulation study illustrates that the modified support vector machine
significantly improves upon the standard support vector machine in the nonstandard
situation. The computational load of the proposed procedure is the same as that of
the standard support vector machine. The procedure reduces to the standard support
vector machine in the standard situation.
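To make the threshold remark concrete: for a classifier that estimates posterior class probabilities, unequal misclassification costs can be absorbed into the decision threshold alone. The following is a minimal sketch of that idea; the logistic-regression model and the toy data are illustrative choices, not taken from the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: features X (n x d) and labels y in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

clf = LogisticRegression().fit(X, y)

# With cost c_fn for a false negative and c_fp for a false positive,
# the cost-minimizing rule classifies as positive when
# P(y = +1 | x) > c_fp / (c_fn + c_fp), rather than > 1/2.
c_fn, c_fp = 2.0, 1.0
threshold = c_fp / (c_fn + c_fp)        # 1/3 instead of 1/2
p_pos = clf.predict_proba(X)[:, 1]      # classes_ is [-1, 1]; column 1 is P(y = +1)
y_hat = np.where(p_pos > threshold, 1, -1)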
Introduction
In supervised learning, we are given a training data set of n examples, and for each example i,
i = 1, 2, ..., n, in the training set, we observe an input vector xi ∈ Rd, and a label yi indicating
one of several given classes to which the example belongs. We wish to learn a classification
rule from the training set, so that we can assign a class label to any new subjects we encounter
in the future. In this paper we confine ourselves to the binary classification problem: there
are only two classes. This is a special case, but also the most common classification problem. Without loss of generality, we assume the two class labels are −1 and 1, and call the associated classes the negative class and the positive class, respectively.
The standard support vector machines
The support vector machine methodology has its roots in Vapnik (1979) and has received increasing attention in recent years. For a tutorial on support vector machines for classification,
see Burges (1998). Here we give a brief summary of the standard support vector machines
for classification, starting from the simple linear support vector machines and moving on to
the nonlinear support vector machines.
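As a concrete point of reference, here is a hedged sketch of the two variants using scikit-learn's SVC; the toy data, kernel choices, and tuning parameters are placeholders for illustration only.

import numpy as np
from sklearn.svm import SVC

# Placeholder training data: X is n x d, labels y take values in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

# Linear SVM: finds the hyperplane w'x + b that maximizes the margin,
# with slack penalized at rate C (the soft-margin formulation).
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

# Nonlinear SVM: the kernel trick replaces inner products x_i'x_j with
# K(x_i, x_j); here a Gaussian (radial basis function) kernel.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

# Either machine classifies a new point by the sign of its decision function.
x_new = np.array([[0.5, -0.5]])
print(np.sign(linear_svm.decision_function(x_new)))
print(np.sign(rbf_svm.decision_function(x_new)))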
Simulation
We use a simple simulation to illustrate the effectiveness of the modified support vector
machine in the nonstandard situation. Consider a population consisting of two subpopulations.
The positive subpopulation follows a bivariate normal distribution with mean (0, 0)′ and
covariance matrix diag(1, 1), whereas the negative subpopulation follows a bivariate normal with mean (2, 2)′ and covariance matrix diag(2, 1). The population is unbalanced: the positive
and negative subpopulations account for 10% and 90% of the total population, respectively.
Assume the cost of a false negative is twice the cost of a false positive. Notice that in this simulation the Bayes rule minimizing the expected cost in the target population classifies x as positive whenever the cost-weighted positive density exceeds the cost-weighted negative density, that is, whenever 2 · 0.1 · f+(x) > 1 · 0.9 · f−(x); working out this inequality gives the rule sign[16 − 8x2 − (x1 + 2)² + 2 log(8/81)].
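One way to approximate this experiment is with scikit-learn, attaching the unequal costs to the SVM through its class_weight option, which multiplies the penalty C for errors on the designated class. This is a sketch in the spirit of the paper's modified support vector machine (which reweights the hinge loss in the same way), not the authors' exact code; the sample sizes, kernel, and tuning parameters below are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def sample(n):
    # Draw n points from the target population described above:
    # 10% positive ~ N((0,0)', diag(1,1)), 90% negative ~ N((2,2)', diag(2,1)).
    y = np.where(rng.random(n) < 0.10, 1, -1)
    x = np.empty((n, 2))
    pos = y == 1
    x[pos] = rng.normal([0.0, 0.0], [1.0, 1.0], size=(pos.sum(), 2))
    x[~pos] = rng.normal([2.0, 2.0], [np.sqrt(2.0), 1.0], size=((~pos).sum(), 2))
    return x, y

def expected_cost(y_true, y_pred):
    # A false negative costs 2, a false positive costs 1.
    fn = np.sum((y_true == 1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    return (2.0 * fn + 1.0 * fp) / len(y_true)

X_tr, y_tr = sample(1000)
X_te, y_te = sample(20000)

# Standard SVM: treats both kinds of error alike.
std = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X_tr, y_tr)

# Cost-weighted SVM: errors on the positive class are penalized twice as heavily.
mod = SVC(kernel="rbf", C=1.0, gamma=0.5,
          class_weight={1: 2.0, -1: 1.0}).fit(X_tr, y_tr)

# The Bayes rule derived above, for reference.
bayes = np.sign(16 - 8 * X_te[:, 1] - (X_te[:, 0] + 2) ** 2 + 2 * np.log(8 / 81))

for name, pred in [("standard SVM", std.predict(X_te)),
                   ("weighted SVM", mod.predict(X_te)),
                   ("Bayes rule", bayes)]:
    print(name, expected_cost(y_te, pred))

Because positives are rare and their higher cost is ignored, the standard SVM is pushed toward labeling nearly everything negative; the weighted version counteracts this, which is the effect the simulation is designed to exhibit.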