mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification
ABSTRACT
This paper presents a hybrid filter–wrapper feature subset selection algorithm based on particle swarm optimization (PSO) for support vector machine (SVM) classification. The filter model is based on mutual information and is a composite measure of feature relevance and redundancy with respect to the feature subset selected. The wrapper model is a modified discrete PSO algorithm. This hybrid algorithm, called maximum relevance minimum redundancy PSO (mr2PSO), is novel in that it uses the mutual information available from the filter model to weigh the bit selection probabilities in the discrete PSO. Hence, mr2PSO uniquely brings together the efficiency of filters and the greater accuracy of wrappers. The proposed algorithm is tested over several well-known benchmark datasets, and its performance is compared with a recent hybrid filter–wrapper algorithm based on a genetic algorithm and a wrapper algorithm based on PSO. The results show that mr2PSO is competitive in terms of both classification accuracy and computational performance.
Introduction
Many practical applications of classification involve a large volume of data and/or a large number of features/attributes. Since these datasets are usually collected for reasons other than mining the data (e.g., classification), they may contain redundant or irrelevant features [13]. This is especially important when a large number of features exist alongside comparatively few training samples, making dimensionality reduction of the feature vector imperative. Examples include gene selection from microarray data to separate healthy patients from cancer patients, text categorization for automatic sorting of URLs into a web directory, and detection of unsolicited spam email [14]. Extracting valuable information from such datasets requires searching a very large space of candidate feature subsets, which raises challenges such as managing computational time complexity while extracting compact yet effective models. A common approach to overcoming these challenges is to employ dimensionality reduction (e.g., feature subset selection) techniques.
Overview of feature selection methods
There are a number of studies that provide an overview of feature selection methods as well as guidance on different aspects of this problem [9,21,29,35,43,44]. Most feature subset selection algorithms can be categorized into two types: filter and wrapper algorithms. The main distinction between the two is that filter algorithms select the feature subset before the application of any classification algorithm. Using statistical properties of the features, the filter approach eliminates the less important features from the subset [8]. Besides their comparative computational efficiency, there are other compelling arguments for using filter methods; for instance, some filter methods provide a generic selection of variables that is independent of any particular learning machine. However, given a classifier, the best feature subset is usually obtained through wrapper methods [14].
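To make the filter/wrapper distinction concrete, the following is a minimal sketch, assuming scikit-learn: the filter ranks features by a classifier-independent statistic (here, mutual information with the class label), while the wrapper scores a candidate subset by the cross-validated accuracy of the actual classifier. The toy dataset and the top-5 cutoff are illustrative, not from the paper.

```python
# Filter vs. wrapper on a synthetic dataset (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# Filter: rank features by a statistical score (mutual information with
# the class label) without consulting any classifier.
mi = mutual_info_classif(X, y, random_state=0)
filter_subset = np.argsort(mi)[::-1][:5]          # top-5 by relevance

# Wrapper: score a candidate subset by the cross-validated accuracy of
# the classifier that will actually be deployed (here, an SVM).
def wrapper_score(subset):
    return cross_val_score(SVC(), X[:, subset], y, cv=5).mean()

print("filter picks:", filter_subset,
      "wrapper CV accuracy:", wrapper_score(filter_subset))
```

The filter step costs one pass over the features, while the wrapper step retrains the classifier for every candidate subset, which is exactly the efficiency/accuracy trade-off the hybrid approach aims to balance.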
mr2PSO: a hybrid PSO algorithm for feature selection
This section describes our proposed filter–wrapper framework based on the maximum relevance minimum redundancy filter and the particle swarm optimization search heuristic. In general, a feature subset selection problem can be described as follows: let K be a dataset of N records with D dimensions (features), i.e., an N × D matrix. The goal of feature subset selection is to obtain d features from the whole feature space, where d < D, that optimize a criterion function. Sebban and Nock [45] categorize feature selection algorithms into three classes according to what is optimized:
1. algorithms that find the feature subset of a specified dimensionality in which the classes of data are most discriminable,
2. algorithms that find the smallest feature dimensionality for which the discriminability exceeds a specified value, and
3. algorithms that find a compromise between a small subset of features and the class discriminability.
In this study, we consider only the first and third algorithm classes. We further consider various discriminability measures such as the classification accuracy, the Kappa statistic (which measures the agreement between two raters who each classify N items into C mutually exclusive categories [66]), and mutual information. In the remainder of this section, we first describe the support vector machine classifier used in our hybrid wrapper–filter algorithm. Second, we describe the filter component (based on mutual information) of the hybrid framework. Finally, we describe how the filter component is integrated within the wrapper PSO algorithm.
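The excerpt above does not spell out the exact update rule, so the following is a minimal sketch of the two ingredients, assuming discretized features and scikit-learn's mutual_info_score: a composite relevance-minus-redundancy score per feature (simplified here to redundancy against all other features rather than against the currently selected subset), and one plausible way to blend that score into the bit-selection probabilities of a binary PSO. The names mrmr_scores and bit_probabilities and the blending weight alpha are hypothetical, not the paper's.

```python
# Relevance-redundancy scoring and weighted bit selection (illustrative).
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_scores(X_disc, y):
    """Relevance minus mean redundancy for each column of X_disc.

    Simplification: redundancy is averaged over all other features,
    not over the currently selected subset as in full mRMR.
    """
    D = X_disc.shape[1]
    relevance = np.array([mutual_info_score(y, X_disc[:, j])
                          for j in range(D)])
    redundancy = np.array([
        np.mean([mutual_info_score(X_disc[:, j], X_disc[:, k])
                 for k in range(D) if k != j])
        for j in range(D)
    ])
    return relevance - redundancy

def bit_probabilities(velocity, scores, alpha=0.5):
    """Blend binary PSO's sigmoid(velocity) with normalized mRMR scores."""
    sigmoid = 1.0 / (1.0 + np.exp(-velocity))
    rescaled = (scores - scores.min()) / (np.ptp(scores) + 1e-12)  # to [0, 1]
    return (1.0 - alpha) * sigmoid + alpha * rescaled

# Each particle samples its bit string (1 = feature selected) from these
# probabilities, e.g.:
#   rng = np.random.default_rng(0)
#   bits = rng.random(D) < bit_probabilities(velocity, mrmr_scores(X_disc, y))
```

The key idea this sketch captures is that features the filter scores highly are made more likely to be sampled into a particle's subset, so the wrapper search starts from, and stays biased toward, relevant and non-redundant regions of the subset space.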
Conclusion
In this study, we proposed a framework that combines the advantages of filter- and wrapper-type feature subset selection algorithms and embedded this framework into the particle swarm optimization heuristic. Unlike earlier hybrid filter–wrapper algorithms that use the filter and wrapper models in sequence, our approach structurally integrates the filter model within the PSO-based wrapper model. The filter model is based on mutual information and is expressed as a composite measure of feature relevance and redundancy. This composite relevance–redundancy criterion is used to weigh the probabilities of features being included in the feature subset, which enhances both the convergence rate and the solution quality for the feature subset selection problem. We compared the performance of the proposed hybrid algorithm, in terms of both quality and speed, with a recently proposed hybrid filter–wrapper method based on a genetic algorithm as well as a PSO-based wrapper method. The results indicated that the proposed method is superior to both alternatives. While we adopted identical SVM and RBF kernel parameters (C and σ) as in [20] for comparison purposes, the best parameter settings depend on the given problem; for best results, we recommend a parameter search to identify settings that improve the classifier's prediction accuracy [24]. Further, we used the 2-fold cross-validation method as in [20], which is known to be pessimistically biased and to increase the variance of the prediction accuracy [25]. Hence, we recommend stratified 10-fold cross-validation with a larger number of samples than were used in this study.
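As a usage note, the recommended evaluation protocol can be sketched with scikit-learn as below. The C/gamma grid and the dataset are illustrative; since scikit-learn's RBF kernel is exp(-gamma·||x − x′||²), searching over gamma = 1/(2σ²) stands in for the kernel width σ.

```python
# Stratified 10-fold CV wrapped around a grid search over SVM parameters.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score)
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner search picks C and gamma; the grid here is illustrative.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(
    SVC(kernel="rbf"), param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))

# Outer stratified 10-fold CV gives a less biased, lower-variance accuracy
# estimate than the 2-fold protocol used for comparison in the study.
scores = cross_val_score(
    search, X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=1))
print("accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```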