Feature Selection With Harmony Search
Abstract
Many search strategies have been exploited for the task of feature
selection (FS), in an effort to identify more compact and
better-quality subsets. Such work typically employs greedy hill
climbing (HC) or nature-inspired heuristics in order to find
near-optimal solutions without resorting to exhaustive search. In this
paper, a novel FS approach based on harmony search (HS) is presented.
It is a general approach that can be used in conjunction with many
subset evaluation techniques. The simplicity of HS is exploited to
reduce the overall complexity of the search process. Owing to the
stochastic nature of HS, the proposed approach is able to escape from
local optima and to identify multiple solutions. Additional parameter
control schemes are introduced to reduce both the effort required for,
and the impact of, parameter configuration.
INTRODUCTION
The main aim of feature selection (FS) is to discover
a minimal feature subset from a problem domain while
retaining a suitably high accuracy in representing the original
data [9]. Practical problems that arise when analyzing data
in real-world applications are often related to the number of
features (the so-called "curse of dimensionality" [1]) and to the
difficulty of identifying and extracting patterns or rules, owing to
high interdependence among individual features or to the behavior of
combined features. Human evaluation and subsequent pattern
identification become impractical for data sets that have
very large numbers of features [46], [57]. Techniques such as
text processing and classification [30] can benefit greatly from
FS once noisy, irrelevant, redundant, or misleading features
are removed.
PRINCIPLES OF HS
HS mimics the improvisation process of musicians, in which each
musician plays a note in the search for the best overall harmony.
When applied to optimization problems, the musicians
typically represent the decision variables of the cost function,
and HS acts as a meta-heuristic algorithm that attempts to
find a solution vector optimizing this function. In such
a search process, each decision variable (musician) generates
a value (note) in pursuit of the global optimum (best harmony).
The HS algorithm therefore relies on a stochastic derivative
(for discrete variables) based on the musicians' experience,
rather than on the gradient (for continuous variables) of
differential calculus.
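To make the improvisation process concrete, the following is a minimal
sketch of a discrete HS optimizer in Python. The parameter names HMS
(harmony memory size), HMCR (harmony memory considering rate), and PAR
(pitch adjusting rate) are the standard HS parameters; the objective
function, value domains, and default settings here are illustrative
placeholders, not taken from the paper.

```python
import random

def harmony_search(objective, domains, hms=10, hmcr=0.9, par=0.3, iters=1000):
    """Minimal discrete harmony search sketch (maximization).

    objective: maps a solution vector (one value per variable) to a score.
    domains:   list of candidate values for each decision variable (musician).
    """
    # Initialize harmony memory with random solution vectors.
    memory = [[random.choice(d) for d in domains] for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iters):
        new = []
        for i, domain in enumerate(domains):
            if random.random() < hmcr:
                # Memory consideration: reuse a note previously played
                # by musician i (column i of the harmony memory).
                note = random.choice(memory)[i]
                if random.random() < par:
                    # Pitch adjustment: shift to a neighbouring value.
                    j = domain.index(note) + random.choice((-1, 1))
                    note = domain[max(0, min(len(domain) - 1, j))]
            else:
                # Random selection from the full note domain.
                note = random.choice(domain)
            new.append(note)

        # Replace the worst stored harmony if the new one is better.
        worst = min(range(hms), key=lambda k: scores[k])
        s = objective(new)
        if s > scores[worst]:
            memory[worst], scores[worst] = new, s

    best = max(range(hms), key=lambda k: scores[k])
    return memory[best], scores[best]
```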
Key Concepts
The key concepts of the HS algorithm are musicians, notes, harmonies,
and harmony memory. In most optimization problems
solvable by HS, the musicians are the decision variables of the
function being optimized. The notes played by the musicians
are the values that each decision variable can take. A harmony
contains the notes played by all musicians, namely a solution
vector containing one value per variable. Harmony memory
holds the harmonies played by the musicians; it can be viewed
as the storage place for solution vectors.
A more concrete representation of harmony memory is a
2-D matrix, where the rows contain harmonies (solution vectors)
and the number of rows is predefined and bounded by
the size of harmony memory. Each column is dedicated to one
musician; it not only stores the good notes previously played
by the musician but also provides the pool of playable notes
for future improvisations. In this paper, such a column will be
referred to as the note domain for the musician.
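As an illustration of this matrix view, the short sketch below builds a
toy harmony memory and extracts the note domain of one musician as a
column; the sizes and values are arbitrary examples, not taken from the
paper.

```python
import random

HMS = 4                    # number of stored harmonies (rows)
MUSICIANS = 3              # number of decision variables (columns)
VALUES = [0, 1, 2, 3, 4]   # values each variable may take

# Harmony memory: HMS rows, one solution vector per row.
harmony_memory = [[random.choice(VALUES) for _ in range(MUSICIANS)]
                  for _ in range(HMS)]

# The note domain for musician i is column i of the matrix: the good
# notes it has previously played, which also form the pool of playable
# notes for future improvisations.
def note_domain(memory, i):
    return [harmony[i] for harmony in memory]

print(note_domain(harmony_memory, 0))
```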
PARAMETER CONTROL IN HS
Traditional HS uses fixed, predefined parameters throughout
the entire search process, making it hard to determine a
"good" setting without extensive trial runs. The parameters
are also not independent of one another; finding
a good setting therefore often becomes an optimization problem in
itself. Moreover, the search results usually provide no hint as to how
the parameters should be adjusted in order to improve performance. To
eliminate the drawbacks of using fixed parameter
values, a dynamic parameter adjustment scheme is proposed that
modifies parameter values at run time. By using tailored sets
of parameter values for the initialization, intermediate, and
termination stages, the search process can benefit greatly from
this dynamic parameter environment.
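A simple way to realize such a scheme is to interpolate each parameter
linearly between a start value (for the initialization stage) and an
end value (for the termination stage) as the run progresses. The
helper below is a minimal sketch of this idea; the concrete bounds are
illustrative placeholders, not the paper's settings.

```python
def scheduled(start, end, t, t_max):
    """Move a parameter linearly from its start value to its end value
    as iteration t advances toward t_max."""
    return start + (end - start) * t / t_max

t_max = 1000
for t in range(t_max + 1):
    # Example schedules: rely more on memory, and perturb less,
    # as the search converges (values feed the improvisation step).
    hmcr = scheduled(0.60, 0.95, t, t_max)
    par  = scheduled(0.50, 0.10, t, t_max)
```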
HS Applied to FRFS
FRFS [25] is concerned with the reduction of information or
decision systems through the use of fuzzy-rough sets. A fuzzy-rough
set [15] is defined by two fuzzy sets, the fuzzy lower and upper
approximations, obtained by extending the corresponding
crisp rough set notions [47]. In the crisp case [43], elements either
belong to the lower approximation (i.e., have a membership
of 1) or do not. Given a data set with discretized attribute values,
a most informative subset of the original attributes (termed a reduct)
can be found by manipulating the rough sets
defined over the data set; all other attributes can then be removed
from the data set with minimal information loss. In the fuzzy-rough
case, elements may have a membership in the range
[0, 1], allowing greater flexibility in handling uncertainty.
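For reference, the fuzzy lower and upper approximations mentioned
above are commonly defined as follows (this is the standard
fuzzy-rough formulation; the notation is the usual one rather than
copied from this paper), for a fuzzy similarity relation R_P induced
by a feature subset P and a concept X over a universe U:

```latex
% Fuzzy lower approximation of X with respect to R_P:
\mu_{\underline{R_P}X}(x) =
    \inf_{y \in U} \max\bigl( 1 - \mu_{R_P}(x, y),\; \mu_X(y) \bigr)

% Fuzzy upper approximation of X with respect to R_P:
\mu_{\overline{R_P}X}(x) =
    \sup_{y \in U} \min\bigl( \mu_{R_P}(x, y),\; \mu_X(y) \bigr)
```

Restricting all memberships to {0, 1} recovers the crisp lower and
upper approximations, which is the sense in which the fuzzy-rough
notions extend the crisp ones.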
EXPERIMENTATION AND DISCUSSION
In this section, the results of a number of experiments
are reported to demonstrate the capabilities of
the proposed approach: parameter-controlled HS with iterative
refinement (HS-IR). A cross comparison against other search
approaches is given in Section VI-A, including genetic algorithms
(GAs) [56], particle swarm optimization (PSO) [53], and HC, where both
GAs and PSO are evolutionary global
optimization heuristics. In order to demonstrate the generalization
ability of HS-IR, three different subset evaluators are
used in the experiments: probabilistic-consistency-based FS
[8], correlation-based feature subset selection (CFS) [19], and
FRFS [25], which was previously used as the example application
in Section IV-B.
Comparison Between Parameter Control Rules
The parameter control rules discussed in Section III are
examined here, with results compared against other possible
approaches. The previously employed data sets are once again
adopted, with either CFS or FRFS acting as the subset evaluator.
For the less complex data sets, all algorithms lead to identical
results, and therefore, those results are omitted. The following
discussions focus on the remaining results. Here, the arrows
illustrate how the parameters are adjusted over the iterations;
e.g., HMS↓ means that HMS decreases from its maximum to its
minimum value as the search progresses.
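In terms of the scheduled() helper sketched earlier (a hypothetical
illustration, not the paper's code), an HMS↓ rule would correspond to
something like:

```python
# HMS↓: shrink the harmony memory from hms_max down to hms_min over the run.
hms = round(scheduled(hms_max, hms_min, t, t_max))
```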
CONCLUSION
This paper has described a flexible FS method based on
parameter-controlled HS. An iterative refinement technique
is also presented, aimed at finding quality feature subsets
of compact size. This work offers a number of advantages
over conventional approaches: fast convergence, simplicity,
insensitivity to initial states, and efficiency in finding minimal
subsets.
Experimental comparative studies show that HS is capable
of identifying good-quality feature subsets for most test
data sets. The classification accuracy obtained with the discovered
fuzzy-rough reducts is comparable to that achieved using the full
feature set. In almost all respects, the HS approach delivered
considerably better results than GAs, and it is also superior
to PSO in many cases. The proposed improvements, namely
parameter control and iterative refinement, further improve the
HS performance, making it a strong search mechanism for data
sets with large numbers of features.