A fuzzy-rough nearest neighbor classifier combined with consistency-based subset evaluation and instance selection for automated diagnosis of breast cancer
Abstract
Breast cancer is one of the most common and deadly cancers among women. Early diagnosis and treatment of breast cancer can improve patient outcomes. The development of classification models with
high accuracy is an essential task in medical informatics. Machine learning algorithms have been widely
employed to build robust and efficient classification models. In this paper, we present a hybrid intelligent
classification model for breast cancer diagnosis. The proposed classification model consists of three
phases: instance selection, feature selection and classification. In instance selection, the fuzzy-rough
instance selection method based on weak gamma evaluator is utilized to remove useless or erroneous
instances. In feature selection, the consistency-based feature selection method is used in conjunction with a re-ranking algorithm, owing to its efficiency in searching the space of possible feature subsets. In the classification phase of the model, the fuzzy-rough nearest neighbor algorithm is utilized.
Since this classifier does not require an optimal value for the number of neighbors K and produces richer class confidence values, it is well suited for the classification task. To test the efficacy of the proposed classification
model we used the Wisconsin Breast Cancer Dataset (WBCD). The performance is evaluated using
classification accuracy, sensitivity, specificity, F-measure, area under curve, and Kappa statistics. The
obtained classification accuracy of 99.7151% is very promising compared with existing works in this area that report results for the same data set.
Introduction
The development of an effective computer-aided diagnosis and
classification system for diseases is of great importance for medical
informatics. Breast cancer is one of the most common and deadly cancers among women. In 2008, there were approximately 1.4 million new cases and around 460,000 deaths (Ma & Jemal, 2013). Although breast cancer is still a major cause of cancer death among women, mortality has shown a decreasing trend with the help of early detection and appropriate therapy (Hayat, 2008). The
diagnosis and treatment of breast cancer in its earliest possible phases can improve the outcomes of breast cancer patients (Tot & Dean, 2004). Early detection and accurate diagnosis are the two key factors in improved outcomes for breast cancer.
Besides, women whose breast cancer has not metastasized can achieve long-term survival (West, Mangiameli, Rampal, & West, 2005). Moreover, common diagnostic techniques, such as mammography
and fine needle aspiration cytology exhibit relatively
low reliability in diagnosing malignancy (Chen, Yang, Liu, & Liu,
2011). All these factors intensify the motivations toward the development
of reliable automated diagnosis systems that can facilitate
the clinical decision making process and the early detection of
breast cancer.
This study aims to build an automatic diagnostic system for
breast cancer based on the fuzzy-rough nearest neighbor classifier.
The K-nearest neighbor method is one of the most widely employed
algorithms for classification tasks in data mining and knowledge
discovery. Pattern recognition, text categorization, ranking models,
object recognition and event recognition are a few examples of
application fields for K-nearest neighbor (Bhatia & Vandana,
2010). In the K-nearest neighbor method, an object is assigned to the class held by the majority of its K nearest neighbors. The method has several advantages.
First, it is very easy to understand and implement. Besides,
the method performs well when the features are weighted carefully
(Nisbet, Elder, & Miner, 2009). On the other hand, the method suffers from some significant disadvantages. First of all, its classification performance is highly dependent on the value of the parameter K, which determines the size of the neighborhood.
Besides, the method lacks possibilistic classification ability, namely it cannot discriminate between equally close neighbors or between equally distant neighbors (Frigui & Gader, 2009).
Moreover, relative closeness of neighbors can be a problematic
issue in the existence of noise or overlapping classes (Sarkar,
2007). The fuzzy-rough nearest neighbor classifier is based on
the theory of fuzzy-rough sets. Rough set theory is a useful tool for dealing with vague and uncertain information, but the classical rough set model, based on an equivalence relation, can only handle complete and symbolic data sets. Fuzzy-rough sets can also deal with numerical attributes; they encapsulate the concepts of vagueness and indiscernibility owing to fuzzy sets and rough sets, respectively (Dai,
2013). Fuzzy-rough nearest neighbor classifier attempts to
enhance the conventional K-nearest neighbor classifier by exploiting
fuzzy-rough uncertainty. While preserving the advantages of the conventional K-nearest neighbor method, the fuzzy-rough counterpart does not need to know the optimal value of K, it has possibilistic classification ability, and its worst-case time complexity is the same as that of the conventional method. Moreover, it does not require any a priori information about the training data, whereas parametric and semi-parametric classifiers depend on such information (Sarkar,
2007). Owing to the aforementioned enhanced features,
fuzzy-rough nearest neighbor method can be used as a viable tool
for classification.
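To make these properties concrete, the following is a minimal sketch of fuzzy-rough nearest neighbor classification, not the authors' implementation. It uses the Kleene-Dienes implicator and the min t-norm as illustrative connectives, and the feature-averaged similarity measure is our own assumption:

```python
import numpy as np

def similarity(x, y, ranges):
    # Feature-averaged closeness (an illustrative choice, not the paper's)
    return float(np.mean(1.0 - np.abs(x - y) / ranges))

def frnn_predict(X_train, y_train, x_test):
    """Assign x_test to the class with the highest average of its fuzzy
    lower and upper approximation memberships (no K to tune)."""
    ranges = X_train.max(axis=0) - X_train.min(axis=0)
    ranges[ranges == 0] = 1.0  # guard against constant features
    sims = np.array([similarity(x, x_test, ranges) for x in X_train])
    best_class, best_score = None, -1.0
    for c in np.unique(y_train):
        member = (y_train == c).astype(float)           # crisp class membership
        lower = np.min(np.maximum(1.0 - sims, member))  # Kleene-Dienes implicator
        upper = np.max(np.minimum(sims, member))        # min t-norm
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Because the decision compares class membership scores rather than counting votes, no value of K has to be tuned, and the lower/upper memberships give the richer class confidence values mentioned above.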
In this paper, fuzzy-rough nearest neighbor classifier is combined
with consistency-based subset evaluation and fuzzy-rough
instance selection method. Feature selection plays an important
role for building a classification model with high predictive accuracy.
In the consistency-based subset evaluation, attribute subset
selection is applied based on the consistency-based subset evaluation
metric. In this method, the worth of a subset of attributes is
evaluated by the level of consistency in the class values to identify
useful features and eliminate the irrelevant ones. The re-ranking algorithm, an approach that analyzes only a few blocks of variables in order to decrease the number of wrapper evaluations, is utilized in
consistency-based feature selection to search the space of feature
subsets effectively. Moreover, a fuzzy-rough set based instance
selection method is applied in order to significantly reduce the number of instances while maintaining high classification accuracy.
This instance selection approach uses the weak gamma evaluator.
The rest of the paper is organized as follows: Section 2 briefly
reviews the existing machine learning research on the diagnosis of breast cancer. Section 3 briefly explains the methods used in the classification model, i.e. the consistency-based subset evaluation, the re-ranking algorithm, the fuzzy-rough instance selection and the fuzzy-rough nearest neighbor classifier. Section 4 presents the proposed
classification model in detail. In Section 5, data set, evaluation metrics
and experimental results are given. Finally, Section 6 presents
the discussion and concluding remarks.
2. Related works
There has been considerable research in the literature on the diagnosis of breast cancer with the WBCD data set, much of it reporting relatively high predictive performance. Pena-Reyes and Sipper
(1999) reached 97.80% classification accuracy using an approach
that integrates the fuzzy systems and evolutionary algorithms.
Chou, Lee, Shao, and Chen (2004) obtained 98.25% classification
accuracy with the artificial neural networks and the multivariate
adaptive regression splines. Übeyli (2007) compared the classification performance of a multilayer perceptron neural network,
a combined neural network, a probabilistic neural network,
a recurrent neural network and a support vector machine and
the highest classification accuracy was achieved by support vector
machine with 99.54% classification accuracy. Polat and Günes
(2007) reported 98.53% classification accuracy with the least
square support vector machine classifier algorithm. Sahan, Polat,
Kodaz, and Günes (2007) achieved a classification accuracy of
99.14% with the hybridization of a fuzzy-artificial immune system
with K-nearest neighbor classifier. Mu and Nandi (2007) evaluated
the benefits of applying support vector machines, radial basis function
networks and self-organizing maps and they reported an accuracy
of 98.6%. Ryu, Chandrasekaran, and Jacob (2007) applied
isotonic separation technique to the breast cancer prediction and
the experimental results indicated that the method can be used
as a viable tool for the problem. Karabatak and Ince (2009) utilized
an association rule for dimension reduction and the neural network
for performing classification. They reported a classification
accuracy of 97.4%. Akay (2009) presented a support vector machine
based diagnosis system combined with F-score based feature
selection and the obtained classification accuracy was 99.51%.
Hassan, Hossain, Begg, Ramamohanarao, and Morsi (2010)
obtained 98.89% classification accuracy with an ensemble of area
under ROC curves based feature selection and a hybrid hidden
Markov model-fuzzy approach. Chen et al. (2011) presented a
rough set based support vector machine classifier and obtained a
classification accuracy of 99.41% for a 50-50% training-test partition.
Marcano-Cedeno, Quintanilla-Dominguez, and Andina
(2011) achieved a classification accuracy of 99.26% with the training
of neural network by an artificial metaplasticity multilayer perceptron
algorithm. Uzer, Inan, and Yılmaz (2013) integrated the
principal component analysis with a sequential forward selection
and sequential backward selection based feature selection method
and reported 98.57% classification accuracy. Inan, Uzer, and Yılmaz
(2013) presented an integrated model of association rule mining
based feature selection, the principal component analysis and a
neural network classifier and reported 98.29% classification accuracy.
Li, Peng, and Liu (2013) applied the quasiformal kernel common
locality discriminant analysis for dimensionality reduction
and reported a classification accuracy of 97.26%. Zheng, Yoon,
and Lam (2014) presented a K-means algorithm and support vector
machine based model and reported a classification accuracy of
97.38%.
Seera and Lim (2014) presented a hybrid intelligent classification
model for medical data. The model consists of the Fuzzy
Min–Max neural network, the classification and regression tree
and the Random Forest algorithm. The Fuzzy Min–Max neural network
is responsible for incremental learning, the classification and
regression tree is responsible for enhancing understandability and
the Random Forest algorithm is utilized to enhance the predictive
performance. The WBCD data set is among the medical datasets
used in experimental evaluations. They reported a classification
accuracy of 98.84% for breast cancer diagnosis.
Chen (2014) presented a hybrid intelligent model for breast cancer diagnosis that can work in the absence of labeled training
data. Hence, this work studies the feature selection methods in
unsupervised learning models. The model integrates clustering
and feature selection. The study indicates that selecting a subset
of relevant features instead of using all the features in the original
data set can enhance the interpretability of clustering results.
Özsen and Ceylan (2014) examined the performance of artificial
immune system as a data reduction algorithm. In order to evaluate
the data reduction performance of artificial immune system, it is
compared to the fuzzy c-means clustering algorithm. Both data
reduction methods are combined with the artificial neural network
classifier to obtain the classification results. They obtained a classification accuracy of 97.80% for the artificial immune system and artificial neural network combination, and 90.04% for the fuzzy c-means clustering and artificial neural network combination.
Bhardwaj and Tiwari (2015) presented a genetically optimized
neural network algorithm for medical diagnosis. This study
extends the crossover and mutation operators. In this scheme, all the individuals left after reproduction are taken for the crossover operation, and the individuals that cannot produce offspring with better fitness values than their own are mutated. The classification
model obtains an average classification accuracy of 99.26% for
10-fold cross validation scheme.
Nguyen, Khosravi, Creighton, and Nahavandi (2015a) presented
a medical classification model which combines wavelet transformation
and interval type-2 fuzzy logic system. These components
are combined together in order to deal with high dimensionality
and uncertainty properly. Interval type-2 fuzzy logic system consists
of fuzzy c-means clustering based unsupervised learning
and genetic algorithm based parameter tuning. These components of the logic system have high computational costs, and the wavelet transform serves to reduce them. The proposed
classification model achieves a classification accuracy of
97.88% for breast cancer diagnosis.
Nguyen, Khosravi, Creighton, and Nahavandi (2015b) presented
a medical classification model based on fuzzy standard additive
model and genetic algorithm. In this scheme, rule initialization is
handled by the adaptive vector quantization clustering. A genetic
algorithm is utilized for rule optimization and gradient descent
algorithm is used for parameter tuning. In order to handle high-dimensional datasets, the wavelet transform is utilized. For
this classification model, they reported a classification accuracy
of 97.40%.
Nahato, Harichandran, and Arputharaj (2015) have designed a
medical classification model which combines the rough set indiscernibility
relation method and the backpropagation neural network.
The indiscernibility relation method is utilized for handling
missing values and obtaining the appropriate feature subset.
They reported a classification accuracy of 98.60% for breast cancer
diagnosis.
Fuzzy set based methods have been successfully applied for
building robust classification systems in medical and other fields.
For instance, Ganji and Abadeh (2011) combined fuzzy logic and
ant colony optimization for diagnosis of diabetes disease. Liu
et al. (2012) designed an enhanced fuzzy K-nearest neighbor classification
algorithm for thyroid disease classification. In this algorithm,
the particle swarm optimization algorithm was utilized for
parameter tuning. Shang and Barnes (2013) developed a classification
model which combines the fuzzy-rough feature selection and
support vector machine algorithm. The combined model was
applied to Mars terrain image classification. Chen et al. (2013) presented
a hybrid intelligent classification model which consists of
fuzzy K-nearest neighbor and principal component analysis for
Parkinson’s disease diagnosis. Rodger (2014a) developed a statistical
model based on fuzzy nearest neighbor, regression and fuzzy
logic for improving energy costs savings in buildings. Rodger
(2014b) developed a fuzzy feasibility Bayesian probabilistic estimation
model for a supply chain. Lee, Anaraki, Ahn, and An
(2015) developed a classification model based on fuzzy-rough feature selection and multi-tree genetic programming for intention pattern recognition using brain signals.
Compared with the existing literature on breast cancer diagnosis, this paper differs from other studies in several respects. First, the presented classification model utilizes the
fuzzy-rough instance selection method in order to remove useless or erroneous instances based on fuzzy-rough set concepts.
Though instance selection is an essential task in building accurate classification models, much of the research in the field of medical
diagnosis based on machine learning only takes feature selection
into account. Secondly, a recent search method, namely the re-ranking search method, is combined with consistency-based subset evaluation to reduce the number of wrapper evaluations in feature selection. Finally, an optimal training set is obtained via
fuzzy-rough instance selection and consistency-based feature
selection and this set is used to build a classification model based
on fuzzy-rough nearest neighbor classifier.
3. Methods
In this section, we briefly introduce the consistency-based subset
evaluation, the re-ranking algorithm, the fuzzy-rough instance
selection and the fuzzy-rough nearest neighbor classifier.
3.1. Consistency-based subset evaluation
Feature subset evaluation (feature selection) is an important
problem in machine learning which requires identifying an appropriate
set of features from which a classification model can be built
properly. Feature subset evaluation involves selecting relevant features while eliminating irrelevant ones, so that a minimal feature set yielding improved classifier performance can be obtained. Feature selection methods can be divided into two
main categories: methods based on evaluating individual features
and methods based on subsets of features. In individual feature
selection methods, an evaluation metric, such as information gain,
signal to noise statistic, correlation coefficient, t-statistic, and
chi-square statistic, is computed for each feature and a ranking
of attributes based on their individual evaluations is obtained.
Feature subset selection methods evaluate the worth of a subset
of attributes based on a predefined score. These methods are generally used in conjunction with search algorithms in order to search the space of attribute subsets effectively.
Correlation-based feature selection and consistency-based subset
evaluation are the two common types of feature subset selection
(Zheng & Kwoh, 2011).
Liu and Setiono (1996) proposed a probabilistic approach to feature
selection, which evaluates the worth of a subset of attributes
by the level of consistency in the class values. The
consistency-based subset evaluation method uses the consistency
metric given by Eq. (1) (Hall & Holmes, 2003):
$$\mathrm{Consistency}_s = 1 - \frac{\sum_{i=0}^{J}\left(|D_i| - |M_i|\right)}{N} \qquad (1)$$
where $s$ denotes an attribute subset, $J$ represents the number of distinct combinations of attribute values for $s$, $|D_i|$ represents the number of occurrences of the $i$th attribute value combination, $|M_i|$ represents the cardinality of the majority class for the $i$th attribute value combination, and $N$ denotes the total number of instances in the data set (Hall & Holmes, 2003).
The consistency-based subset evaluation method generates a random subset, S, from the feature subset space (N) in every round of the process. If the number of features (C) contained in S is less than that of the current best subset, the inconsistency rate of the data described by S is checked against the inconsistency rate of the current best subset. If S is at least as consistent as the best subset, then the best subset is replaced by S. The general structure of the consistency-based feature selection method is outlined in Table 1.
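As an illustration, the consistency metric of Eq. (1) can be computed by grouping the instances on the candidate subset's value combinations. This is a hedged sketch of the metric only, not the original implementation:

```python
from collections import Counter

def consistency(data, subset, labels):
    """Consistency of an attribute subset as in Eq. (1):
    1 minus the inconsistency rate over distinct value combinations."""
    groups = {}
    for row, label in zip(data, labels):
        key = tuple(row[j] for j in subset)  # projection onto the subset
        groups.setdefault(key, Counter())[label] += 1
    # sum of |D_i| - |M_i| over all distinct value combinations
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in groups.values())
    return 1.0 - inconsistent / len(data)
```

A subset that fully determines the class yields a consistency of 1.0; each instance outside its combination's majority class lowers the score by 1/N.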
3.2. Re-ranking algorithm
Re-ranking is a meta-search algorithm for feature subset selection which alters the ranking at each step, so that attributes that become redundant through the inclusion of the currently selected content can be eliminated and attributes that become necessary can be added to the modified ranking (Bermejo, Ossa, Gamez, & Puerta, 2012). The ranking obtained by existing approaches for wrapper subset selection may not be effective in indicating possible interactions between the attributes, since it is based on a static univariate ranking of the attributes. Hence, the re-ranking algorithm aims to obtain the ranking in a dynamic manner.
The general structure of re-ranking algorithm is outlined in
Table 2. It first creates a univariate ranking of all attributes in decreasing order based on an attribute evaluation metric, such as information gain or symmetrical uncertainty. Then, the ranking is divided into blocks based on the block size parameter, and the attribute selection search algorithm is run for the first block. The rest of the ranking is modified based on an appropriate approximation metric for each attribute given the attributes selected so far. Afterwards, the attribute selection search algorithm is run again for the first block, and the process terminates when a new block does not alter the selected subset. In this algorithm, any suitable search algorithm can be utilized for the attribute subset search. Besides, three different approximation methods are implemented in order to approximate the re-ranking of the remaining attributes: conditional mutual information maximization, mutual information based feature selection, and max-relevance min-redundancy (Bermejo et al., 2012).
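The control flow described above can be sketched as follows; `rank_score`, `block_search` and `re_rank` are hypothetical callables standing in for the univariate evaluation metric, the wrapper subset search, and the approximation metric, respectively:

```python
def re_ranking_search(attrs, rank_score, block_search, re_rank, block_size):
    """Sketch of the re-ranking meta-search: repeatedly run a subset search
    on the first block of a ranking that is re-scored conditioned on the
    attributes selected so far."""
    # initial univariate ranking, best attribute first
    ranking = sorted(attrs, key=rank_score, reverse=True)
    selected = set()
    while ranking:
        block, rest = ranking[:block_size], ranking[block_size:]
        new_selected = block_search(block, selected)
        if new_selected == selected:  # a new block changed nothing: stop
            break
        selected = new_selected
        # re-rank the remaining attributes given the current selection
        ranking = sorted(rest, key=lambda a: re_rank(a, selected), reverse=True)
    return selected
```

Because only one block is searched per iteration, the number of wrapper evaluations stays bounded by the block size rather than the full attribute count.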
3.3. Fuzzy-rough instance selection
Instance selection is a viable tool in machine learning which aims to reduce the number of instances in the learning set, either by removing bad instances or by removing as many instances as possible, so that the noise in the original data set is reduced and the training time of the learning algorithm is shortened (Jankowski & Grochowski, 2004).
The fuzzy-rough instance selection (FRIS) method removes instances that cause conflicts with other instances, based on the fuzzy-rough positive region (Jensen, 2011). In this method, instances that negatively affect the fuzzy positive region are removed so that there is no uncertainty amongst all the remaining objects. This, in turn, reduces the training time of the classifier whilst improving the quality of the training data (Jensen & Cornelis, 2010).
Before giving the details for fuzzy-rough instance selection
approach, we should briefly explain the guiding principles for
fuzzy-rough sets here. The hybridization of fuzzy and rough sets
is mainly applied based on two approaches: the constructive
approach and the axiomatic approach (Jensen & Shen, 2008). In
the constructive approach, the generalized lower and upper
approximations are based on fuzzy relation, whereas the axiomatic
approach mainly deals with the mathematical properties of
fuzzy-rough sets (Wu & Zhang, 2004). The definition for fuzzy
P-lower and P-upper approximations can be given as follows
(Dubois & Prade, 1992; Jensen & Shen, 2008):
$$\mu_{\underline{P}X}(F_i) = \inf_x \max\{1 - \mu_{F_i}(x),\, \mu_X(x)\}, \quad \forall i \qquad (2)$$

$$\mu_{\overline{P}X}(F_i) = \sup_x \min\{\mu_{F_i}(x),\, \mu_X(x)\}, \quad \forall i \qquad (3)$$
where $F_i$ denotes a fuzzy equivalence class and $X$ is the fuzzy concept to be approximated. The tuple $(\underline{P}X, \overline{P}X)$ is called a fuzzy-rough set.

Fuzzy-rough instance selection is based on the use of information in the positive region to determine the usefulness of each instance. Let $S \subseteq X$ be a set of training examples. Given a decision system $(X, A \cup \{d\})$, let $a$ be a quantitative attribute in $A \cup \{d\}$ with range $l(a)$. In order to identify the approximate equality between two objects, the fuzzy relation $R^\alpha_a$ for $x$ and $y$ in $S$ is defined as follows (Jensen & Cornelis, 2010):
$$R^\alpha_a(x, y) = \max\!\left(0,\; 1 - \alpha\,\frac{|a(x) - a(y)|}{l(a)}\right) \qquad (4)$$

where the parameter $\alpha$ determines the granularity of $R^\alpha_a$. The lower approximation of a fuzzy set $A$ in $S$ in terms of a fuzzy relation for $y \in S$ is formulated as follows (Jensen & Cornelis, 2010):
$$\big(R^\alpha_B \downarrow_S A\big)(y) = \inf_{x \in S} \mathcal{I}\big(R^\alpha_B(x, y),\, A(x)\big) \qquad (5)$$

where $\mathcal{I}$ denotes a fuzzy implicator.
Moreover, the fuzzy $B$-positive region $\mathrm{POS}^{\alpha,S}_B$ for $y \in S$ is formulated as follows (Jensen & Cornelis, 2010):

$$\mathrm{POS}^{\alpha,S}_B(y) = \big(R^\alpha_B \downarrow_S R^\alpha_d y\big)(y) \qquad (6)$$
The general structure of the fuzzy-rough instance selection algorithm (FRIS-1) is outlined in Table 3. Its input parameters are the set of objects to be reduced, the parameter $\alpha$ used in the fuzzy similarity measure, and the selection threshold, which is generally set to 1. The algorithm works as follows: it determines the degree of membership of each object $x$ in the positive region; if this membership degree is lower than the assigned threshold value, then the object is removed (Jensen & Cornelis, 2010).
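The FRIS-1 procedure described above can be sketched as follows, assuming a crisp decision attribute, per-attribute similarities given by Eq. (4) aggregated with the min t-norm, and the Kleene-Dienes implicator inside the lower approximation of Eq. (5); these connective choices are our assumptions, not necessarily the original ones:

```python
import numpy as np

def fris1(X, y, alpha=0.5, tau=1.0):
    """FRIS-1 sketch: keep the instances whose fuzzy positive-region
    membership (Eq. (6)) reaches the selection threshold tau."""
    l = X.max(axis=0) - X.min(axis=0)  # attribute ranges l(a)
    l[l == 0] = 1.0                    # guard against constant attributes
    keep = []
    for i in range(len(X)):
        pos = 1.0
        for j in range(len(X)):
            # Eq. (4) per attribute, aggregated with the min t-norm
            sim = np.min(np.maximum(0.0, 1.0 - alpha * np.abs(X[i] - X[j]) / l))
            same_class = 1.0 if y[i] == y[j] else 0.0
            # Kleene-Dienes implicator inside the lower approximation (Eq. (5))
            pos = min(pos, max(1.0 - sim, same_class))
        if pos >= tau:
            keep.append(i)
    return keep
```

With tau = 1, an instance survives only if every differently-labeled instance is fully dissimilar to it, so instances that conflict with their neighbors are discarded before training.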