Clustering Microarray Data using Fuzzy Clustering with Viewpoints
Abstract—This paper studies the application of fuzzy clustering
with viewpoints in order to cluster cell samples according to their
gene expression profile. This method combines fuzzy clustering
with external domain knowledge represented by the so-called
viewpoints. The viewpoints that we employ are obtained from
previously available expression data. The method was compared
to the clustering algorithms of k-means, fuzzy c-means, affinity
propagation, as well as a method of clustering microarray data
that is based on prior biological knowledge, and it showed
comparable or improved results relative to them.
Keywords—Clustering; microarray data; prior knowledge;
viewpoints
I. INTRODUCTION
Clustering gene expression data can be used to find genes
with similar patterns of expression, in order to identify through
this process the most representative genes to be further studied
[1]. In addition, clustering can be applied to group cell samples
of different conditions according to their expression profile.
This explorative clustering process could be used to identify
different subtypes of conditions, for example different cancer
subtypes [2]. It could also be used to aid the labeling of
unknown samples.
Performing clustering in microarrays is a challenging
process due to the nature of the datasets used. One reason is the
inherent noise in the data [3]. Another, especially when
clustering samples, is that the data consist of relatively few
samples, each described by a high-dimensional
vector of gene measurements. This setup requires
carefully designed methods in order to perform the
clustering process effectively.
A concept that has been explored in data clustering is the
incorporation of prior domain knowledge in the clustering
process, resulting in methods that are semi-supervised in nature
[4]. This approach has also been employed in the clustering of
microarray data, for example in [5] and [6], in which
previously obtained biologically-related knowledge was taken
into account in order to improve the clustering process. In [6]
the fuzzy c-means clustering algorithm is used in combination
with Gene Ontology annotations, which makes it possible to obtain
clusters of functionally related genes according to similarity in
expression patterns, which is associated with common
functional behavior. The approach of clustering samples from
microarray experiments using a priori information has been
followed in [7]. This method guides the clustering process using
certain pre-defined classes of genes that show a significant
relationship with the sample classes and that can be obtained,
for example, from Gene Ontologies. Another method to cluster
samples according to their microarray expression has been
presented in [8]. The algorithm is based on finding groups of
genes that are co-regulated in a way that is associated with the
sample classes. The supervised clustering algorithm uses a
measure of similarity among the gene attributes which is based
on mutual information.
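
As a generic illustration of what a mutual-information-based measure
computes, the sketch below estimates the mutual information between
two discrete sequences, for example a discretized gene expression
vector and the sample class labels. The function name, the
discretization into "lo"/"hi" levels, and the toy data are assumptions
made for illustration; this is not the specific similarity measure
used in [8].

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length
    sequences of discrete labels, estimated from joint counts."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n = len(a)
    count_a = Counter(a)           # marginal counts of a
    count_b = Counter(b)           # marginal counts of b
    count_ab = Counter(zip(a, b))  # joint counts
    # sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) p(y)))
    return sum(
        (n_xy / n) * math.log2(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in count_ab.items()
    )


# Hypothetical toy data: one discretized gene and two candidate labelings.
gene = ["lo", "lo", "hi", "hi"]
labels_matching = [0, 0, 1, 1]   # fully determined by the gene level
labels_unrelated = [0, 1, 0, 1]  # independent of the gene level

mi_matching = mutual_information(gene, labels_matching)
mi_unrelated = mutual_information(gene, labels_unrelated)
```

A gene whose discretized level determines the class label gives a
high value, while a gene that is independent of the labels gives a
value of zero.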
In this work, we employ a fuzzy clustering approach to
cluster microarray data that uses prior domain knowledge. The
method used is fuzzy clustering with viewpoints, developed by
Pedrycz et al. [9]. In this method, the researcher can directly
influence the cluster centers, using previously obtained
knowledge. We employed this method in order to perform
supervised clustering of cancer samples from various tissues,
according to their microarray expression profiles. The purpose
is to explore the different subtypes of conditions that could be
assigned to these samples. Overall, the goal is to enable the
construction of prediction models to aid in the identification of
unlabeled samples, especially in multiclass problems. By
making use of prior knowledge in the form of viewpoints, we
can incorporate into the predictive process the characteristics
that we expect the results to have.
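
The idea of letting prior knowledge act directly on the cluster
centers can be sketched as a fuzzy c-means loop in which some
prototype coordinates are pinned to viewpoint values and only the
remaining coordinates are updated from the data. This is a simplified
sketch under stated assumptions, not necessarily the exact
formulation of [9]: the function name, the dictionary encoding of
viewpoints, the fuzzifier m = 2, and the toy data are all
hypothetical.

```python
import random


def fcm_with_viewpoints(X, c, viewpoints, m=2.0, iters=50, seed=0):
    """Fuzzy c-means in which selected prototype coordinates are fixed.

    X          : list of samples, each a list of floats
    c          : number of clusters
    viewpoints : dict {(i, j): value} fixing coordinate j of prototype i
                 (hypothetical encoding chosen for this sketch)
    Returns (prototypes, memberships), with memberships[i][k] the degree
    to which sample k belongs to cluster i.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Initialise prototypes from random samples, then impose viewpoints.
    G = [list(X[k]) for k in rng.sample(range(n), c)]
    for (i, j), value in viewpoints.items():
        G[i][j] = value
    U = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update: standard fuzzy c-means formula on squared
        # Euclidean distances (clamped to avoid division by zero).
        for k in range(n):
            dist = [
                max(sum((X[k][j] - G[i][j]) ** 2 for j in range(d)), 1e-12)
                for i in range(c)
            ]
            for i in range(c):
                U[i][k] = 1.0 / sum(
                    (dist[i] / dist[l]) ** (1.0 / (m - 1.0)) for l in range(c)
                )
        # Prototype update: only coordinates not fixed by a viewpoint.
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            for j in range(d):
                if (i, j) not in viewpoints:
                    G[i][j] = sum(w[k] * X[k][j] for k in range(n)) / total
    return G, U


# Hypothetical toy data: two groups of samples in two dimensions.
X = [[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
# Prior knowledge: prototype 0 is pinned at the origin.
viewpoints = {(0, 0): 0.0, (0, 1): 0.0}
G, U = fcm_with_viewpoints(X, c=2, viewpoints=viewpoints)
```

In this sketch the pinned prototype never moves, so samples are
assigned relative to a center chosen in advance, while the free
prototype is still estimated from the data.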