13-11-2012, 01:55 PM
A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
Abstract
Although attempts have been made to solve the problem
of clustering categorical data via cluster ensembles, with
the results being competitive with conventional algorithms,
it is observed that these techniques unfortunately
generate a final data partition based on incomplete
information. The underlying ensemble-information
matrix presents only cluster-data point relations, with
many entries being left unknown. The paper presents an
analysis that suggests this problem degrades the quality
of the clustering result, and it presents a new link-based
approach, which improves the conventional matrix by
discovering unknown entries through similarity between
clusters in an ensemble. In particular, an efficient link-based
algorithm is proposed for the underlying
similarity assessment. Afterward, to obtain the final
clustering result, a graph partitioning technique is
applied to a weighted bipartite graph that is formulated
from the refined matrix. Experimental results on
multiple real data sets suggest that the proposed link-based
method almost always outperforms both
conventional clustering algorithms for categorical data
and well-known cluster ensemble techniques.
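
The matrix refinement described in the abstract can be illustrated with a
small Python sketch. It is not the paper's algorithm: the abstract does not
give the link-based similarity measure, so the sketch substitutes a simple
shared-neighbor score as a stand-in. The function names (build_clusters,
cluster_similarity, refined_matrix) and the decay parameter are hypothetical,
and the final graph partitioning step is left out.

```python
from itertools import combinations


def build_clusters(ensemble):
    """Map each (clustering index, label) pair to the set of its member points."""
    clusters = {}
    for m, labels in enumerate(ensemble):
        for point, label in enumerate(labels):
            clusters.setdefault((m, label), set()).add(point)
    return clusters


def jaccard(a, b):
    """Overlap of two non-empty member sets."""
    return len(a & b) / len(a | b)


def cluster_similarity(clusters, decay=0.9):
    """Stand-in similarity (an assumption, not the paper's measure).

    Two clusters of the same base clustering never share members, so they
    are scored by how strongly they both overlap the same third clusters.
    Scores are scaled so that the largest one equals `decay` (< 1).
    """
    keys = list(clusters)
    weight = {(a, b): jaccard(clusters[a], clusters[b])
              for a in keys for b in keys if a != b}
    raw = {}
    for a, b in combinations(keys, 2):
        if a[0] != b[0]:
            continue  # only clusters of the same base clustering are compared
        score = sum(min(weight[a, c], weight[b, c])
                    for c in keys if c != a and c != b)
        raw[a, b] = raw[b, a] = score
    top = max(raw.values(), default=0.0)
    return {pair: (decay * s / top if top > 0 else 0.0)
            for pair, s in raw.items()}


def refined_matrix(ensemble, decay=0.9):
    """Return (rows, columns): one row per data point, one column per cluster.

    An entry is 1.0 when the point belongs to the cluster. Otherwise the
    former "unknown" entry is filled with the similarity between that
    cluster and the cluster the point does belong to in the same clustering.
    """
    clusters = build_clusters(ensemble)
    sim = cluster_similarity(clusters, decay)
    columns = sorted(clusters)
    rows = []
    for point in range(len(ensemble[0])):
        row = []
        for m, label in columns:
            own = (m, ensemble[m][point])
            row.append(1.0 if own == (m, label) else sim[own, (m, label)])
        rows.append(row)
    return rows, columns


# Two base clusterings of four data points (toy data, not from the paper).
ensemble = [[0, 0, 1, 2], [0, 0, 0, 1]]
rows, columns = refined_matrix(ensemble)
for point, row in enumerate(rows):
    print(point, [round(v, 2) for v in row])
```

In the binary matrix, point 0 has a zero entry for cluster (0, 1). After
refinement that entry is positive, because clusters (0, 0) and (0, 1) both
overlap cluster (1, 0) of the second clustering. The refined matrix would
then serve as the edge weights of the bipartite graph between data points
and clusters that the abstract mentions.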