18-08-2012, 03:08 PM
A Novel Similarity Measure for Clustering Categorical
Data Sets
A Novel Similarity Measure for Clustering.pdf (Size: 257.15 KB / Downloads: 67)
INTRODUCTION
Data clustering has attracted considerable research attention in the
fields of computational statistics and data mining. Clustering
techniques can be applied to similarity search, pattern recognition,
trend analysis, and so forth.
Clustering [10] is the technique of grouping a set of physical or
abstract objects into clusters such that objects within a cluster are
more similar to one another than to objects in other clusters. A good
clustering algorithm generates high-quality clusters, yielding low
inter-cluster similarity and high intra-cluster similarity.
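As a toy illustration (not from the paper), the quality criterion above can be made concrete with a simple-matching similarity over categorical objects: objects in the same cluster should score higher than objects drawn from different clusters. The data and function names here are invented for the example.

```python
# Toy illustration: simple-matching similarity between categorical
# objects, comparing intra-cluster vs. inter-cluster values.

def simple_match(x, y):
    """Fraction of attributes on which two categorical objects agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

cluster_a = [("red", "small", "round"), ("red", "small", "oval")]
cluster_b = [("blue", "large", "square"), ("blue", "large", "round")]

intra = simple_match(cluster_a[0], cluster_a[1])   # agree on 2 of 3 attributes
inter = simple_match(cluster_a[0], cluster_b[0])   # agree on 0 of 3 attributes
print(intra, inter)
```

A good categorical clustering is one in which, on average, `intra` stays high and `inter` stays low.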
RELATED WORK
This section surveys categorical clustering algorithms and the
similarity measures they use to find good clusters. It also covers
previous work on measuring similarity between one set of attributes
with respect to another set, known as context-based similarity. Most
earlier work uses k-means as the stepping platform for generating
clusters over categorical attributes.
K-Representative Algorithm
The k-modes algorithm [14] has its own drawbacks: it is unstable
because modes are not unique, so the resulting clusters depend
strongly on the modes selected during the clustering process. Huang
combined k-modes with k-means to give the k-prototypes algorithm
[15], but because of the k-modes problem the same limitations
remained. The k-representatives algorithm [7] works on the principle
of "cluster centers," called representatives, for categorical objects.
Arithmetic operations are completely absent from the initialization
and updating of categorical objects; instead of means, it applies the
notion of fuzzy logic to define representatives for clusters. With
this formulation, the clustering of categorical objects can be cast as
a partitioning problem in a way similar to k-means clustering. Its
dissimilarity measure compares an object against the relative
frequencies of categories stored in a representative.
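A minimal sketch of the k-representatives idea, assuming the common frequency-based formulation: a representative stores, per attribute, the relative frequency of each category in the cluster, and an object's dissimilarity is the frequency-weighted sum of category mismatches. The function names and data are illustrative, not the paper's notation.

```python
from collections import Counter

def representative(cluster):
    """Per-attribute relative frequencies of categories in a cluster."""
    n = len(cluster)
    reps = []
    for j in range(len(cluster[0])):
        counts = Counter(obj[j] for obj in cluster)
        reps.append({cat: c / n for cat, c in counts.items()})
    return reps

def dissimilarity(obj, reps):
    """Sum over attributes of frequency-weighted category mismatches."""
    return sum(sum(f for cat, f in rep.items() if cat != obj[j])
               for j, rep in enumerate(reps))

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
reps = representative(cluster)
print(dissimilarity(("red", "small"), reps))  # (1/3) + (1/3) = 0.666...
```

Because the representative is a distribution rather than a single mode, the measure avoids the non-uniqueness problem described above.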
CLOPE Algorithm
The CLOPE (Clustering with sLOPE) algorithm [12] proposes an
approach based on histograms: the goodness of a cluster is higher
when the average frequency of an item is high relative to the number
of distinct items appearing in its transactions. The algorithm is
particularly suitable for large high-dimensional databases, but it is
sensitive to a user-defined parameter (the repulsion factor), which
weights the importance of the compactness/sparseness of a cluster. A
better cluster is reflected graphically by a higher height-to-width
ratio. CLOPE uses a histogram of a cluster C with the items on the
x-axis, ordered by decreasing occurrence, and occurrence counts on
the y-axis. A larger height means a heavier overlap among the items
in the cluster and thus more similarity among the transactions in the
cluster.
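The histogram statistics above can be sketched for a single cluster of transactions: size S is the total number of item occurrences, width W is the number of distinct items, and height H = S / W; CLOPE's criterion rewards a high height-to-width ratio, generalized by the repulsion factor r to S / W**r. The helper name and sample data are illustrative.

```python
from collections import Counter

def cluster_profile(transactions, r=2.0):
    """S, W, H, and the per-cluster term S / W**r of CLOPE's criterion."""
    occ = Counter(item for t in transactions for item in t)
    S = sum(occ.values())      # total item occurrences in the cluster
    W = len(occ)               # histogram width: number of distinct items
    H = S / W                  # histogram height
    term = S / W ** r          # r is the user-defined repulsion factor
    return S, W, H, term

tight = [("a", "b"), ("a", "b", "c")]   # transactions sharing most items
loose = [("a", "b"), ("c", "d", "e")]   # transactions sharing no items
print(cluster_profile(tight))  # S=5, W=3, H=1.666...
print(cluster_profile(loose))  # S=5, W=5, H=1.0
```

The overlapping transactions produce a taller, narrower histogram, which is exactly the shape CLOPE prefers; raising r penalizes wide (sparse) clusters more strongly.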
CONCLUSION AND FUTURE WORK
In this paper, a novel similarity measure for categorical attributes
of relational data sets has been proposed, based on the intuitive
ideas of functional dependency and context-based similarity. The idea
is generalized with a functional dependency that also takes into
account the context of the transactions in a cluster, and thus
influences the resulting number of clusters. Our application shows
that this similarity measure is quite effective in finding
interesting clusterings of relational data sets.
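To make the notion of context-based similarity concrete, here is a hedged sketch of one common co-occurrence formulation (not necessarily the paper's exact measure): two values of an attribute are similar in the context of a second attribute when they co-occur with similar distributions of that second attribute's values. All names and data below are invented for illustration.

```python
from collections import Counter

def context_similarity(rows, attr, context, v1, v2):
    """Overlap of context-value distributions for attr values v1 and v2."""
    def dist(v):
        counts = Counter(r[context] for r in rows if r[attr] == v)
        n = sum(counts.values())
        return {c: k / n for c, k in counts.items()}
    d1, d2 = dist(v1), dist(v2)
    # Distribution overlap: sum of minimum probabilities (1.0 = identical).
    return sum(min(d1.get(c, 0.0), d2.get(c, 0.0)) for c in set(d1) | set(d2))

rows = [
    {"fruit": "apple",  "color": "red"},
    {"fruit": "apple",  "color": "green"},
    {"fruit": "cherry", "color": "red"},
    {"fruit": "banana", "color": "yellow"},
]
print(context_similarity(rows, "fruit", "color", "apple", "cherry"))  # 0.5
```

Here "apple" and "cherry" are partially similar because both co-occur with "red", while "apple" and "banana" share no context values at all; a measure of this kind lets a clustering algorithm treat distinct categorical values as near rather than strictly equal or unequal.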