19-12-2012, 06:26 PM
Clustering Techniques
What is Cluster Analysis?
Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
Applications of Cluster Analysis
Understanding
◦ Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
Summarization
◦ Reduce the size of large data sets
Introduction
Clustering is a useful technique for discovering the data distribution and patterns in the underlying data.
A cluster is a collection of data objects
•Similar to one another within the same cluster
•Dissimilar to the objects in other clusters
Cluster Analysis:
Grouping a set of data objects into clusters
Clustering is unsupervised classification: no predefined classes
Typical applications:
•As a stand-alone tool to get insight into data distribution
•As a preprocessing step for other algorithms
What is good clustering?
A good clustering method will produce high-quality clusters with
◦High intra class similarity
◦Low inter class similarity
The quality of a clustering result depends on both the similarity measure used by the method and its implementation
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
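The two criteria above can be made concrete with a small sketch. It assumes Euclidean distance as the (dis)similarity measure, so high intra-class similarity shows up as a small average distance inside clusters and low inter-class similarity as a large average distance between clusters. The function names and the sample points are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_distance(points, labels):
    """Average distance between pairs of points that share a cluster label."""
    pairs = [dist(p, q)
             for (p, lp), (q, lq) in combinations(zip(points, labels), 2)
             if lp == lq]
    return sum(pairs) / len(pairs)


def mean_inter_distance(points, labels):
    """Average distance between pairs of points in different clusters."""
    pairs = [dist(p, q)
             for (p, lp), (q, lq) in combinations(zip(points, labels), 2)
             if lp != lq]
    return sum(pairs) / len(pairs)


# Illustrative data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
poor_labels = [0, 1, 0, 1]  # mixes the two groups

for name, labels in (("good", good_labels), ("poor", poor_labels)):
    print(name,
          round(mean_intra_distance(points, labels), 2),
          round(mean_inter_distance(points, labels), 2))
```

For the good labelling the intra-cluster average is much smaller than the inter-cluster average; for the poor labelling the relationship is reversed. The sketch assumes every labelling has at least one same-cluster pair and one cross-cluster pair.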
Requirements of clustering
Scalability
Ability to deal with different types of attributes
Discovery of clusters with arbitrary shape
Ability to deal with noise and outliers
Insensitivity to the order of input records
Ability to handle high dimensionality
Incorporation of user specified constraints
Partitioning Method
Partitioning clustering techniques partition the database into a predefined number of clusters.
They attempt to determine k partitions that optimize a certain criterion function.
They construct a partition of a database of N objects into a set of k clusters.
The construction involves determining the optimal partition of a set of N data points into k subsets.
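As one possible illustration of a partitioning method, the sketch below uses k-means, a well-known member of this family; the text above does not name a specific algorithm, so this choice is an assumption. The criterion function is taken to be the sum of squared distances from each point to its cluster center, and the first k points are used as initial centers purely to keep the example deterministic. The names kmeans and sse are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sse(points, labels, centers):
    """Criterion function: sum of squared distances to the assigned center."""
    return sum(dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


def kmeans(points, k, iterations=100):
    """Partition points into k clusters; returns (labels, centers)."""
    # Assumption: the first k points serve as the initial centers.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # the partition no longer changes
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels, centers


# N = 4 data points partitioned into k = 2 subsets.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, centers = kmeans(points, k=2)
print(labels)                        # [0, 0, 1, 1]
print(centers)                       # [(0.0, 0.5), (5.0, 5.5)]
print(sse(points, labels, centers))  # 1.0
```

Note that k-means only finds a local optimum of the criterion, and the result can depend on the initial centers, so in practice it is often run several times with different starting points.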