27-02-2013, 12:57 PM
Balanced Iterative Reducing and Clustering using Hierarchies
Balanced Iterative Reducing.ppt (Size: 975 KB / Downloads: 101)
INTRODUCTION
Data clustering concerns how to group a set of objects based on their similarity of attributes and/or their proximity in the vector space.
Main methods
Partitioning : K-Means…
Hierarchical : BIRCH,ROCK,…
Density-based: DBSCAN,…
A good clustering method will produce high quality clusters with
high intra-class similarity
low inter-class similarity
Main Techniques (2) Hierarchical Clustering
Multilevel clustering: level 1 has n clusters level n has one cluster, or upside down.
Agglomerative HC: starts with singleton and merge clusters (bottom-up).
Divisive HC: starts with one sample and split clusters (top-down).
Clustering Feature
The Birch algorithm builds a dendrogram called clustering feature tree (CF tree) while scanning the data set.
Each entry in the CF tree represents a cluster of objects and is characterized by a 3-tuple: (N, LS, SS), where N is the number of objects in the cluster and LS, SS are defined in the following.
CF-Tree Rebuilding
If we run out of space, increase threshold T
By increasing the threshold, CFs absorb more data
Rebuilding "pushes" CFs over
The larger T allows different CFs to group together
Reducibility theorem
Increasing T will result in a CF-tree smaller than the original
Rebuilding needs at most h extra pages of memory