25-08-2012, 10:12 AM
CLUSTERING USING WAVELETS
INTRODUCTION
Clustering large spatial databases is an important problem.
It tries to find the densely populated regions in the feature space, for use in knowledge discovery and efficient information retrieval.
A good clustering approach should have these properties:
1. Efficient, and able to detect clusters of arbitrary shape
2. Insensitive to outliers (noise)
3. Insensitive to the order of the input data
A clustering method based on wavelet transforms satisfies all three requirements.
CLUSTERING
One of the most important unsupervised learning problems.
The process of organizing objects into groups whose members are similar in some way.
A cluster is therefore a collection of objects that are similar to one another.
The similarity criterion may be, for example, the distance between objects or the density of objects.
SPATIAL DATA VS. MULTIDIMENSIONAL SIGNAL
The idea of applying signal processing to spatial databases comes from viewing multidimensional spatial data objects as a signal.
The objects are represented in an n-dimensional feature space.
The numerical features of a spatial object can be represented by a feature vector.
Clustering identifies the sparse and dense regions of the feature space.
In doing so, it discovers the overall distribution pattern of the feature vectors.
WAVECLUSTER
A multi-resolution clustering approach that applies a wavelet transform to the feature space.
A wavelet transform is a signal processing technique that decomposes a signal into different frequency sub-bands.
Grid-based: the feature space is quantized into cells.
Input parameters:
1. Number of grid cells for each dimension
2. The wavelet, and the number of applications of the wavelet transform
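The grid step can be illustrated with a minimal sketch. It assumes the feature-space bounds are known in advance, and the names `quantize`, `cells_per_dim`, `lo` and `hi` are hypothetical, chosen only for this example:

```python
from collections import Counter

def quantize(points, cells_per_dim, lo, hi):
    """Count how many points fall into each grid cell.

    points        : list of equal-length tuples (feature vectors)
    cells_per_dim : number of grid cells along each dimension
    lo, hi        : assumed lower/upper bounds of each dimension
    """
    counts = Counter()
    for p in points:
        cell = []
        for x, k, a, b in zip(p, cells_per_dim, lo, hi):
            idx = int((x - a) / (b - a) * k)     # scale to [0, k)
            cell.append(min(max(idx, 0), k - 1)) # clamp points on the border
        counts[tuple(cell)] += 1
    return counts

points = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.95), (0.5, 0.5)]
grid = quantize(points, cells_per_dim=(4, 4), lo=(0.0, 0.0), hi=(1.0, 1.0))
print(dict(grid))
```

Each point is touched once, so this step is linear in the number of objects. Only non-empty cells are stored.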
WAVELET TRANSFORM
Decomposes a signal into different frequency sub-bands.
Allows natural clusters to become more distinguishable.
The basis functions of the wavelet transform (WT) are small waves located at different times.
The WT is localized in both time and frequency.
The WT provides a multiresolution system, which is useful for various applications in image processing.
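As a small illustration of the decomposition into sub-bands, here is one level of the Haar wavelet transform on a 1-D signal. The Haar wavelet and the averaging/differencing (unnormalized) scaling are assumptions made for this sketch; the slides do not say which wavelet is used.

```python
def haar_step(signal):
    """One level of the unnormalized Haar transform on an even-length list.

    Returns (approx, detail): pairwise averages (low-frequency sub-band)
    and pairwise half-differences (high-frequency sub-band).
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the original signal from the two sub-bands."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([a + d, a - d])
    return signal

counts = [0, 0, 8, 10, 9, 7, 0, 1]   # e.g. object counts along one dimension
approx, detail = haar_step(counts)
print(approx)   # coarse view: where the objects are concentrated
print(detail)   # fine view: local changes between neighbouring cells
```

Applying `haar_step` again to `approx` gives the next coarser resolution, which is the sense in which the transform is multiresolution.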
WAVELET BASED CLUSTERING
The collection of objects in the feature space composes an n-dimensional signal.
The high-frequency parts of the signal correspond to the regions of the feature space where there is a rapid change in the distribution of objects, that is, the boundaries of clusters.
The low-frequency parts of the n-dimensional signal that have high amplitude correspond to the areas of the feature space where the objects are concentrated, in other words, the clusters themselves.
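The idea can be sketched on a small 2-D grid of object counts. This is a simplified illustration, not the full algorithm: it assumes a single Haar level (keeping only the average sub-band), a hand-picked density threshold, and 4-connected cells. The function names and the sample grid are hypothetical.

```python
from collections import deque

def haar_lowpass_2d(grid):
    """Average sub-band of one 2-D Haar level: mean of each 2x2 block."""
    rows, cols = len(grid), len(grid[0])
    return [[(grid[r][c] + grid[r][c + 1] +
              grid[r + 1][c] + grid[r + 1][c + 1]) / 4
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

def label_components(cells, threshold):
    """Label 4-connected groups of cells whose value is >= threshold."""
    rows, cols = len(cells), len(cells[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means "not in a cluster"
    n_clusters = 0
    for r in range(rows):
        for c in range(cols):
            if cells[r][c] < threshold or labels[r][c]:
                continue
            n_clusters += 1
            labels[r][c] = n_clusters
            queue = deque([(r, c)])
            while queue:                          # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and cells[ny][nx] >= threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = n_clusters
                        queue.append((ny, nx))
    return labels, n_clusters

# Object counts per grid cell: two dense regions and one stray outlier (the 3).
grid = [
    [9, 8, 0, 0, 0, 0, 0, 0],
    [7, 9, 1, 0, 0, 0, 0, 0],
    [8, 9, 0, 0, 0, 0, 0, 0],
    [9, 7, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 6, 7],
    [0, 0, 0, 0, 0, 0, 8, 9],
    [0, 0, 0, 0, 6, 8, 7, 8],
    [0, 0, 0, 0, 7, 9, 8, 6],
]
smooth = haar_lowpass_2d(grid)
labels, n_clusters = label_components(smooth, threshold=2.0)
for row in labels:
    print(row)
```

The averaging spreads the isolated outlier over its block so it falls below the threshold, while the second cluster keeps its L shape, because connected cells are grouped without any assumption about cluster form.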
CONCLUSION
Time complexity is O(N), where N is the number of objects in the database.
Compares favorably with other clustering algorithms used for large spatial databases.
Detects arbitrarily shaped clusters at different scales.
Not sensitive to noise.
Not sensitive to input order.
Efficient on large spatial databases.