01-12-2012, 05:00 PM
Adaptive Cluster Distance Bounding for High Dimensional Indexing
Abstract
With developments in semiconductor technology and signal processing tools, storage media have become cheaper and databases have grown larger. Organizations increasingly need to store, retrieve and process petabytes of data for purposes such as data mining. However, existing indexing approaches suffer from the “curse of dimensionality” and show poor efficiency in exact nearest neighbor search. A new cluster-adaptive distance bound and new indexing techniques are proposed to overcome these problems.
Existing System
Suffers from the “curse of dimensionality”
Requires a large preprocessing storage overhead
Computational costs are high
Does not scale as the data set size increases
Proposed System
Vector approximation (VA-File), an indexing technique, was proposed to combat the “curse of dimensionality”
Voronoi clusters are created to complement the cluster-based index
Focuses on the clustering paradigm for search and retrieval of data
Scalable on real data sets
Search costs are lower than with existing indexes
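The clustering paradigm for search listed above can be sketched roughly as follows: points are grouped into Voronoi clusters, each cluster gets a lower bound on its distance to the query, and clusters whose bound cannot beat the best match found so far are skipped. This is only an illustrative sketch. All function names are hypothetical, and the bounding-sphere bound used here is a simple stand-in chosen for brevity; any valid lower bound can be passed in instead.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_voronoi_clusters(points, centroids):
    """Assign every point to its nearest centroid (Voronoi partition)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def sphere_lower_bound(q, centroid, members):
    """Stand-in bound (an assumption): distance to the cluster's bounding sphere."""
    radius = max(dist(centroid, x) for x in members)
    return max(0.0, dist(q, centroid) - radius)

def nearest_neighbor(q, centroids, clusters, lower_bound=sphere_lower_bound):
    """Exact nearest neighbor search that prunes clusters by their lower bound."""
    # Visit clusters in order of increasing lower bound; skip empty clusters.
    order = sorted(
        (lower_bound(q, c, m), i)
        for i, (c, m) in enumerate(zip(centroids, clusters)) if m
    )
    best, best_d, scanned = None, math.inf, 0
    for bound, i in order:
        if bound >= best_d:
            break  # no remaining cluster can contain a closer point
        scanned += 1
        for x in clusters[i]:
            d = dist(q, x)
            if d < best_d:
                best, best_d = x, d
    return best, best_d, scanned
```

The tighter the lower bound, the earlier the loop stops and the fewer clusters have to be read, which is why the quality of the bound matters for search cost.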
Modules
A New Cluster Distance Bound:
Voronoi clusters are created using the Euclidean distance measure, and the separating hyperplane boundaries between neighboring clusters efficiently bound the cluster surface. By projecting the query onto these hyperplane boundaries and complementing the result with the cluster-hyperplane distance, a lower bound on the distance from a query to a cluster is developed.
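One way to read the bound described above, as a minimal sketch and not necessarily the exact formulation in the presentation: for Euclidean Voronoi clusters, the boundary between clusters i and j is the hyperplane equidistant from their centroids. If the query lies on the far side of that hyperplane, every member of cluster i is at least (query-hyperplane distance + cluster-hyperplane distance) away. The function names and the data layout (lists of tuples, non-empty clusters) are assumptions made for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def signed_dist_to_boundary(p, c_i, c_j):
    """Signed distance from p to the hyperplane equidistant from c_i and c_j.

    Positive when p lies on the c_j side, negative on the c_i side.
    """
    return (sq_dist(p, c_i) - sq_dist(p, c_j)) / (2.0 * math.sqrt(sq_dist(c_i, c_j)))

def cluster_lower_bound(q, i, centroids, clusters):
    """Lower bound on the distance from query q to any member of cluster i.

    Assumes clusters[i] is non-empty and holds exactly the points whose
    nearest centroid is centroids[i] (a Voronoi partition).
    """
    best = 0.0
    for j, c_j in enumerate(centroids):
        if j == i:
            continue
        d_q = signed_dist_to_boundary(q, centroids[i], c_j)
        if d_q <= 0.0:
            continue  # q is on cluster i's side: this boundary gives no bound
        # Cluster-hyperplane distance: the closest any member gets to the boundary.
        # In a real index this would be precomputed and stored per boundary.
        d_x = min(-signed_dist_to_boundary(x, centroids[i], c_j) for x in clusters[i])
        best = max(best, d_q + d_x)
    return best

centroids = [(0.0, 0.0), (4.0, 0.0)]
clusters = [[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (5.0, 1.0)]]
print(cluster_lower_bound((5.0, 0.0), 0, centroids, clusters))
```

Taking the maximum over all neighboring boundaries keeps the bound valid, since each boundary on its own already gives a valid lower bound.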
Adaptability to Weighted Euclidean or Mahalanobis Distances:
The Mahalanobis distance measure has more degrees of freedom than the Euclidean distance. We extend our distance bounding technique to the Mahalanobis distance metric, and note large gains over existing indexes.
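As a rough illustration of why a Euclidean bounding technique can carry over, the sketch below uses a standard linear-algebra identity, which may differ from how the presentation actually derives its extension: if the inverse covariance matrix A factors as A = L Lᵀ (Cholesky), then the Mahalanobis distance between x and y equals the Euclidean distance between Lᵀx and Lᵀy. Weighted Euclidean distance is the special case where A is diagonal. The function names and the parameter `a` (the inverse covariance matrix as nested lists) are assumptions.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L * L^T, for a symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(l[r][k] * l[c][k] for k in range(c))
            if r == c:
                l[r][c] = math.sqrt(a[r][r] - s)
            else:
                l[r][c] = (a[r][c] - s) / l[c][c]
    return l

def mahalanobis(x, y, a):
    """sqrt((x - y)^T a (x - y)), where a is the inverse covariance matrix."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[r] * a[r][c] * d[c] for r in range(n) for c in range(n)))

def whiten(x, l):
    """Map x to L^T x, so Euclidean distance there equals Mahalanobis distance here."""
    n = len(x)
    return [sum(l[k][r] * x[k] for k in range(r, n)) for r in range(n)]

a = [[2.0, 1.0], [1.0, 2.0]]  # example inverse covariance (assumed values)
l = cholesky(a)
x, y = (1.0, 2.0), (3.0, -1.0)
wx, wy = whiten(x, l), whiten(y, l)
print(mahalanobis(x, y, a), math.sqrt(sum((p - r) ** 2 for p, r in zip(wx, wy))))
```

Under this view, points and centroids could be transformed once, after which a Euclidean bound applies unchanged in the transformed space.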
Conclusion
The effects of the curse of dimensionality on exact nearest neighbor search are reduced
The new cluster distance bound is adapted to the Mahalanobis distance measure
The proposed system has a low computational cost and scales well with both dimensionality and data set size
Future Scope
Future efforts will be directed towards optimizing the clustering algorithm to improve the cluster distance bounds, and towards designing new techniques that achieve tighter cluster-distance bounds.