03-05-2012, 02:58 PM
An Efficient Density-based Improved K-Medoids Clustering Algorithm
INTRODUCTION
Numerous applications require the management of spatial data, i.e. data related to space. Spatial Database Systems (SDBS) (Gueting 1994) are database systems for the management of spatial data. Increasingly large amounts of data are obtained from satellite images, X-ray crystallography and other automatic equipment. Automated knowledge discovery is therefore becoming increasingly important in spatial databases.
The DBSCAN Algorithm
DBSCAN requires two parameters: epsilon (eps) and the minimum number of points (minPts). It starts with an arbitrary point that has not yet been visited and finds all neighbor points within distance eps of that starting point.
If the number of neighbors is greater than or equal to minPts, a cluster is formed: the starting point and its neighbors are added to this cluster, and the starting point is marked as visited. The algorithm then repeats this evaluation recursively for all of the neighbors.
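If the number of neighbors is less than minPts, the point is marked as noise. This label is provisional: if the point later turns out to lie within eps of a point that does have at least minPts neighbors, it joins that point's cluster as a border point.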
Once a cluster is fully expanded (all points within reach have been visited), the algorithm moves on to the remaining unvisited points in the dataset.
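The steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation. It assumes Euclidean distance and counts a point as part of its own eps-neighborhood, as in the original DBSCAN formulation. The names `dbscan`, `region_query` and `NOISE` are hypothetical. The recursive expansion is written as an explicit seed list, and neighbor search is a linear scan, so the sketch takes O(n²) time.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None marks a point as not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):  # expand the cluster through its seed list
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already labelled (incl. a just-claimed border point): do not expand
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point, so keep expanding
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

On this toy data, the call returns `[0, 0, 0, 0, 1, 1, 1, -1]`: two clusters and one noise point.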
Disadvantages of DBSCAN
DBSCAN cannot cluster data sets with large differences in density well, because no single minPts-eps combination can then be chosen that suits all clusters.
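The density problem can be made concrete with a small hypothetical data set. It contains two dense one-dimensional groups (spacing 0.1) separated by a gap of 0.9, plus one sparse group (spacing 1.0). The sketch counts core points per group for two eps values with minPts = 3, and checks whether the two dense groups would be linked. The data, names and parameter values are illustrative assumptions, not results from the paper.

```python
import math

def neighbor_count(p, points, eps):
    """Number of points within eps of p, counting p itself."""
    return sum(1 for q in points if math.dist(p, q) <= eps)

def core_summary(groups, eps, min_pts):
    """Core points per group, and whether groups 0 and 1 would be linked."""
    points = [p for g in groups for p in g]
    core = [[neighbor_count(p, points, eps) >= min_pts for p in g] for g in groups]
    # Each dense group here is a chain of core points, so if the facing points of
    # the two dense groups are both core points within eps of each other,
    # DBSCAN joins the two groups into a single cluster.
    linked = core[0][-1] and core[1][0] and math.dist(groups[0][-1], groups[1][0]) <= eps
    return [sum(c) for c in core], linked

# Hypothetical 1-D data: two dense groups 0.9 apart and one sparse group.
dense_a = [(0.1 * i,) for i in range(10)]        # 0.0 .. 0.9, spacing 0.1
dense_b = [(1.8 + 0.1 * i,) for i in range(10)]  # 1.8 .. 2.7, spacing 0.1
sparse = [(10.0 + 1.0 * i,) for i in range(10)]  # 10 .. 19, spacing 1.0
groups = [dense_a, dense_b, sparse]

for eps in (0.25, 1.0):
    counts, linked = core_summary(groups, eps, min_pts=3)
    print(f"eps={eps}: core points per group {counts}, dense groups merged: {linked}")
```

With eps = 0.25, both dense groups are recovered as separate clusters, but no sparse point is a core point, so the sparse group becomes noise. With eps = 1.0, eight of the ten sparse points become core points, but the gap between the dense groups is bridged and they merge.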
EVALUATION AND RESULTS
Metrics Used For Evaluation
To measure the performance of a clustering and classification system, a suitable metric is needed. To evaluate the algorithms under consideration, we used two measures: the Rand index and run time.
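The Rand index compares two partitions of the same n objects pair by pair. A pair counts as an agreement if both partitions put it in the same cluster (call this count a), or both put it in different clusters (count b). The index is then RI = (a + b) / (n(n - 1)/2), which ranges from 0 to 1. The following is a minimal sketch. The function name `rand_index` is hypothetical, and treating DBSCAN's noise label as an ordinary cluster label is an assumption, since the text does not say how noise points were scored.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two clusterings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both clusterings must label the same objects")
    if len(labels_true) < 2:
        raise ValueError("need at least two objects to form a pair")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agreements += same_true == same_pred  # pair counts toward a or b
        pairs += 1
    return agreements / pairs

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(rand_index([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.5
```

The first call scores 1.0 because the index ignores how clusters are named. In the second call, only 3 of the 6 pairs agree.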
A. Performance in terms of time
We evaluated the three algorithms, DBSCAN, k-medoids and DBkmedoids, in terms of the time required for clustering.
[Table: The Attributes of Multidimensional Data]
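Run time can be measured by wrapping each clustering call in a wall-clock timer. The sketch times a simple Voronoi-iteration k-medoids, which assigns each point to its nearest medoid and then moves each medoid to the cluster member with the smallest total distance to the other members. This variant, its naive initialization with the first k points, and the names `assign`, `k_medoids` and `timed` are assumptions for illustration. It is not the paper's DBkmedoids, nor necessarily the k-medoids implementation the authors used.

```python
import math
import time

def assign(points, medoids):
    """Label each point with the index of its nearest medoid (Euclidean distance)."""
    return [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, max_iter=100):
    """Voronoi-iteration k-medoids; returns (labels, medoid indices)."""
    medoids = list(range(k))  # hypothetical naive initialization: the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep an empty cluster's medoid unchanged
                new_medoids.append(medoids[c])
                continue
            # Member with the smallest total distance to the rest of its cluster.
            new_medoids.append(min(members, key=lambda m: sum(
                math.dist(points[m], points[i]) for i in members)))
        if new_medoids == medoids:  # converged: no medoid moved
            break
        medoids = new_medoids
    return assign(points, medoids), medoids

def timed(cluster_fn, *args, **kwargs):
    """Run cluster_fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = cluster_fn(*args, **kwargs)
    return result, time.perf_counter() - start

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
(labels, medoids), seconds = timed(k_medoids, points, 2)
print(labels, medoids, f"{seconds:.6f} s")
```

Here `labels` is `[0, 0, 0, 1, 1, 1]` and `medoids` is `[0, 3]`. Because single timings of short runs are noisy, comparisons between algorithms are usually based on repeated runs over the same data.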
CONCLUSION
Clustering is an efficient way of extracting information from raw data, and k-means and k-medoids are basic methods for it. Although they are easy to implement and understand, k-means and k-medoids have serious drawbacks. The proposed clustering and outlier detection system has been implemented using Weka and tested with a proteins database generated using a Gaussian distribution function; the data form circular or spherical clusters in space. As shown in the tables and graphs, the proposed density-based k-medoids algorithm performed better than DBSCAN and k-medoids clustering in terms of classification quality, as measured by the Rand index. One of the major challenges in the medical domain is the extraction of comprehensible knowledge from medical diagnosis data.
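Data of the kind described above, with roughly circular or spherical clusters, can be generated by sampling each cluster from an isotropic Gaussian around a chosen center. The sketch is a hypothetical generator (`gaussian_blobs`). The centers, spread and sample counts are illustrative assumptions, not the settings used for the proteins data.

```python
import random

def gaussian_blobs(centers, n_per_cluster, sigma, seed=0):
    """Sample n_per_cluster points around each center from an isotropic Gaussian.

    Using the same standard deviation sigma on every axis yields roughly
    circular (2-D) or spherical (3-D) clusters. Returns (points, true_labels).
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    points, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append(tuple(rng.gauss(c, sigma) for c in center))
            labels.append(label)
    return points, labels

# Hypothetical settings: two 3-D clusters of 50 points each.
points, labels = gaussian_blobs([(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)], 50, sigma=0.5)
```

The returned labels record which center each point was drawn from, so they can serve as the reference partition when scoring a clustering of the generated points.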