30-09-2016, 11:08 AM
Introduction
Clustering is one of the most important data mining and text mining techniques; it groups similar objects together. Clustering can be applied to images, patterns, words and documents. Of the many clustering techniques available, the two most important are partitional clustering and hierarchical clustering.
Partitional clustering, as considered here, is centre-based. There are many approaches to forming such clusters, such as K-means and K-centroid. K-means clustering is a method of cluster analysis that aims to partition n observations into K clusters, with each observation belonging to the cluster whose centre is nearest. The main purpose of the K-means algorithm is to minimize the distance, measured as Euclidean distance, between the objects within each cluster.
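The K-means procedure described above can be illustrated with a minimal sketch. It assumes the standard iterative scheme (assign each point to its nearest centre, then recompute each centre as the mean of its members); the function names, the random initialization and the sample data are illustrative assumptions, not part of the proposal.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100, seed=0):
    """Partition `points` into k clusters; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centres are k distinct points chosen at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return assignment, centroids


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = kmeans(points, k=2)
```

On data like this the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialization.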
Hierarchical clustering is connectivity-based clustering. It organizes clusters into a tree, so that a hierarchy of clusters is maintained. Hierarchical clustering is subdivided into agglomerative and divisive methods. The former, agglomerative clustering, uses a bottom-up approach, starting with each point as its own cluster and recursively merging two or more clusters into one parent cluster until the termination criterion is reached. The latter, divisive clustering, is a top-down approach, starting with one cluster containing all objects and recursively splitting each cluster into a number of smaller clusters until the termination criterion is reached.
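As a rough illustration of the bottom-up (agglomerative) approach, the sketch below starts with every point in its own cluster and repeatedly merges the two closest clusters. Several choices here are assumptions made for the example: single linkage (distance between the closest members) as the cluster distance, a target number of clusters as the termination criterion, and merging exactly two clusters per step.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link(c1, c2):
    """Single-linkage distance: the smallest distance between any two members."""
    return min(euclidean(a, b) for a in c1 for b in c2)


def agglomerative(points, target_clusters):
    """Merge the closest pair of clusters until `target_clusters` remain."""
    # Bottom-up start: every point is its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i (the "parent" cluster).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sample data: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters = agglomerative(points, target_clusters=3)
```

Recording the sequence of merges instead of only the final clusters would give the tree (dendrogram) that the hierarchy refers to. A divisive method would run in the opposite direction, starting from one cluster and splitting.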
Existing System
The existing methods use only a single viewpoint for measuring similarity between objects. A single viewpoint has some disadvantages; for example, it cannot capture the complete set of relationships among objects. To overcome these disadvantages, researchers in earlier papers introduced a new similarity measure, known as the multi-viewpoint based similarity measure, to ensure that the clusters reflect all relationships among objects. Many clustering algorithms are published every year by many researchers. Earlier papers used the K-means algorithm with multiple viewpoints to find the similarity between objects. The measure used in the K-means algorithm is Euclidean distance, and the main purpose is to minimize the distance between objects within clusters. Another measure is cosine similarity, which is widely used for finding document similarity but is not focused on high-dimensional and sparse data. The difference between K-means that uses Euclidean distance and K-means that uses cosine similarity is that the former focuses mainly on vector magnitudes, while the latter focuses on vector directions.
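The contrast between magnitude and direction can be seen in a small sketch. The Euclidean and cosine functions follow their standard definitions. The multi-viewpoint function is only an assumption about what such a measure might look like: it averages, over a set of viewpoints, the dot product of the two objects as seen from each viewpoint. The proposal does not define the measure, so the function name and this formulation are illustrative only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Sensitive to vector magnitude: long and short vectors lie far apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Depends only on direction: scaling a vector does not change it."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def multi_viewpoint_similarity(a, b, viewpoints):
    """Assumed formulation: average of (a - v) . (b - v) over viewpoints v."""
    total = 0.0
    for v in viewpoints:
        a_from_v = [x - y for x, y in zip(a, v)]
        b_from_v = [x - y for x, y in zip(b, v)]
        total += dot(a_from_v, b_from_v)
    return total / len(viewpoints)


# Hypothetical term-frequency vectors for three documents.
short_doc = (1.0, 2.0, 0.0)
long_doc = (10.0, 20.0, 0.0)  # same direction as short_doc, ten times longer
other_doc = (0.0, 1.0, 3.0)   # different direction, similar length
```

Here `short_doc` and `long_doc` point in the same direction, so their cosine similarity is at its maximum even though their Euclidean distance is larger than the distance between `short_doc` and `other_doc`. With the origin as the only viewpoint, the multi-viewpoint function reduces to a plain dot product, which is the single-viewpoint case.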
Proposed System
I propose to introduce an improved hierarchical clustering method based on a multi-viewpoint similarity measure, which should give a more informative assessment of similarity. The approach will be compared with earlier models on various document collections, with the aim of producing high-quality clusters and achieving high efficiency in time and space.