Document clustering using Hadoop
Abstract: Document clustering is an unsupervised approach that is extensively used to navigate, filter, summarize, and manage large document repositories such as the World Wide Web (WWW). With the fast-growing number of works that use link information to enhance unsupervised document clustering, it is becoming necessary to make a comparative evaluation of the impact of different link types on document clustering. Effective measures for clustering documents are also important for increasing the efficiency of search engines. In this project we adopt the K-Means clustering algorithm for document clustering. The algorithm organizes all the patterns in a k-d tree structure so that the patterns closest to a given prototype can be found efficiently. This approach can be applied recursively until the size of the candidate set is one for each node. We have also used different similarity measures for clustering the documents and compared them across different threshold values.
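As a rough illustration of the K-Means step described in the abstract, here is a minimal single-machine sketch in Python. It is not the project's implementation: it uses neither Hadoop nor a k-d tree, and it assumes TF-IDF term weighting and cosine similarity, which the abstract does not specify. The function names (tfidf_vectors, normalize, cosine, kmeans) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def tfidf_vectors(docs):
    """Turn whitespace-tokenized documents into unit-length TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) so every term keeps a positive weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Deterministic initialization (an assumption): the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar prototype.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each prototype becomes the normalized sum of its members.
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old prototype if the cluster is empty
                centroids[j] = normalize(total)
    return labels


if __name__ == "__main__":
    docs = [
        "hadoop mapreduce cluster job",
        "football match goal score",
        "hadoop cluster node mapreduce",
        "goal score football team",
    ]
    print(kmeans(tfidf_vectors(docs), k=2))
```

To compare similarity measures as the abstract describes, the cosine function could be swapped for another measure without changing the rest of the loop. In a Hadoop setting, the assignment step would typically map onto the map phase and the prototype update onto the reduce phase, though the abstract does not say how the project divides the work.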