Document Clustering using Rough K-means Algorithm Seminar Report

**seminar code** · 01-09-2014, 03:37 PM

Abstract

Clustering is an automatic learning technique aimed at grouping a set of objects into subsets or clusters. The goal of clustering is to place similar objects in the same cluster and dissimilar objects in different clusters. K-means clustering is characterized by non-overlapping, clearly separated clusters with bivalent memberships: an object either belongs to a cluster or does not belong to it.
However, many real-life applications are characterized by situations where overlapping clusters would be a more suitable representation. Soft clustering mechanisms enable such a representation, as they allow an object to belong to more than one cluster. Document clustering aims to cluster documents based on the similarity of the concepts they are associated with. It is widely applied in web mining, for example for clustering web pages and query results.
Soft clustering is relevant for document clustering because a document may be associated with several concepts. In our project we investigate the applicability of the Rough K-means algorithm, a soft clustering technique based on rough set principles, to document clustering. The objectives include implementing the Rough K-means algorithm and performing document clustering with both the Rough K-means and K-means algorithms on benchmark datasets for comparative analysis.
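
To make the difference between bivalent and overlapping memberships concrete, here is a minimal sketch of Rough K-means in the form it is commonly described (lower approximation plus boundary region per cluster). The report does not give the details, so the distance-gap rule, the names `threshold` and `w_lower`, and their default values are assumptions for illustration, not the project's implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points, dim):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def assign(points, centroids, threshold):
    """Split objects into lower approximations and boundary regions.

    An object goes to the lower approximation of its nearest cluster unless
    another centroid is almost as close (gap <= threshold, an assumed rule);
    in that case it is placed in the boundary of every such cluster.
    """
    k = len(centroids)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for p in points:
        d = [dist(p, c) for c in centroids]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k)
                 if j != nearest and d[j] - d[nearest] <= threshold]
        if close:
            for j in [nearest] + close:
                boundary[j].append(p)
        else:
            lower[nearest].append(p)
    return lower, boundary

def rough_kmeans(points, centroids, threshold=1.0, w_lower=0.7, iterations=20):
    """Iterate assignment and weighted centroid updates (hypothetical defaults)."""
    dim = len(points[0])
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        lower, boundary = assign(points, centroids, threshold)
        new = []
        for i, c in enumerate(centroids):
            if lower[i] and boundary[i]:
                lo, bo = mean(lower[i], dim), mean(boundary[i], dim)
                # Lower-approximation members weigh more than boundary members.
                new.append([w_lower * a + (1 - w_lower) * b
                            for a, b in zip(lo, bo)])
            elif lower[i]:
                new.append(mean(lower[i], dim))
            elif boundary[i]:
                new.append(mean(boundary[i], dim))
            else:
                new.append(c)  # empty cluster: keep the old centroid
        if new == centroids:
            break
        centroids = new
    lower, boundary = assign(points, centroids, threshold)
    return centroids, lower, boundary

points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
centroids, lower, boundary = rough_kmeans(points, [(0, 0), (10, 0)])
```

In this toy run the point `(5, 0.5)` lies midway between the two groups, so it ends up in the boundary region of both clusters, while the other four points fall in exactly one lower approximation. Plain K-means would have forced the middle point into a single cluster.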

Work Done

Work done so far is summarized below:
1. Literature survey of soft clustering methods and in particular Rough K-means algorithm.
2. Pre-processing of documents and vector space modeling of data:
a. Stop-word removal.
b. Stemming.
c. Finding term frequency and inverse document frequency (tf-idf).
3. Clustering of objects using K-means algorithm
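
The pre-processing steps in item 2 can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny made-up sample, `crude_stem` is a hypothetical stand-in for a real stemmer such as Porter's, and the tf-idf weighting uses one common variant (relative term frequency times log(N / df)); the report does not say which variants were used.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "in", "to", "on"}

def crude_stem(word):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return the vocabulary and one tf-idf vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors

docs = ["The cats are sleeping", "Dogs and cats", "Clustering of documents"]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is a point in the vector space whose axes are the terms in `vocab`, which is the representation a clustering algorithm such as K-means would then operate on.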

Future Work

1. Implementation of Rough K-means algorithm
2. Vector space representation of documents based on tf-idf values
3. Comparative study of K-Means and Rough K-means algorithms on benchmark datasets
