16-01-2013, 02:27 PM
Document clustering and summarization based on Association rule mining for dynamic environment
INTRODUCTION
A document cluster is a set of similar documents, and the automatic grouping of text documents into such sets is called document clustering.
Documents within a cluster should be more similar to one another than to documents in other clusters.
Document clustering is widely applicable in areas such as web mining, information retrieval, search engines and topological analysis.
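To make the notion of "similarity" concrete, here is a minimal sketch that compares documents by the cosine of their term-count vectors. The measure, the function name cosine_similarity and the sample documents are assumptions chosen for illustration; the presentation does not say which similarity measure is used.

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between the raw term-count vectors of two documents."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Dot product over the terms the two documents share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy documents, not taken from the project's dataset.
docs = [
    "ethernet lan protocol topology",
    "lan wan protocol ethernet",
    "gene protein cell sequence",
]
sim_same_topic = cosine_similarity(docs[0], docs[1])
sim_other_topic = cosine_similarity(docs[0], docs[2])
```

The two networking documents share three of their four terms and score higher than the networking and biology pair, which share none; a clustering algorithm would therefore place the first two together.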
MOTIVATION
Our hierarchical clustering based summarization algorithm extracts concepts, made up of terms and related terms, using association rule mining.
A concept denotes the set of terms, related terms and terms inferred through association rules.
For example:
{Computer networks} => Intranet, Ethernet, LAN, WAN, topology, protocol,
architecture, OSI layers (related terms)
+ QoS, Security (association rule).
The concepts are then clustered in a hierarchical tree structure for easy retrieval of documents.
Concept based clustering improves the quality of the clusters.
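The idea of adding terms "inferred through association rule" to a concept can be sketched as follows. This is only an illustration under stated assumptions: each document is treated as a transaction of terms, rules are restricted to single-term form A => B, and the thresholds, the function names mine_pair_rules and expand_concept, and the sample data are all hypothetical rather than taken from the presentation.

```python
from itertools import permutations

def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return {antecedent: {consequents}} for single-term rules A => B."""
    n = len(transactions)
    terms = set().union(*transactions)
    rules = {}
    for a, b in permutations(sorted(terms), 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n                      # fraction of documents with both terms
        confidence = count_ab / count_a if count_a else 0.0  # estimate of P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(a, set()).add(b)
    return rules

def expand_concept(seed_terms, rules):
    """Add to the seed terms every term inferred from them by a mined rule."""
    inferred = set()
    for term in seed_terms:
        inferred |= rules.get(term, set())
    return set(seed_terms) | inferred

# Hypothetical term sets, one per document.
transactions = [
    {"lan", "ethernet", "security"},
    {"lan", "wan", "security"},
    {"lan", "protocol", "qos"},
    {"lan", "security", "qos"},
]
rules = mine_pair_rules(transactions)
concept = expand_concept({"lan", "ethernet"}, rules)
```

With this toy data the rule lan => security passes both thresholds, so a concept seeded with "lan" and "ethernet" also gains "security". A full Apriori-style miner would handle multi-term itemsets as well; the pairwise version is used here only to keep the example short.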
PROGRESS AFTER THE ZEROTH REVIEW
Literature survey on association rule based approaches.
Datasets collected for scientific literature and newstrack.
Design of Proposed work.
Study of the MEAD v3.10 summarization tool and its installation on Linux.
Study of existing system.
PROPOSED WORKS
We propose a new hierarchical clustering based summarization algorithm that incorporates association rule mining for dynamic environments.
In our algorithm, concepts consisting of terms and related terms are extracted using association rule mining.
Extracting concepts with association rules improves the quality of the clusters.
Including both features and association rules during extraction helps the user form an efficient summary.
The dataset used in our experimental setup contains 500 abstracts collected from the ScienceDirect digital library.
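One possible way to arrange extracted concepts into a hierarchical tree is bottom-up (agglomerative) clustering. The sketch below is not the proposed algorithm itself: it assumes each document is represented by a set of concept terms, uses Jaccard similarity, and represents a merged cluster by the union of its members' terms. The function names and the sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def agglomerate(concepts):
    """Repeatedly merge the two most similar clusters until one tree remains.

    Each cluster is a pair (tree, terms); a tree is either a document id
    or a 2-tuple of subtrees. Returns None for empty input.
    """
    clusters = [(doc_id, set(terms)) for doc_id, terms in concepts.items()]
    if not clusters:
        return None
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] | clusters[j][1])
        # Replace the two merged clusters with their parent node.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical concept term sets keyed by document id.
concepts = {
    "d1": {"lan", "ethernet", "security"},
    "d2": {"lan", "wan", "security"},
    "d3": {"gene", "protein"},
    "d4": {"gene", "cell"},
}
tree = agglomerate(concepts)
```

For this toy input the two networking documents merge first, then the two biology documents, and the root joins the two groups, so retrieval can descend the tree by topic. Handling a dynamic environment, where documents arrive over time, would need an incremental update step that this sketch does not attempt.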
CONCLUSION
From the study it is inferred that using terms alone for clustering and summarization degrades the quality of the clusters.
Our proposed algorithm extracts concepts consisting of terms and related terms, incorporating association rules, which in turn helps improve the quality of the clusters and the summary.