Correlation Based Multi-Document Summarization for Scientific Articles and News Group

**seminar tips** · 15-01-2013, 02:15 PM

ABSTRACT

Automated information retrieval systems are used to reduce the overload of document retrieval. A high-quality summary is needed so that the user can quickly locate the desired information. This paper proposes a new summarization technique that treats correlated concepts, i.e. terms and their related terms, as the concepts for concept-based document summarization. Related documents are grouped into the same cluster by the Bisecting k-means clustering algorithm. From each cluster, important sentences are extracted by concept matching and by sentence feature score. We also adopt a modified redundancy elimination technique that is based purely on concepts rather than terms. Experiments are carried out to compare the performance of the proposed work with existing term-based and synonym- and hypernym-based summarization techniques, using scientific articles and news tracks as the data set. The analysis indicates that the proposed technique gives a larger improvement for documents related to scientific terms.
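
The concept-based redundancy elimination mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the Jaccard overlap measure, the greedy selection order, the threshold of 0.5, and the names `jaccard` and `select_sentences` are all hypothetical. Each sentence is assumed to arrive already scored and already mapped to a set of concepts.

```python
def jaccard(a, b):
    """Jaccard overlap of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_sentences(scored, threshold=0.5, limit=3):
    """Greedily pick high-scoring sentences whose concepts are not redundant.

    `scored` is a list of (sentence, score, concepts) triples. The threshold
    and the limit are assumed values chosen for the example.
    """
    chosen = []  # list of (sentence, concepts) pairs kept so far
    for sentence, _score, concepts in sorted(scored, key=lambda s: s[1], reverse=True):
        if len(chosen) >= limit:
            break
        # Keep the sentence only if it overlaps little with every kept one.
        if all(jaccard(concepts, kept) < threshold for _, kept in chosen):
            chosen.append((sentence, concepts))
    return [sentence for sentence, _ in chosen]


scored = [
    ("Cars use petrol engines.", 0.9, {"vehicle", "fuel", "engine"}),
    ("Automobiles run on gasoline motors.", 0.8, {"vehicle", "fuel", "engine"}),
    ("Clustering groups similar documents.", 0.7, {"clustering", "document"}),
]
summary = select_sentences(scored)
```

In this toy input the first two sentences share no terms but map to the same concepts, so a term-based check would keep both while the concept-based check keeps only the higher-scoring one.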

INTRODUCTION

Nowadays the online submission of documents has increased widely, which means that large numbers of documents accumulate dynamically for a particular domain. Information retrieval [1] is the process of searching for information within documents. An information retrieval process begins when a user enters a query; queries are formal statements of information needs, for example search strings in a web search engine. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevance. Hence the user has to visit each and every page to find the required information, which is a time-consuming process.

RELATED WORK

Aditi Sharan et al. [5] proposed semantic-based document clustering using the WordNet ontology. The main aim of this work is to replace words with their possible concepts. The technique takes the nouns from all the documents to form a master noun list. The depth of each word is calculated by weighting the words. Then all possible combinations of words are created, and the pairs that fall below the threshold are deleted from the pair list. A semantic similarity measure is used to find the maximum similarity in order to replace each term with its concept, and the documents are clustered based on the extracted concepts. However, the experimental results show that the technique does not consider all possible conditions.
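
The pair-creation and threshold-pruning step described above can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the similarity values in `TOY_SIMILARITY` are invented, a real system would derive them from WordNet, and the names `prune_pairs` and `best_match` and the threshold of 0.6 are hypothetical.

```python
from itertools import combinations

# Invented similarity scores in [0, 1], for illustration only.
TOY_SIMILARITY = {
    frozenset({"car", "automobile"}): 0.95,
    frozenset({"car", "truck"}): 0.70,
    frozenset({"automobile", "truck"}): 0.65,
    frozenset({"car", "banana"}): 0.05,
}


def similarity(a, b):
    """Look up the toy similarity of two nouns (0.0 if the pair is unknown)."""
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def prune_pairs(nouns, threshold=0.6):
    """Build all noun pairs and keep those whose similarity reaches the threshold."""
    unique = sorted(set(nouns))
    return [(a, b) for a, b in combinations(unique, 2) if similarity(a, b) >= threshold]


def best_match(word, pairs):
    """Most similar partner of `word` among the surviving pairs, or None."""
    partners = [b if a == word else a for a, b in pairs if word in (a, b)]
    return max(partners, key=lambda p: similarity(word, p), default=None)


master_nouns = ["car", "automobile", "truck", "banana"]
pairs = prune_pairs(master_nouns)
```

Here `best_match("truck", pairs)` picks the partner with the highest similarity among the surviving pairs, while a noun with no surviving pair, such as "banana", gets `None`.
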
Anna Huang et al. [6] proposed a document clustering technique based on concept extraction using semantic relations. This work computes a similarity measure between the terms instead of considering the overlap between the terms as in the previous work. The process has three steps: identifying candidate phrases in the document and mapping them to anchor text in Wikipedia; disambiguating anchors that relate to multiple concepts; and pruning the list of concepts to filter out those that do not relate to the document's central thread.

Clustering Algorithm
The extracted concepts are clustered by an induced Bisecting K-means algorithm [13]. The basic Bisecting K-means algorithm [14] starts by selecting the two elements with the largest distance between them as seed clusters, and the other items are assigned to the closest seed. The center of each of these two seeds is then calculated as the weighted sum of all the items assigned to it, and these centers are used to find the new seeds. This process is repeated until the two seeds meet the predefined precision. If the size of a seed cluster is larger than the predefined threshold, the entire process is repeated on that cluster, and this forms the binary tree.
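
The steps above can be sketched in plain Python. This is a minimal illustration, not the induced variant of [13]: it assumes points are numeric tuples compared with Euclidean distance, it uses an unweighted mean where the text speaks of a weighted sum, and the names `bisect`, `bisecting_kmeans`, `precision` and `max_size` are hypothetical.

```python
from itertools import combinations
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def bisect(points, precision=1e-6, max_iter=100):
    """Split points in two, seeding with the pair that is farthest apart."""
    seed_a, seed_b = max(combinations(points, 2), key=lambda p: dist(p[0], p[1]))
    left, right = list(points), []
    for _ in range(max_iter):
        # Assign every point to its closest seed.
        left = [p for p in points if dist(p, seed_a) <= dist(p, seed_b)]
        right = [p for p in points if dist(p, seed_a) > dist(p, seed_b)]
        if not left or not right:
            break
        # Recompute the centers and stop once they move less than `precision`.
        new_a, new_b = mean(left), mean(right)
        moved = max(dist(seed_a, new_a), dist(seed_b, new_b))
        seed_a, seed_b = new_a, new_b
        if moved < precision:
            break
    return left, right


def bisecting_kmeans(points, max_size=3):
    """Return a binary tree: a leaf is a list of points, a node is a (left, right) tuple."""
    if len(points) <= max(max_size, 1):
        return points
    left, right = bisect(points)
    if not left or not right:  # degenerate case, e.g. all points coincide
        return points
    return (bisecting_kmeans(left, max_size), bisecting_kmeans(right, max_size))


def leaves(tree):
    """Flatten the binary tree into its list of leaf clusters."""
    if isinstance(tree, list):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tree = bisecting_kmeans(points, max_size=3)
```

Calling `leaves(tree)` flattens the tree into its final clusters. The farthest-pair search is quadratic in the number of points, which is acceptable for a sketch but would need a cheaper seeding rule on large collections.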
