24-11-2012, 01:16 PM
Classification and Retrieval of Research Papers: A Semantic Hierarchical Approach
Abstract
"Classification and Retrieval of Research Papers: A Semantic Hierarchical Approach" demonstrates an effective and efficient technique for classifying research documents in Computer Science. The research is motivated by the explosion in the number of documents and research publications in electronic form and by the need to perform semantic search for their retrieval.
The popularity and widespread use of electronic documents and publications have necessitated the development of an efficient document archival and retrieval mechanism. The objective of this thesis is to categorize journal papers by assigning them relevant and meaningful classes, predicting the latent concept or topic of research from the relevant terms, and assigning the appropriate classification labels.
This thesis takes a semantic approach and applies text mining techniques in a hierarchical manner to classify the documents.
The use of a Domain Specific Lexicon (DSL), a lexicon containing domain-specific terms, adds a semantic dimension to classification and document retrieval. The Concept Prediction based on Term Relevance (CPTR) technique demonstrates a semantic model for assigning concepts or topics to papers.
This thesis proposes a conceptual framework for organizing and classifying research papers in Computer Science. The efficacy of the proposed concepts is demonstrated through classification experiments. These experiments reveal that the DSL technique of training works efficiently when categorization is based on keywords. The CPTR technique, on the other hand, shows very high accuracy when classification is based on the contents of the document.
Introduction
Text analytics, or mining textual data to extract hidden patterns from semi-structured text, has become vital with the popularity of the World Wide Web and the increase in the number of electronic documents and publications.
Classification of documents involves assigning class labels to documents to indicate their category. Categorizing documents enables semantic search and retrieval. Meaningful classification can be achieved by using machine learning techniques along with a domain-specific, concept-based lexicon.
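As a rough illustration of lexicon-driven label assignment, the following minimal Python sketch scores a text against a small lexicon and returns the best-matching category. The lexicon entries, the category names and the functions tokenize and classify are hypothetical and chosen only for this example; they are not taken from the thesis, and the thesis itself combines the lexicon with machine learning techniques rather than simple counting.

```python
import re
from collections import Counter

# Hypothetical domain-specific lexicon: category -> characteristic terms.
# These entries are illustrative assumptions, not the thesis's lexicon.
LEXICON = {
    "Data Mining": {"clustering", "classification", "association", "mining"},
    "Networks": {"routing", "protocol", "bandwidth", "wireless"},
    "Databases": {"query", "transaction", "index", "schema"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def classify(text, lexicon=LEXICON):
    """Return the category whose lexicon terms occur most often, or None."""
    counts = Counter(tokenize(text))
    scores = {
        label: sum(counts[term] for term in terms)
        for label, terms in lexicon.items()
    }
    best = max(scores, key=scores.get)
    # A score of zero means no lexicon term matched, so no label is assigned.
    return best if scores[best] > 0 else None

print(classify("Wireless routing protocol for sensor nodes"))  # Networks
```

Plain term counting is the simplest possible scoring rule; a trained classifier or a weighted relevance measure could replace it without changing the overall shape of the sketch.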
The primary objective of this research, titled "Classification and Retrieval of Research Papers: A Semantic Hierarchical Approach", is to assign a classification label that specifies the domain, sub-domain and underlying concept of the paper. This is done using text mining techniques. The approach taken is hierarchical. The three levels of the hierarchy are based on the structure of the research document. At each level of the hierarchy, the mining techniques are restricted to specific contents of the document.
The proposed classification framework enables a fast, accurate and meaningful search and retrieval.
Restricting each level of the hierarchy to specific content limits the scope of mining to a limited section of the document.
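The three-level idea can be sketched in Python as follows. The text here states only that each level is restricted to specific content of the document, so the pairing of levels with sections (title, keywords, abstract), the per-level lexicons and all function names below are assumptions made for illustration, not the thesis's actual design.

```python
import re

# Assumed mapping of hierarchy level -> document section. Which section
# feeds which level is a guess made for this sketch.
LEVEL_SECTIONS = [
    ("domain", "title"),
    ("sub_domain", "keywords"),
    ("concept", "abstract"),
]

# Hypothetical per-level lexicons: label -> terms.
LEXICONS = {
    "domain": {"Computer Science": {"algorithm", "classification", "software"}},
    "sub_domain": {
        "Text Mining": {"text", "document", "lexicon"},
        "Networks": {"routing", "protocol"},
    },
    "concept": {
        "Document Classification": {"classify", "label", "category"},
        "Clustering": {"cluster", "centroid"},
    },
}

def score(text, terms):
    """Count how many tokens of the text belong to the given term set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in terms)

def classify_hierarchically(paper):
    """Assign one label per level, reading only that level's section."""
    labels = {}
    for level, section in LEVEL_SECTIONS:
        text = paper.get(section, "")  # mining is limited to this section
        candidates = LEXICONS[level]
        best = max(candidates, key=lambda label: score(text, candidates[label]))
        labels[level] = best if score(text, candidates[best]) > 0 else None
    return labels

paper = {
    "title": "A classification algorithm for papers",
    "keywords": "text mining, document lexicon",
    "abstract": "We assign a category label to each paper.",
}
print(classify_hierarchically(paper))
```

Because each level reads a single section, the amount of text processed per level stays small regardless of the length of the full paper.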
Related Work / Background
Several research papers have been published in recent years in the area of text and document classification. These papers provide guidelines for improving the efficiency of classification. Most of the recent work in this area suggests using a domain-specific ontology to achieve meaningful classification.
Selecting the features used to create the vector space improves the scalability, effectiveness and accuracy of a text classifier. A good feature selection method should take the domain into account [12].
Statistical techniques alone are not sufficient for text mining. Classification improves when semantics are considered [13].
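One simple way to make feature selection domain-aware is to keep only terms that belong to a domain vocabulary and that occur in enough documents, then build term-frequency vectors over those terms. The Python sketch below shows this under stated assumptions: the domain vocabulary, the min_df threshold and the function names are hypothetical, and this is not claimed to be the method of [12].

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def select_features(documents, domain_terms, min_df=1):
    """Keep domain terms that occur in at least min_df documents."""
    document_frequency = Counter()
    for doc in documents:
        # Count each term once per document and ignore non-domain terms.
        document_frequency.update(set(tokenize(doc)) & domain_terms)
    return sorted(t for t, n in document_frequency.items() if n >= min_df)

def vectorize(text, features):
    """Build a term-frequency vector over the selected features."""
    counts = Counter(tokenize(text))
    return [counts[feature] for feature in features]

# Hypothetical corpus and domain vocabulary, for illustration only.
docs = [
    "Routing protocol design",
    "A protocol for wireless routing",
    "The weather today",
]
domain_terms = {"routing", "protocol", "wireless", "bandwidth"}

features = select_features(docs, domain_terms, min_df=2)
print(features)                                # ['protocol', 'routing']
print(vectorize("protocol protocol routing the", features))  # [2, 1]
```

Restricting the vocabulary this way keeps the vector space small, which is where the scalability benefit comes from.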