19-11-2012, 05:53 PM
Analysis of semi supervised learning methods towards multi label text classification
ABSTRACT
Multi-label text classification (MLTC) is attracting growing attention from researchers because of its role in information retrieval, text mining, web mining, and related fields. It is mainly realized with supervised methods from machine learning. Because these methods always need labeled data, semi-supervised methods are becoming popular in the MLTC domain. The goal of semi-supervised learning is to reduce classification errors by using readily available unlabeled data in conjunction with the available labeled data. This paper surveys and analyzes various semi-supervised methods used for the multi-label text classification task. The overview concludes that considering the semantic aspects of the input document datasets and their representation, in conjunction with the smoothness and manifold assumptions of semi-supervised learning, may give more relevant classification results.
INTRODUCTION
Multi-label text classification has recently attracted significant attention from researchers because it plays a crucial role in many applications, such as web page classification, classification of news articles, and information retrieval [6]. Multi-label classification generally relies on supervised methods. In practice, however, labeled data is scarce while unlabeled data is plentiful [9]. A major limitation of existing supervised algorithms for multi-label text classification is that they need labeled training data to learn accurately [9][10]. Acquiring labeled training data is not as easy as obtaining unlabeled data: a human must label each text document, which is both time consuming and error prone [14]. This calls for other sources of information that can reduce the need for labeled data, so many researchers now look to semi-supervised learning as a promising solution to this problem.
OVERVIEW OF MULTI LABEL TEXT CLASSIFICATION
The goal of a text classification system is to determine the correct class of a new text document based on a set of training examples. Because labeled training examples are costly to obtain, building text classifiers with semi-supervised machine learning methods is an interesting area for research. Some research in text classification focuses on specific properties of text data. One such property is that a document can carry several labels at once [3]. Multi-label classification studies the problem in which a data instance can have multiple labels [4], and multi-label text classification is a key domain within this research area. Semi-supervised methods for text classification are also present in the literature, but very few techniques are available for solving the multi-label text classification problem.
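To make the multi-label setting concrete, the short Python sketch below encodes the label sets of a few documents as a binary indicator matrix, one column per label. This is only an illustration: the function name binarize_labels and the example labels are assumptions made here, not something taken from the surveyed papers.

```python
def binarize_labels(label_sets):
    """Turn a list of label sets into (sorted label list, 0/1 indicator rows)."""
    # Collect every label that occurs in any document, in a fixed order.
    labels = sorted({label for labels_of_doc in label_sets for label in labels_of_doc})
    # One row per document, one column per label: 1 if the document has it.
    matrix = [[1 if label in labels_of_doc else 0 for label in labels]
              for labels_of_doc in label_sets]
    return labels, matrix


# Hypothetical example: the first document belongs to two classes at once.
doc_labels = [{"sports", "politics"}, {"economy"}]
labels, matrix = binarize_labels(doc_labels)
```

In a single-label problem every row would contain exactly one 1; in the multi-label case a row may contain several, which is what makes the task harder for standard classifiers.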
Expectation Maximization (EM) based text classification.
Nigam and McCallum [9] developed this algorithm in 1999. It was a very popular attempt to introduce semi-supervised learning for text document classification. In this technique the authors proposed a modification of the basic EM technique that treats unlabeled data as incomplete data, since it comes without labels. EM is a class of iterative algorithms for maximum likelihood or maximum a posteriori estimation in problems with incomplete data.
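The idea of treating the missing labels as incomplete data can be sketched in a few lines of Python. The sketch below is a simplified, single-label illustration and not the authors' implementation. It assumes a multinomial naive Bayes base classifier with Laplace smoothing (a choice not stated above), and all function names, the parameter alpha, and the toy documents are hypothetical.

```python
import math


def m_step(docs, resp, n_classes, vocab, alpha=1.0):
    """M-step: estimate log priors and log word probabilities from soft labels.

    resp[i][c] is the current probability that document i belongs to class c.
    """
    class_mass = [0.0] * n_classes
    word_mass = [{w: 0.0 for w in vocab} for _ in range(n_classes)]
    for doc, r in zip(docs, resp):
        for c in range(n_classes):
            class_mass[c] += r[c]
            for w in doc:
                word_mass[c][w] += r[c]
    total = sum(class_mass)
    log_prior = [math.log((class_mass[c] + alpha) / (total + alpha * n_classes))
                 for c in range(n_classes)]
    log_word = []
    for c in range(n_classes):
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_word.append({w: math.log((word_mass[c][w] + alpha) / denom)
                         for w in vocab})
    return log_prior, log_word


def e_step(doc, log_prior, log_word):
    """E-step: posterior P(class | doc) under the current parameters."""
    scores = [lp + sum(lw[w] for w in doc if w in lw)
              for lp, lw in zip(log_prior, log_word)]
    top = max(scores)  # subtract the maximum for numerical stability
    expd = [math.exp(s - top) for s in scores]
    z = sum(expd)
    return [e / z for e in expd]


def em_naive_bayes(labeled, unlabeled, n_classes, n_iter=10):
    """Fit naive Bayes on labeled docs, then refine it with unlabeled docs."""
    vocab = sorted({w for doc, _ in labeled for w in doc}
                   | {w for doc in unlabeled for w in doc})
    lab_docs = [doc for doc, _ in labeled]
    # Labeled documents keep fixed one-hot class memberships.
    lab_resp = [[1.0 if c == y else 0.0 for c in range(n_classes)]
                for _, y in labeled]
    params = m_step(lab_docs, lab_resp, n_classes, vocab)  # labeled data only
    for _ in range(n_iter):
        # E-step: fill in the missing labels with class probabilities.
        unl_resp = [e_step(doc, *params) for doc in unlabeled]
        # M-step: re-estimate parameters from labeled and unlabeled docs.
        params = m_step(lab_docs + unlabeled, lab_resp + unl_resp,
                        n_classes, vocab)
    return params


def predict(doc, params):
    """Return the index of the most probable class for a tokenized document."""
    post = e_step(doc, *params)
    return max(range(len(post)), key=post.__getitem__)


# Toy data (hypothetical): class 0 = sports, class 1 = finance.
labeled = [(["goal", "match"], 0), (["stock", "market"], 1)]
unlabeled = [["goal", "team"], ["match", "team"],
             ["market", "shares"], ["stock", "shares"]]
params = em_naive_bayes(labeled, unlabeled, n_classes=2)
```

In this toy data the words "team" and "shares" never occur in a labeled document, so a classifier trained on the labeled documents alone has no evidence about them. The unlabeled documents let EM associate them with the classes of the words they co-occur with. Extending the sketch to the multi-label case would need further design decisions that are outside its scope.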