Seminar Topics & Project Ideas On Computer Science Electronics Electrical Mechanical Engineering Civil MBA Medicine Nursing Science Physics Mathematics Chemistry ppt pdf doc presentation downloads and Abstract

Full Version: A Communication Perspective on Automatic Text Categorization
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Abstract”
The basic concern of a Communication System is to transfer information from its source to a destination some distance away. Textual documents also deal with the transmission of information. Particularly, from a text categorization system point of view, the information encoded by a document is the topic or category it belongs to. Following this initial intuition, a theoretical framework is developed where Automatic Text Categorization (ATC) is studied under a Communication System perspective. Under this approach, the problematic indexing feature space dimensionality reduction has been tackled by a two-level supervised scheme, implemented by a noisy terms filtering and a subsequent redundant terms compression. Gaussian probabilistic categorizers have been revisited and adapted to the concomitance of sparsity in ATC. Experimental results pertaining to 20 Newsgroups and Reuters-21578 collections validate the theoretical approaches. The noise filter and redundancy compressor allows an aggressive term vocabulary reduction (reduction factor greater than 0.99) with a minimum loss (lower than 3 percent) and, in some cases, gain (greater than 4 percent) of final classification accuracy. The adapted Gaussian Naive Bayes classifier reaches classification results similar to those obtained by state-of-the-art Multinomial Naive Bayes (MNB) and Support Vector Machines (SVMs).


Marta Capdevila and Oscar W. Ma´rquez Flo´ rez

read full report
http://portal.acmcitation.cfm?id=1591899.1592307
http://www.aclwebanthology/J00-4001.pdf