16-04-2013, 04:55 PM
Movie Rating and Review Summarization in Mobile Environment
Abstract
In this paper, we design and develop a movie-rating and review-summarization system in a mobile environment. The movie-rating information is based on the sentiment-classification result. The condensed descriptions of movie reviews are generated by feature-based summarization. We propose a novel approach based on latent semantic analysis (LSA) to identify product features. Furthermore, we find a way to reduce the size of the summary based on the product features obtained from LSA. We consider both sentiment-classification accuracy and system response time in designing the system. The rating and review-summarization system can be extended easily to other product-review domains.
INTRODUCTION
People's opinions have become one of the most important sources for various services in ever-growing social networks. In particular, online opinions have turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities, and manage their reputations. Meanwhile, cellular phones have become a vital part of our lives, and the mobile platform is currently one of the most popular platforms in the world. However, the digital content that can be displayed on a cellular phone is limited in size, since cellular phones are physically small. Hence, a mechanism that provides users with condensed descriptions of documents will facilitate the delivery of digital content on cellular phones. This paper explores and designs a mobile system for movie rating and review summarization in which the semantic orientation of comments, the limited display capability of cellular devices, and system response time are considered.
Feature-Based Summarization
In product-review summarization, people are interested in the reasons why a product is worth buying rather than in the principal meaning of the comment. Thus, feature-based summarization [6] is used in movie-review summarization. Feature-based summarization focuses on the product features on which customers have expressed their opinions.
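As a rough illustration of the idea, the sketch below groups review sentences by the product feature they mention and keeps only sentences that also contain an opinion word. The feature list, the opinion-word list, the function name `summarize_by_feature`, and the naive sentence splitter are all assumptions made for this example; in the paper the features come from LSA and the opinion words from a statistical method, neither of which is reproduced here.

```python
import re

# Hypothetical inputs, hard-coded for illustration only. The paper obtains
# product features via LSA and opinion words via a statistical approach.
PRODUCT_FEATURES = {"plot", "acting", "music"}
OPINION_WORDS = {"great", "boring", "weak", "brilliant"}


def summarize_by_feature(review, features=PRODUCT_FEATURES, opinions=OPINION_WORDS):
    """Group sentences by the product feature they mention, keeping only
    sentences that also contain at least one opinion word."""
    summary = {}
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", review.strip()):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if not words & opinions:
            continue  # no opinion expressed, so the sentence is not summarized
        for feature in sorted(words & features):
            summary.setdefault(feature, []).append(sentence)
    return summary


review = ("The plot was boring. I watched it on Friday. "
          "The acting was brilliant and the music was great!")
result = summarize_by_feature(review)
for feature, sentences in result.items():
    print(feature, "->", sentences)
```

The sentence about Friday is dropped because it mentions no feature and carries no opinion word, which is the sense in which the summary is condensed.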
Sentiment Classification
As mentioned above, sentiment classification is similar to a traditional binary-classification problem. Many classification algorithms, such as SVM [1], [10], [18], [19], decision trees [20], and neural networks [21], have been proposed and have shown their capabilities in different domains. SVM is one of the state-of-the-art algorithms and has been shown to be highly effective in traditional text categorization. SVM measures the complexity of hypotheses based on the margin with which they separate the data instead of the number of features. One remarkable property of SVM is that its ability to learn can be independent of the dimensionality of the feature space.
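To make the notion of margin concrete, the following minimal sketch computes the geometric margin of a linear separator, that is, the smallest signed distance from any training point to the hyperplane w.x + b = 0. The toy points, the two candidate separators, and the function name `geometric_margin` are assumptions for illustration; this is not an SVM trainer, only the quantity that an SVM seeks to maximize.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.
    The result is positive only if every point lies on its correct side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy 2-D data set (hypothetical): two positive and two negative points.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two separators that both classify every point correctly.
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)
print(margin_a, margin_b)
```

Both separators classify the toy data correctly, but the first one leaves a wider gap to the nearest points, so a maximum-margin learner would prefer it. Note that the margin depends on distances to the hyperplane, not on how many coordinates the points have.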
In natural-language processing (NLP) and information retrieval (IR), the bag-of-words model represents a text as an unordered collection of words, disregarding grammar and even word order. In other words, each word in the text contributes a feature of the document. In this paper, we employ a similar approach to construct a feature vector for each document. Stop words are removed first, and then each distinct word Wi in the document is used to represent a feature. As a result, a document can be represented by a feature vector, and many machine-learning algorithms can be applied to perform classification tasks. We employ SVM to perform the classification, and the libsvm [22] package is used in the system.
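The feature-vector construction described above can be sketched as follows. The stop-word list, the helper names, and the choice of raw term counts as feature values are assumptions (the text does not say whether counts or binary presence are used). The output uses the sparse text format that libsvm reads, `<label> <index>:<value> ...`, with 1-based indices in ascending order.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (assumption; the paper's list is not given).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "it", "this"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(documents):
    """Map each distinct word to a 1-based feature index."""
    words = sorted({w for doc in documents for w in tokenize(doc)})
    return {word: i for i, word in enumerate(words, start=1)}


def to_libsvm_line(label, text, vocab):
    """Encode one document as '<label> <index>:<count> ...' with ascending indices."""
    counts = Counter(w for w in tokenize(text) if w in vocab)
    pairs = sorted((vocab[w], c) for w, c in counts.items())
    return " ".join([f"{label:+d}"] + [f"{i}:{c}" for i, c in pairs])


docs = ["The acting was great and the plot was great", "This plot is boring"]
vocab = build_vocabulary(docs)
lines = [to_libsvm_line(+1, docs[0], vocab), to_libsvm_line(-1, docs[1], vocab)]
for line in lines:
    print(line)
```

Word order is lost in this encoding: only which words occur, and how often, reaches the classifier. Lines in this form can be written to a file and passed to libsvm's training tools.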
CONCLUSION
In this paper, we design and implement a movie-rating and review-summarization system in a mobile environment. Sentiment classification is applied to the movie reviews, and the rating information is based on the sentiment-classification results. In feature-based summarization, product-feature identification plays an essential role, and we propose a novel approach based on LSA to identify related product features. Moreover, we use a statistical approach to identify opinion words. Product features and opinion words are used as the basis for feature-based summarization.
In a system-performance-analysis experiment, the number of features plays an important role in SVM-model loading and prediction time. We use a frequency criterion to reduce the number of features, and the experiment shows that it takes less than 6 s to load the SVM model and classify the reviews. Furthermore, we propose an LSA-based filtering approach to reduce the size of the summary based on the user’s preferred aspect.
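One plausible reading of the frequency criterion is a minimum document-frequency cutoff: a word is kept as a feature only if it appears in at least some number of reviews. The sketch below assumes that reading; the threshold value, the function name `select_frequent_features`, and the toy data are hypothetical, and the excerpt does not state the exact criterion used.

```python
from collections import Counter


def select_frequent_features(tokenized_docs, min_doc_freq=2):
    """Keep words that occur in at least `min_doc_freq` documents.
    Each document counts a word at most once (document frequency)."""
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return sorted(w for w, df in doc_freq.items() if df >= min_doc_freq)


# Toy tokenized reviews (hypothetical), with stop words already removed.
tokenized_docs = [
    ["acting", "great", "plot"],
    ["plot", "boring"],
    ["great", "music", "plot"],
]
kept = select_frequent_features(tokenized_docs, min_doc_freq=2)
print(kept)
```

Dropping rare words shrinks the feature space, which in turn reduces the size of the stored model and the work done per prediction.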