08-06-2013, 01:01 PM
Boosting for Transfer Learning
ABSTRACT
Traditional machine learning makes a basic assumption: the training and test data should be drawn from the same distribution. In many cases, however, this identical-distribution assumption does not hold. It may be violated when a task arrives from a new domain for which only labeled data from a similar but old domain are available. Labeling the new data can be costly, and it would also be wasteful to throw away all the old data. In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms. TrAdaBoost allows users to utilize a small amount of newly labeled data to leverage the old data in constructing a high-quality classification model for the new data. We show that this method can learn an accurate model using only a tiny amount of new data and a large amount of old data, even when the new data alone are insufficient to train a model, and thus that TrAdaBoost allows knowledge to be effectively transferred from the old data to the new. The effectiveness of the algorithm is analyzed both theoretically and empirically, showing that the iterative procedure converges to an accurate model.

A fundamental assumption in classification learning is that the data distributions of the training and test sets are identical. When the assumption does not hold, traditional classification methods may perform poorly, and in practice it often fails. For example, in Web mining, the Web data used to train a Web-page classification model can easily become outdated when applied to the Web some time later, because the topics on the Web change frequently. New data are often expensive to label, so their quantities are limited due to cost. How to accurately classify the new test data while making maximum use of the old data becomes a critical problem.

We propose TrAdaBoost, a novel framework for transferring knowledge from one distribution to another by boosting a basic learner. The basic idea is to select the most useful diff-distribution instances as additional training data for predicting the labels of the same-distribution data. At each boosting iteration, TrAdaBoost increases the weights of misclassified same-distribution instances, as in AdaBoost, while decreasing the weights of misclassified diff-distribution instances, so old-domain instances that conflict with the new data gradually lose influence. The theoretical analysis shows that TrAdaBoost relies first on the same-distribution training data, and then draws in the most helpful diff-distribution instances as additional training data. In our experiments, TrAdaBoost also demonstrates better transfer ability than traditional learning techniques; in almost all situations it outperforms the baseline methods.
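To make the mechanism concrete, here is a minimal sketch of the TrAdaBoost loop in Python. It follows the weight-update scheme described above; the function names, the use of decision stumps as the base learner, and the clamping of the error rate are illustrative assumptions, not details fixed by the paper.

```python
# Minimal TrAdaBoost sketch (assumes binary labels in {0, 1}).
# fit_tradaboost / predict_tradaboost are hypothetical helper names.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_tradaboost(X_diff, y_diff, X_same, y_same, n_iters=20):
    n, m = len(X_diff), len(X_same)
    X = np.vstack([X_diff, X_same])
    y = np.concatenate([y_diff, y_same])
    w = np.ones(n + m)                                  # one weight per instance
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_iters))
    hypotheses, beta_ts = [], []
    for _ in range(n_iters):
        p = w / w.sum()                                 # normalized weight vector
        h = DecisionTreeClassifier(max_depth=1)         # decision stump
        h.fit(X, y, sample_weight=p)                    # weighted base learner
        err = np.abs(h.predict(X) - y)                  # 0/1 loss per instance
        # Training error measured on the same-distribution (new) data only.
        eps = np.sum(w[n:] * err[n:]) / np.sum(w[n:])
        eps = np.clip(eps, 1e-6, 0.49)                  # keep beta_t in (0, 1)
        beta_t = eps / (1.0 - eps)
        # Misclassified old instances are DOWN-weighted (weighted-majority style);
        # misclassified new instances are UP-weighted (AdaBoost style).
        w[:n] *= beta ** err[:n]
        w[n:] *= beta_t ** -err[n:]
        hypotheses.append(h)
        beta_ts.append(beta_t)
    return hypotheses, beta_ts

def predict_tradaboost(hypotheses, beta_ts, X):
    # Weighted vote over the second half of the iterations, as in the paper.
    half = len(hypotheses) // 2
    score = np.zeros(len(X))
    threshold = 0.0
    for h, b in zip(hypotheses[half:], beta_ts[half:]):
        score += -np.log(b) * h.predict(X)
        threshold += -np.log(b) * 0.5
    return (score >= threshold).astype(int)
```

A typical call would pass a large labeled old-domain set as (X_diff, y_diff) and a small labeled new-domain set as (X_same, y_same), then evaluate the returned ensemble with predict_tradaboost on held-out new-domain data.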