08-06-2013, 01:00 PM
A Spectral Regularization Framework for Multi-Task Structure Learning
ABSTRACT
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state-of-the-art statistical performance.

We extend an earlier formulation in which the structure shared by the tasks is described by a positive definite matrix. In our framework, the task parameters and the structure matrix are jointly computed by minimizing a regularization function with the following appealing property. When the structure matrix is fixed, the function decomposes across the tasks, which can hence be learned independently with standard methods such as SVMs. When the task parameters are fixed, the optimal structure matrix is a spectral function of the covariance of the tasks and can often be computed explicitly. As we shall see, spectral functions are of particular interest in this context because they lead to an efficient alternating minimization algorithm.

We have presented a spectral regularization framework for learning the structure shared by many supervised tasks. This structure is summarized by a positive definite matrix which is a spectral function of the tasks' covariance matrix. The framework is appealing both theoretically and practically.
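The alternating scheme described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the function names (`alternating_minimization`, `matrix_power_psd`), the ridge-style W-step in place of SVMs, and the specific matrix-power D-step with trace normalization are all assumptions made for the sketch.

```python
import numpy as np

def matrix_power_psd(C, a, eps=1e-8):
    """Spectral function of a PSD matrix: apply x -> x**a to its eigenvalues."""
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, eps, None)  # guard near-zero eigenvalues
    return (vecs * vals**a) @ vecs.T

def alternating_minimization(Xs, ys, gamma=1.0, a=0.5, n_iter=20):
    """Jointly learn task parameters W = [w_1 ... w_T] and a shared
    positive definite structure matrix D by alternating minimization.

    W-step: with D fixed, each task decouples into a ridge-style problem
            min_w ||X_t w - y_t||^2 + gamma * w^T D^{-1} w.
    D-step: with W fixed, set D to a spectral function of the tasks'
            covariance C = W W^T (here a matrix power, trace-normalized;
            an assumed concrete choice, not taken from the text).
    """
    d = Xs[0].shape[1]
    D = np.eye(d) / d  # start from the isotropic structure matrix
    for _ in range(n_iter):
        D_inv = np.linalg.inv(D)
        # Tasks decouple given D: solve each regularized least-squares problem.
        W = np.column_stack([
            np.linalg.solve(X.T @ X + gamma * D_inv, X.T @ y)
            for X, y in zip(Xs, ys)
        ])
        C = W @ W.T                  # covariance of the task parameters
        Ca = matrix_power_psd(C, a)  # spectral function, e.g. C^{1/2}
        D = Ca / np.trace(Ca)        # normalize to keep D bounded
    return W, D
```

The key property the sketch exhibits is the one claimed in the text: the W-step runs one independent solver per task, while the D-step is a closed-form spectral function of the covariance of the tasks.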
Theoretically, it brings to bear the rich class of spectral functions, which is well studied in matrix analysis. Practically, we have argued, via the concrete example of negative power spectral functions, that the tasks' parameters and the structure matrix can be computed efficiently using an alternating minimization algorithm, improving upon state-of-the-art statistical performance on two real data sets. A natural question is to what extent the framework can be generalized to allow for more complex task-sharing mechanisms, in which the structure parameters depend on higher-order statistical properties of the tasks.
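As one plausible instance of the negative power case, the closed-form structure matrix might take the shape below; this is a sketch under the assumption that the penalty is of the form tr(W^T D^{-1} W), with the exponent a and the trace normalization chosen for illustration rather than taken from the text.

```latex
% Sketch (assumed form): structure matrix as a matrix-power spectral
% function of the tasks' covariance C = W W^\top.
\[
  D(W) \;=\; \frac{\left(W W^\top\right)^{a}}
                  {\operatorname{tr}\!\left[\left(W W^\top\right)^{a}\right]},
  \qquad 0 < a \le 1 .
\]
% For a = 1/2 this choice makes the combined penalty proportional to the
% squared trace (nuclear) norm of W, consistent with the equivalence to
% Lp matrix-norm regularization mentioned in the abstract.
```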