Mercer’s Theorem, Feature Maps, and Smoothing
Introduction
Kernel-based methods have become increasingly popular and important in machine
learning. The central idea behind the so-called “kernel trick” is that a
closed-form Mercer kernel allows one to efficiently solve a variety of non-linear
optimization problems that arise in regression, classification, inverse problems,
and the like. It is well known in the machine learning community that kernels are
associated with “feature maps” and a kernel-based procedure may be interpreted
as mapping the data from the original input space into a potentially higher-dimensional
“feature space” where linear methods may then be used. One finds
many accounts of this idea where the input space X is mapped by a feature map
Φ : X → H (where H is a Hilbert space) so that, for any two points x, y ∈ X, we
have K(x, y) = ⟨Φ(x), Φ(y)⟩_H.
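As a concrete illustration (our own, not taken from the paper), the identity K(x, y) = ⟨Φ(x), Φ(y)⟩_H can be checked numerically for the inhomogeneous polynomial kernel of degree 2 on R^2, where an explicit finite-dimensional feature map is easy to write down. The particular feature map phi below is one standard choice of coordinates and scaling; the function names are illustrative only.

```python
import numpy as np

def poly_kernel(x, y, d=2):
    """Inhomogeneous polynomial kernel K(x, y) = (1 + <x, y>)^d."""
    return (1.0 + np.dot(x, y)) ** d

def phi(x):
    """Explicit feature map for (1 + <x, y>)^2 on R^2.

    Expanding (1 + x1*y1 + x2*y2)^2 produces monomials of total degree <= 2;
    the sqrt(2) scalings make the ordinary Euclidean inner product of the
    feature vectors reproduce the kernel exactly.
    """
    x1, x2 = x
    return np.array([
        1.0,
        np.sqrt(2.0) * x1,
        np.sqrt(2.0) * x2,
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2.0) * x1 * x2,
    ])

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
print(poly_kernel(x, y))       # kernel evaluated in closed form
print(np.dot(phi(x), phi(y)))  # inner product in feature space; agrees
```

Both printed values coincide, which is exactly the “kernel trick”: the closed-form kernel evaluates the feature-space inner product without ever forming Φ explicitly.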
Yet, while much has been written about kernels and many different kinds
of kernels have been discussed in the literature, much less has been explicitly
written about their associated feature maps. In general, we do not have a clear
and concrete understanding of what exactly these feature maps are. Our goal
in this paper is to take steps toward a better understanding of feature maps
by explicitly computing them for a number of popular kernels for a variety of
domains. By doing so, we hope to clarify the precise nature of feature maps in
very concrete terms so that machine learning researchers may have a better feel
for them.
Example on the Unit Ball
Except for the homogeneous polynomial kernel K(x, t) = ⟨x, t⟩^d, the computation
of the spectrum of L_K on the unit ball B^n is much more difficult analytically
than that on S^{n−1}. For K(x, t) = (1 + ⟨x, t⟩)^d and small values of d, it is still
possible, however, to obtain explicit answers.
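Where closed-form spectra are out of reach, the eigenvalues of L_K can at least be estimated numerically. The sketch below is our own illustration, not the paper’s computation: it approximates the spectrum of L_K for K(x, t) = (1 + ⟨x, t⟩)^2 on the unit disk B^2 with the normalized uniform measure, using a simple Monte Carlo (Nyström-style) discretization in which the eigenvalues of the kernel matrix, scaled by 1/N, approximate the leading eigenvalues of the integral operator.

```python
import numpy as np

def sample_unit_ball(n_points, dim=2, seed=0):
    """Draw points uniformly from the unit ball B^dim (rejection sampling)."""
    rng = np.random.default_rng(seed)
    pts = []
    while len(pts) < n_points:
        x = rng.uniform(-1.0, 1.0, size=dim)
        if np.dot(x, x) <= 1.0:
            pts.append(x)
    return np.array(pts)

def kernel_matrix(X, d=2):
    """Gram matrix of the inhomogeneous polynomial kernel (1 + <x, t>)^d."""
    return (1.0 + X @ X.T) ** d

# Monte Carlo / Nystroem approximation of the spectrum of L_K on B^2:
# eigenvalues of (1/N) * K approximate the leading eigenvalues of the
# integral operator taken with respect to the normalized uniform measure.
X = sample_unit_ball(2000)
K = kernel_matrix(X, d=2)
eigvals = np.linalg.eigvalsh(K / len(X))[::-1]
print(eigvals[:6])  # leading approximate eigenvalues of L_K
```

Since the degree-2 kernel on R^2 has a 6-dimensional feature space, only six eigenvalues are (approximately) nonzero, which is a useful sanity check on the discretization.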