10-08-2012, 09:47 AM
Face Recognition with Image Sets Using Manifold Density Divergence
Abstract
In many automatic face recognition applications, a set
of a person’s face images is available rather than a single
image. In this paper, we describe a novel method for face
recognition using image sets. We propose a flexible, semi-parametric
model for learning probability densities confined
to highly non-linear but intrinsically low-dimensional
manifolds. The model leads to a statistical formulation
of the recognition problem in terms of minimizing the divergence
between densities estimated on these manifolds.
The proposed method is evaluated on a large data set, acquired
in realistic imaging conditions with severe illumination
variation. Our algorithm is shown to match the best
and outperform other state-of-the-art algorithms in the literature,
achieving 94% recognition rate on average.
Introduction
Automatic face recognition (AFR) has long been one of
the most active research areas in computer vision. In the last
two decades a vast number of different AFR algorithms have
been developed – Bayesian eigenfaces [20], Fisherfaces [4],
elastic bunch graph matching [18], and the 3D morphable
model [8, 23], to name just a few popular ones. These methods
have achieved very good accuracy on a small number of
controlled test sets.
In sharp contrast is the real-world performance of AFR,
which has been, to say the least, disappointing. Even in
very controlled imaging conditions, such as those used for
passport photographs, the error rate has been reported to be
as high as 10% [10], while in less controlled environments
the performance degrades even further [9]. We believe that
the main reason for this apparent discrepancy between the
results reported in the literature and those observed in the
real world is that the assumptions that most AFR methods
rest upon are hard to satisfy in practice (see Section 2).
Previous Work
Good general reviews of recent AFR literature can be
found in [2, 13, 29]. In this section, we focus on AFR literature
that deals specifically with recognition from image
sets, and with invariance to pose and illumination.
Recognition across illumination. Illumination invariance
is perhaps the most significant challenge for AFR: image
differences due to changing illumination may be larger than
differences between individuals [1]. Most of the work on
recognition under varying illumination has been on recognition
from single images. Two of the most influential
approaches are the illumination cones of Belhumeur et
al. [5, 15] and the 3D morphable model of Blanz and Vetter
[7]. In [5] the authors showed that the set of images
of a convex, Lambertian object, illuminated by an arbitrary
number of point light sources at infinity, forms a convex
polyhedral cone in the image space with dimension equal to
the number of distinct surface normals. In [15], Georghiades
et al. successfully used this result for AFR by reilluminating
images of frontal faces. In the 3D morphable model
method, parameters of a complex generative model which
includes the pose, shape and albedo of a face (assumed to
be a Lambertian surface) are recovered in an analysis-by-synthesis
fashion.
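The cone property quoted from [5] can be illustrated with a toy example. The sketch below is not the method of [5] or [15]; it only assumes the usual Lambertian pixel model, albedo times max(0, n . s), and that light from several sources adds. The four normals, the albedos, and the two light vectors are made-up values. It checks that a non-negative weighted sum of two single-source images equals the image obtained by relighting the surface with the correspondingly scaled sources, which is what makes the set of images closed under non-negative combinations.

```python
# Illustrative sketch only: a toy Lambertian "image" with four pixels.
# Normals, albedos, and light vectors are hypothetical values.

def render(normals, albedos, light):
    """Image under one distant point source.

    Each pixel is albedo * max(0, n . s), where the vector s encodes the
    direction and intensity of the light; max(0, .) models attached shadows.
    """
    image = []
    for n, a in zip(normals, albedos):
        dot = sum(ni * si for ni, si in zip(n, light))
        image.append(a * max(0.0, dot))
    return image


def render_many(normals, albedos, lights):
    """Image under several sources, assuming their contributions add."""
    images = [render(normals, albedos, s) for s in lights]
    return [sum(pixels) for pixels in zip(*images)]


def combine(images, weights):
    """Non-negative weighted sum of images."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to stay in the cone")
    return [sum(w * img[i] for w, img in zip(weights, images))
            for i in range(len(images[0]))]


normals = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (-0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
albedos = [0.9, 0.7, 0.8, 0.6]
light_a = (1.0, 0.0, 0.5)
light_b = (-1.0, 0.0, 0.2)

image_a = render(normals, albedos, light_a)
image_b = render(normals, albedos, light_b)

# Weighted sum of the two single-source images ...
both = combine([image_a, image_b], [2.0, 0.5])

# ... versus relighting with source A at 2x and source B at 0.5x intensity.
relit = render_many(normals, albedos,
                    [tuple(2.0 * c for c in light_a),
                     tuple(0.5 * c for c in light_b)])

max_difference = max(abs(x - y) for x, y in zip(both, relit))
```

Here max_difference is zero up to floating-point rounding. The third pixel of image_a is zero because its normal faces away from source A.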
Modelling Face Manifold Densities
Under the standard representation of an image as a
raster-ordered pixel array, images of a given size can be
viewed as points in a Euclidean image space. The dimensionality,
D, of this space is equal to the number of pixels.
Usually D is high enough to cause problems associated with
the curse of dimensionality in learning and estimation algorithms.
However, surfaces of faces are mostly smooth and
have regular texture, making their appearance quite constrained.
As a result, it can be expected that face images
are confined to a face space, a manifold of lower dimension
d ≪ D embedded in the image space [6]. Below, we formalize
this notion and propose an algorithm for comparing
the estimated densities on the manifolds.
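As a rough illustration of the image-space view, the sketch below flattens small images in raster order, so that each becomes one point in a D-dimensional space. The family of images is a hypothetical stand-in for face appearance: a Gaussian blob whose only degree of freedom is its horizontal position, so the points trace out a one-dimensional curve (d = 1) in a space with D = 256. The image size, the blob, and the function names are assumptions made for the example only.

```python
import math


def rasterize(image):
    """Flatten a 2-D image (list of rows) into one vector in raster order."""
    return [pixel for row in image for pixel in row]


def bump_image(center_x, width=16, height=16, sigma=2.0):
    """Hypothetical toy appearance: a Gaussian blob that can only move
    horizontally, so center_x is the single underlying parameter."""
    center_y = (height - 1) / 2.0
    return [[math.exp(-((x - center_x) ** 2 + (y - center_y) ** 2)
                      / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def distance(u, v):
    """Euclidean distance between two points of the image space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Five images of the family, each one point in the image space.
points = [rasterize(bump_image(cx)) for cx in (4.0, 6.0, 8.0, 10.0, 12.0)]

D = len(points[0])  # dimensionality of the image space = number of pixels
d = 1               # intrinsic dimension of this toy family

# The curve is non-linear: the straight-line midpoint between the images at
# center_x = 4 and center_x = 12 is two faint blobs, not the image at 8.
chord_midpoint = [(a + b) / 2.0 for a, b in zip(points[0], points[4])]
gap = distance(chord_midpoint, points[2])
```

The clearly non-zero gap shows why a linear subspace is a poor description of even this simple family, although a single parameter generates it.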
Empirical Evaluation
Methods in this paper were evaluated on a database
with 100 individuals of varying age (see Table 1) and race,
and equally represented genders. For each person in the
database we collected 7 video sequences of the person in
arbitrary motion (significant translation, yaw and pitch, and
negligible roll), see Figure 6. Each sequence was recorded
in a different illumination setting, at 10 frames per second
and 320 × 240 pixel resolution.
The discussion above focused on recognition using
fixed-scale face images. A practical AFR system must obtain
such images from the available video frames. Before
we report the experimental results in Section 4.2, we describe
our fully automatic system for extracting and normalizing
face image sets from unconstrained video of the
subjects. A diagram of the system is shown in Figure 7.
Summary and Conclusions
In this paper, we have introduced a new statistical approach
to face recognition with image sets. Our main contribution
is the formulation of a flexible mixture model that
is able to accurately capture the modes of face appearance
under broad variation in imaging conditions. The basis
of our approach is the semi-parametric estimate of probability
densities confined to intrinsically low-dimensional,
but highly nonlinear face manifolds embedded in the high-dimensional
image space. The proposed recognition algorithm
is based on a stochastic approximation of Kullback-
Leibler divergence between the estimated densities. Empirical
evaluation on a database with 100 subjects has shown
that the proposed method, integrated into a practical automatic
face recognition system, is successful in recognition
across illumination and pose. Its performance was shown
to match the best performing state-of-the-art method in the
literature and exceed others.
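The idea of a stochastic approximation of the Kullback-Leibler divergence can be sketched with a plain Monte Carlo estimate: draw samples from one density and average the log-ratio of the two densities. The code below is an assumption-laden illustration, not the algorithm of the paper. One-dimensional Gaussians stand in for the densities estimated on face manifolds, the names person_a and person_b are hypothetical, and the direction of the divergence (query density first) is chosen only for the example. Recognition is then the gallery entry with the smallest estimated divergence.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian."""
    return (-0.5 * math.log(2.0 * math.pi * std ** 2)
            - (x - mean) ** 2 / (2.0 * std ** 2))


def kl_monte_carlo(sample_p, logpdf_p, logpdf_q, n_samples, rng):
    """Estimate D_KL(p || q) = E_p[log p(x) - log q(x)] by averaging the
    log-ratio over n_samples draws from p."""
    total = 0.0
    for _ in range(n_samples):
        x = sample_p(rng)
        total += logpdf_p(x) - logpdf_q(x)
    return total / n_samples


def kl_gaussians_exact(m0, s0, m1, s1):
    """Closed-form D_KL(N(m0, s0^2) || N(m1, s1^2)), used only as a check."""
    return math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical stand-ins: (mean, std) of the query and of two gallery densities.
query = (0.0, 1.0)
gallery = {"person_a": (0.2, 1.1), "person_b": (2.0, 0.7)}

estimates = {}
for name, (mean, std) in gallery.items():
    estimates[name] = kl_monte_carlo(
        sample_p=lambda r: r.gauss(query[0], query[1]),
        logpdf_p=lambda x: gaussian_logpdf(x, query[0], query[1]),
        logpdf_q=lambda x, mean=mean, std=std: gaussian_logpdf(x, mean, std),
        n_samples=20000,
        rng=rng,
    )

# Recognition as divergence minimisation: pick the closest gallery density.
best_match = min(estimates, key=estimates.get)
```

For Gaussians the estimate can be compared against kl_gaussians_exact. For mixture densities no closed form is available, which is where a sampling-based estimate becomes useful.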