Abstract—With periocular biometrics gaining attention recently,
the goal of this paper is to investigate the effectiveness of
local appearance features extracted from the periocular region
images for soft biometric classification. We extract gender and
ethnicity information from the periocular region images using
grayscale pixel intensities and periocular texture computed
with Local Binary Patterns as our features and an SVM classifier.
Results are presented on the visible spectrum periocular images
obtained from the FRGC face dataset. For 4232 periocular
images of 404 subjects, we obtain a baseline gender and
ethnicity classification accuracy of 93% and 91%, respectively,
using 5-fold cross validation. Furthermore, we show that fusion
of the soft biometric information obtained from our classification
approach with the texture based periocular recognition
approach results in an overall performance improvement.
I. INTRODUCTION
The periocular biometric is gaining attention lately as a
means of improving the robustness of face and iris biometric
modalities [15], [17]. The periocular region is the area surrounding
the eye, and is generally considered to be one
of the most discriminative regions of the face. It has been
shown that the periocular region can be used independently for
recognition and can aid face or iris recognition [18], when
the inherent biometric content in the source images is poor
(for example, due to the poor quality of an image). It has
also been suggested that periocular features can potentially
be used for soft biometric classification. In this paper, we
explore the utility of appearance based periocular features
for soft biometric classification, and the use of the resulting
soft biometric information to improve the recognition performance
of appearance based periocular features.
Soft biometric information can be used to classify an
individual in broad categories but is not sufficiently discriminative
to perform recognition tasks [9]. For example, the
knowledge about gender, ethnicity, age, or other traits such
as height, weight, dimensions of limbs, skin color, hair
color, etc., are termed soft biometrics. While such
information is too broad to identify an individual, it can be
valuable to narrow down the search space, or actually help
improve results while performing identification. Although
there is a wide variety of soft biometric traits that can be
gathered from an individual, only a limited number of traits
can be gathered from a given sensor. For example, from a
camera set up to acquire face images for facial recognition, soft biometric traits such as gender, ethnicity, or age can be
determined with far greater accuracy than height or weight.
Due to the popularity of the face biometric, facial images
have been used extensively to obtain gender and ethnicity
information. Table I lists some of the key approaches in this
area. All of the approaches listed in the table follow the
strategy of training classifiers for a given set of classes in
order to perform classification. It can be seen that a majority
of the approaches rely on the appearance information present
in face images. Typical feature representations used for this
purpose include grayscale pixel intensities (used directly or
represented in terms of PCA eigenvectors), Local Binary
Patterns (LBP), Haar wavelets, and Gabor wavelets among
others. The classifiers of choice are Adaboost (along with
various variants of boosting), SVM, Neural Networks, and
LDA among others. While each of the classifiers has its
own advantages and limitations, SVM seems to be the most
popular choice for gender and ethnicity classification due to
its relatively high accuracy and generalizing ability.
In this work, we focus on gender and ethnicity classi-
fication of individuals using periocular images instead of
full face images. The aim is to explore whether periocular
images carry enough information to reliably extract soft biometric
information comparable to that obtained from face images.
Additionally, with the increasing interest in periocular
biometrics, this work also aims to use the soft biometric
classification results to improve previously reported periocular
based recognition results. Figure 1 shows some examples
of the periocular images belonging to different classes, i.e.,
male and female genders and different ethnic groups considered
in this study. We exploit the appearance cues present
in the periocular image such as texture and grayscale pixel
intensities, with the inherent assumption that the relative
success of these features in discriminating various classes
in face images would also translate well to the periocular
images.
II. APPROACH
Our approach is divided into three steps: collection and
preprocessing of the periocular data, feature extraction, and
classification.
A. Periocular Data
The visible spectrum periocular images are obtained from
high resolution frontal face images belonging to the FRGC
dataset [16] that are captured under different conditions. The
high resolution still face images (≈ 1200×1400, 72 dpi)
allow for the periocular texture to be imaged in significant detail. Also, the ground truth eye centers are provided,
making it easier to crop out the periocular images and scale
them to the required size. In this work, we scale the cropped
periocular images to the uniform size of 251 × 251 pixels.
The distance of the subject from the camera is assumed to
be constant for controlled settings, hence the effects of scale
change are assumed to be negligible.
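As a sketch, the crop-and-scale step described above might look as follows; the half-window size and the use of scikit-image are our assumptions, since the paper does not specify the crop extent around the eye center:

```python
import numpy as np
from skimage.transform import resize

def crop_periocular(face, eye_xy, half=150, out=251):
    """Crop a square window around a ground-truth eye center and
    scale it to a uniform out x out size (half-window assumed)."""
    x, y = eye_xy
    h, w = face.shape[:2]
    y0, y1 = max(0, y - half), min(h, y + half)
    x0, x1 = max(0, x - half), min(w, x + half)
    # resize outputs a float image scaled to the target dimensions
    return resize(face[y0:y1, x0:x1], (out, out), anti_aliasing=True)
```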
The FRGC dataset also provides the ground truth labels for
gender and ethnicities of the subjects. The gender distribution
of the FRGC subjects is 57% male and 43% female. There
are subjects of various ethnicities in the FRGC dataset,
but they are primarily divided into three classes: Caucasian
(68%), Asian (22%), and Other (10%). In this paper, we
consider only two ethnic classes: Asian and Non-Asian
because the Other class is thinly represented in the dataset.
This bias in the population leads to very few training samples
in that class, which results in poor classification performance.
This aspect is described in detail in Section III.
For preprocessing, the images are converted to grayscale
and histogram equalization is performed. To eliminate the
effect of texture and color in the iris and surrounding sclera
area, an elliptical mask of neutral color is placed over the
center of the periocular region image. The dimensions of the
ellipse are predefined based on the dimensions of the input
periocular image rather than the dimensions of the subject’s
eye. The underlying assumption is that the change in size of
the eye is mostly on account of its varying amount of opening
and not so much due to changes in scale. This, coupled with
the fact that the images are aligned and scaled to a fixed size,
allows the placing of a fixed size ellipse on the eye such that
a significant amount of periocular skin is still visible.
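A minimal sketch of this preprocessing pipeline, assuming scikit-image; the ellipse semi-axes below are hypothetical values, since the paper fixes the ellipse relative to the image size but does not state its dimensions:

```python
import numpy as np
from skimage import color, exposure

def preprocess(rgb, size=251):
    gray = color.rgb2gray(rgb)           # convert to grayscale in [0, 1]
    eq = exposure.equalize_hist(gray)    # histogram equalization
    # elliptical mask of neutral intensity over the iris/sclera area;
    # the semi-axes are fixed fractions of the image size (assumed)
    yy, xx = np.mgrid[:size, :size]
    cy = cx = size // 2
    a, b = size * 0.18, size * 0.12
    mask = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0
    eq[mask] = 0.5                       # neutral gray
    return eq
```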
B. Feature Extraction
The features we use for this work are grayscale pixel
intensities and periocular texture calculated by Local Binary
Patterns. Grayscale pixel intensity features are computed by
taking the preprocessed m×n grayscale periocular image and
reshaping it to a vector of size 1×(m⋅n) and scaling it to the
range [0,1]. However, the full 251×251 image yields a very large
feature vector, so the images were downsampled to 50×50
pixels prior to reshaping. Texture features are computed with
Local Binary Patterns. LBPs measure commonly observed
intensity patterns in a local pixel neighborhood, such as
spots, line ends, edges, corners, and other distinct texture patterns.
For this work, LBP features were computed on separate
blocks of the preprocessed image and then concatenated and
converted to vector form, similar to LBP feature extraction
in [17].
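The two feature representations can be sketched as follows; the 5×5 block grid and the uniform LBP parameters (P = 8, R = 1) are assumptions, since the paper follows [17] without restating them here:

```python
import numpy as np
from skimage.transform import resize
from skimage.feature import local_binary_pattern

def grayscale_features(img):
    """Downsample an m x n grayscale image in [0, 1] to 50x50 and
    flatten it into a 1 x 2500 feature vector."""
    small = resize(img, (50, 50), anti_aliasing=True)
    return small.reshape(-1)

def lbp_features(img, grid=(5, 5), P=8, R=1):
    """Compute uniform LBP codes, histogram them per block, and
    concatenate the normalized block histograms."""
    codes = local_binary_pattern(img, P, R, method="uniform")
    n_bins = P + 2                      # uniform codes take P + 2 values
    h, w = codes.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = codes[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            feats.append(hist / hist.sum())
    return np.concatenate(feats)
```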
C. Classification
We use Support Vector Machines (SVM) for gender and
ethnicity classification. The basic training principle of SVMs
is to find the optimal hyperplane that separates the classes
with a maximum margin [3]. Given a set of M training
samples xi (i.e., LBP or grayscale pixel representation for
an image) and a set of M labels yi (i.e., Male/Female
or Asian/Non-Asian), where xi ∈ R^N and yi ∈ {−1,1}, an SVM classifier finds the optimal hyperplane that correctly
classifies the largest portion of the training samples while
maximizing the distance of either class to the hyperplane (the
margin). A test sample x is classified using the discriminating
hyperplane, defined by:
f(x) = w ⋅ φ(x) + b, where w = ∑_{i=1}^{M} αi yi φ(xi)   (1)
where w is the weight vector, b is a bias term, φ(x) is a transform
related to the chosen kernel function by the equation
k(x, xi) = φ(x)⋅ φ(xi), and the sign of f(x) determines the
class of x. Determining the optimal hyperplane is equivalent
to finding all αi > 0. Any vector xi from the training set
corresponding to a nonzero αi is a support vector of the
optimal hyperplane.
For a linear SVM, the kernel function is simply the dot
product in R^N, while the kernel function in a nonlinear SVM
projects the training samples to a space of higher dimensionality.
Once the points are projected, the nonlinear SVM finds
the optimal hyperplane in the new, higher dimensional space.
Nonlinear kernels allow for a better separation of data which
is not linearly separable. Polynomial and radial basis function
(RBF) kernels, as shown below, are examples of kernels used
in nonlinear SVMs.
∙ Polynomial: k(x, xi) = ((x ⋅ xi) + 1)^p
∙ RBF: k(x, xi) = e^(−γ∥x−xi∥²), γ > 0
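For concreteness, the two kernels can be evaluated directly; a plain NumPy sketch:

```python
import numpy as np

def poly_kernel(x, xi, p=2):
    # polynomial kernel: ((x . xi) + 1)^p
    return (np.dot(x, xi) + 1.0) ** p

def rbf_kernel(x, xi, gamma=0.5):
    # RBF kernel: exp(-gamma * ||x - xi||^2), gamma > 0
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(xi)) ** 2)
```

An RBF kernel always returns 1 when x = xi and decays toward 0 as the two points move apart, which is what makes it useful for data that is not linearly separable.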
For our approach we use a linear SVM and a nonlinear SVM
with an RBF kernel using the LIBSVM software [4]. RBF
parameters are chosen for each experiment after running a
grid search as mentioned in [5] for the best values of γ and
C, where C > 0 is the penalty parameter for the error term.
Overall, the classification accuracies for the nonlinear SVM are higher
than those for the linear SVM, so only the nonlinear results are
reported.
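A sketch of this training setup using scikit-learn's SVC (which wraps the same LIBSVM library) in place of the LIBSVM command-line tools; the grid values follow the coarse search style suggested in [5], though the exact ranges here are our assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# illustrative stand-in data; in practice X holds LBP or grayscale
# feature vectors and y holds Male/Female or Asian/Non-Asian labels
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 16))
y = rng.integers(0, 2, size=80)

# coarse grid search over C and gamma with 5-fold cross validation
grid = {"C": [2.0 ** k for k in (-1, 1, 3, 5)],
        "gamma": [2.0 ** k for k in (-7, -5, -3)]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
search.fit(X, y)
best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]
```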
III. EXPERIMENTAL RESULTS
In this section we discuss the soft biometric classification
results, and their fusion with the local appearance based
periocular recognition techniques to improve overall performance.
First, we describe the experimental setup.
A. Experimental Setup
We experiment on the left and right periocular images and
the corresponding face images from the FRGC face dataset.
We use the same feature extraction and classification technique
for both face and periocular images. There are a total
of 404 subjects in our experiments. Each subject has multiple
face images captured under different conditions, giving us
2116 images each of the face, left periocular, and right periocular
regions. The 2116 images for each region
are divided into 5 sets for our experiments: 1 gallery
set (G), and 4 probe sets (P1, P2, P3, P4). The gallery
images are captured under controlled lighting, with neutral
expression, and in the same session. The P1 set of images are
captured under controlled lighting, with neutral expression,
but in a different session. The P2 set consists of images captured under controlled lighting, alternate expression, and
in the same session. The set P3 is similar to P2 except the
images are captured in a different session. The P4 set has
images captured under uncontrolled lighting. There are two
images per subject in set G and generally one image per
subject in sets P1 through P4, but each probe set may not
have images of all subjects. In total there are 808 images in
G, and 356, 402, 353, and 197 images in P1, P2, P3, and
P4, respectively.
For evaluating soft biometric classification performance,
we perform 7 experiments using various combinations of
training and test sets for each periocular region and the face.
The training and test set configurations for these experiments
are labeled as (ALL,ALL), (G,ALL), (G,G), (G,P1), (G,P2),
(G,P3), and (G,P4). For each of these experiments we use 5-
fold cross validation for reporting the classification accuracy.
Testing is performed on subjects not used in training. A 5-
fold cross validation scheme means images belonging to 80%
of the subjects are used for training at a given time, while
the other 20% of subjects are used for testing. The cross
validation scheme is set up such that each subject is tested
exactly once. The experiment (ALL,ALL) can be considered
as the baseline, where we use all 5 image sets to draw the
training samples from and test on all the available test images
(images of the subjects not used in training). The experiment
(G, ALL) means training is done only on images drawn from
the set G, while testing is performed on all the available test
images. This means training at a given instant is done on the
gallery images of 80% of the subjects, and testing is done on
all the images belonging to the rest of the subjects. Similarly,
the experiments (G,G), (G,P1), (G,P2), (G,P3), and (G,P4)
perform 5-fold cross validation on individual test sets while
using the set G for drawing training samples. While drawing
training samples, we try to keep the gender or ethnicity ratio
the same as that observed in the entire population of our test
set.
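The subject-disjoint split described above can be sketched with scikit-learn's GroupKFold, which guarantees that a subject's images never straddle the train/test boundary (toy arrays; preserving the gender or ethnicity ratio when drawing training samples would need an additional stratification step):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# toy setup: 20 subjects with 5 images each
subjects = np.repeat(np.arange(20), 5)
X = np.random.default_rng(1).normal(size=(100, 8))
y = subjects % 2  # stand-in labels

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, y, groups=subjects):
    # no subject appears on both sides of the split
    assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
```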