04-12-2012, 12:30 PM
Image Ratio Features for Facial Expression Recognition Application
Abstract
Video-based facial expression recognition is a challenging
problem in computer vision and human–computer interaction.
To address this problem, texture features have been
extracted and widely used, because they can capture image intensity
changes caused by skin deformation. However, existing
texture features encounter problems with albedo and lighting
variations. To solve both problems, we propose a new texture
feature called image ratio features. Compared with previously
proposed texture features, e.g., high gradient component features,
image ratio features are more robust to albedo and lighting variations.
In addition, to further improve facial expression recognition
accuracy based on image ratio features, we combine image
ratio features with facial animation parameters (FAPs), which
describe the geometric motions of facial feature points. The performance
evaluation is based on the Carnegie Mellon University
Cohn–Kanade database, our own database, and the Japanese
Female Facial Expression database. Experimental results show
that the proposed image ratio feature is more robust to albedo and
lighting variations, and the combination of image ratio features
and FAPs outperforms each feature alone.
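The robustness claim can be illustrated with a small numerical sketch. This excerpt does not give the exact definition of the image ratio feature, so the code below makes two assumptions: pixel intensity follows a simple Lambertian-style model (intensity = albedo × shading), and the feature is the pixel-wise ratio of an expression frame to an aligned neutral frame of the same face. The function names `render` and `image_ratio` and all numbers are hypothetical.

```python
# Sketch under stated assumptions, not the paper's implementation.
# Assumed model: intensity = albedo * shading at every pixel.

def render(albedo, shading):
    """Build a toy image as the per-pixel product albedo * shading."""
    return [[a * s for a, s in zip(a_row, s_row)]
            for a_row, s_row in zip(albedo, shading)]

def image_ratio(expression, neutral, eps=1e-6):
    """Pixel-wise ratio expression / neutral for two same-sized 2-D lists.

    eps guards against division by zero in very dark pixels.
    """
    return [[e / (n + eps) for e, n in zip(e_row, n_row)]
            for e_row, n_row in zip(expression, neutral)]

# Two hypothetical faces with different albedo (skin reflectance).
albedo_a = [[0.9, 0.8], [0.7, 0.6]]
albedo_b = [[0.3, 0.5], [0.2, 0.4]]

# The same skin deformation changes the shading identically on both faces.
shading_neutral = [[1.0, 1.0], [1.0, 1.0]]
shading_expr = [[1.0, 0.5], [0.8, 1.2]]

ratio_a = image_ratio(render(albedo_a, shading_expr),
                      render(albedo_a, shading_neutral))
ratio_b = image_ratio(render(albedo_b, shading_expr),
                      render(albedo_b, shading_neutral))

# The albedo cancels in the ratio, so both faces yield (almost) the same map.
max_diff = max(abs(x - y)
               for row_a, row_b in zip(ratio_a, ratio_b)
               for x, y in zip(row_a, row_b))
```

Under this assumed model the ratio depends only on the change in shading, which is why `max_diff` stays near zero even though the two faces have very different reflectance. Raw intensity differences would not have this property.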
INTRODUCTION
In the past decade, much progress has been made on face
recognition [1]–[4]. However, computational facial expression
analysis is still a challenging and attractive research topic
in computer vision and intelligent human–computer interaction
(HCI). Facial-expression-related research was launched by
Ekman and Friesen [5] in the 1970s. In Ekman’s early work,
face actions were described in terms of the Facial Action
Coding System (FACS) [7], and facial expressions were semantically
coded with respect to seven basic but “universal”
dimensions, i.e., neutral, anger, disgust, fear, joy, sadness,
and surprise. However, in practice, automatic facial expression
recognition by computer did not really begin until the 1990s.
Most approaches [9]–[17] are based on Ekman’s theory for
developing intelligent HCI at the current stage, although some
researchers proposed approaches based on other emotion models,
e.g., the valence/arousal dimensional model [8].
SYSTEM OVERVIEW
Fig. 1 shows an overview of our system. Our comprehensive
facial expression parameters consist of two subsets: 1) FAPs
and 2) Skin Deformation Parameters (SDPs).
Both the FAPs in MPEG-4 [30] and the AUs in FACS [7] can
be used to describe the facial action. We choose FAPs instead
of the AUs in FACS because 1) FAPs are part of an ISO/IEC standard
developed by MPEG (Moving Picture Experts Group); and
2) FAPs concisely represent the geometric evolution of
facial expressions, including asymmetric ones, using 66 low-level
parameters. They are adequate for describing the facial
feature motions. In fact, not all of these 66 low-level FAPs
are necessary for facial expression recognition. In this work,
a subset of the low-level FAPs that are relevant to expression
recognition is extracted, as shown in Table I. Twenty-seven
key points are used to compute these parameters (Fig. 2). The
intensity of FAPs is described by FAP units (FAPUs), which are
also defined in the MPEG-4 standard.
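As a rough illustration of how a FAP value relates to key-point motion and FAPUs, the sketch below expresses the displacement of one key-point coordinate in units of a FAPU. It assumes the common MPEG-4 convention that a distance-based FAPU is a neutral-face distance divided by 1024. The helper names, the choice of mouth-nose separation as the reference distance, and the pixel values are hypothetical and should be checked against the standard and Table I.

```python
# Sketch under stated assumptions, not the paper's implementation.

def fapu(neutral_distance, divisor=1024.0):
    """Assumed FAPU: a reference distance on the neutral face divided by 1024."""
    return neutral_distance / divisor

def fap_value(neutral_coord, current_coord, unit):
    """Displacement of one key-point coordinate, expressed in FAPUs."""
    return (current_coord - neutral_coord) / unit

# Hypothetical face: mouth-nose separation of 40 pixels in the neutral frame,
# and a lip key point that moves 5 pixels from its neutral position.
small_face = fap_value(100.0, 105.0, fapu(40.0))

# The same expression on a face imaged at twice the scale:
# every pixel distance doubles, but the FAP value does not change.
large_face = fap_value(200.0, 210.0, fapu(80.0))
```

Dividing by a face-specific unit is what makes the values comparable across subjects and camera distances: both calls above evaluate to 128.0 FAPUs.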
CONCLUSION
In this paper, we have proposed image ratio features for facial
expression recognition. Image ratio features effectively capture
image intensity changes due to skin deformations. Compared
with the previously proposed high gradient component features,
image ratio features are more robust to albedo and lighting
variations. Extensive experiments have demonstrated
that image ratio features significantly improve facial expression
recognition performance when there are large lighting and
albedo variations. In addition, we have developed an expression
recognition system that combines image ratio features with
FAPs. We have shown that the combination of image ratio
features and FAPs outperforms each feature alone, and the
combined system is effective at handling both symmetric and
asymmetric facial expressions.