FULLY AUTOMATIC RECOGNITION OF THE TEMPORAL PHASES OF FACIAL ACTION
ABSTRACT
Past work on automatic analysis of facial expressions has focused mostly on detecting prototypic expressions of basic emotions like happiness and anger. The method proposed here enables the detection of a much larger range of facial behavior by recognizing facial muscle actions [action units (AUs)] that compound expressions. AUs are agnostic, leaving the inference about conveyed intent to higher order decision making (e.g., emotion recognition). The proposed fully automatic method not only allows the recognition of 22 AUs but also explicitly models their temporal characteristics (i.e., sequences of temporal segments: neutral, onset, apex, and offset). To do so, it uses a facial point detector based on Gabor-feature-based boosted classifiers to automatically localize 20 facial fiducial points. These points are tracked through a sequence of images using a method called particle filtering with factorized likelihoods. To encode AUs and their temporal activation models based on the tracking data, it applies a combination of GentleBoost, support vector machines, and hidden Markov models. We attain an average AU recognition rate of 95.3% when tested on a benchmark set of deliberately displayed facial expressions and 72% when tested on spontaneous expressions.
INTRODUCTION
FACIAL EXPRESSIONS synchronize the dialogue by means of brow raising and nodding, clarify the content and intent of what is said by means of lip reading and emblems like a wink, signal comprehension or disagreement, and convey messages about cognitive, psychological, and affective states. Therefore, attaining machine understanding of facial behavior would be highly beneficial for fields as diverse as computing technology, medicine, and security, in applications like ambient interfaces, empathetic tutoring, interactive gaming, research on pain and depression, health support appliances, monitoring of stress and fatigue, and deception detection. Because of this practical importance and the theoretical interest of cognitive and medical scientists, machine analysis of facial expressions has attracted the interest of many researchers in computer vision.
AUTOMATIC ANALYSIS OF FACIAL EXPRESSIONS
Two main streams in the current research on automatic analysis of facial expressions consider facial affect (emotion) detection and facial muscle action detection. These streams stem directly from the two major approaches to facial expression measurement in psychological research: message judgment and sign judgment. The aim of the former is to infer what underlies a displayed facial expression, such as affect or personality, while the aim of the latter is to describe the "surface" of the shown behavior, such as facial movement or facial component shape.
Thus, a frown can be judged as "anger" in a message-judgment approach and as a facial movement that lowers and pulls the eyebrows closer together in a sign-judgment approach. Most facial expression analyzers developed so far adhere to the message judgment stream and attempt to recognize a small set of prototypic emotional facial expressions such as the six basic emotions proposed by Ekman.
In sign judgment approaches, a widely used method for manual labeling of facial actions is the Facial Action Coding System (FACS). FACS associates facial expression changes with actions of the muscles that produce them. It defines 9 different action units (AUs) in the upper face, 18 in the lower face, and 5 AUs that cannot be classified as belonging to either the upper or the lower face. Additionally, it defines so-called action descriptors: 11 for head position, 9 for eye position, and 14 for miscellaneous actions (for examples, see Fig. 1).
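To make the coding scheme concrete, here is a small Python sketch (our illustrative subset, not the full FACS inventory) showing how AU codes map to muscle actions and how an observed expression is coded as a set of active AUs:

# A few well-known FACS action units (illustrative subset only).
ACTION_UNITS = {
    1: "Inner brow raiser",
    2: "Outer brow raiser",
    4: "Brow lowerer",
    6: "Cheek raiser",
    12: "Lip corner puller",
    15: "Lip corner depressor",
    45: "Blink",
}

# An expression is coded as the set of active AUs; for example,
# a prototypical smile combines AU6 and AU12.
smile = {6, 12}
print([ACTION_UNITS[au] for au in sorted(smile)])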
AUs IN POSED FACE IMAGES
The focus of the research efforts in the field was first on automatic recognition of AUs in either static face images or face image sequences picturing facial expressions produced on command. One of the main criticisms that these works received from both cognitive and computer scientists is that the methods are not applicable in real-life situations, where subtle changes in facial expression typify the displayed facial behavior rather than the exaggerated AU activations typical of deliberately displayed facial expressions.
Automatic recognition of facial expression configuration has been the main focus of the research efforts in the field. However, both the configuration and the dynamics of facial expressions are important for the interpretation of human facial behavior. Facial expression temporal dynamics are essential for the categorization of complex psychological states like various types of pain and mood. They are also a key parameter in differentiating posed from spontaneous facial expressions. Some past work in the field has used aspects of the temporal dynamics of facial expressions, such as the speed of a facial point's displacement or the persistence of facial parameters over time. However, this was done mainly either to increase the performance of facial expression analyzers or to report on the intensity of (a component of) the shown facial expression, not to explicitly analyze the properties of facial actions' temporal dynamics, such as the duration and speed of the onset and offset of the actions. This led to studies on automatic segmentation of AU activation into temporal segments (neutral, onset, apex, and offset) in frontal- and profile-view face videos. Hence, the focus of the research in the field started to shift toward automatic AU recognition in spontaneous facial expressions.
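The paper itself segments AU activation using a combination of SVMs and hidden Markov models; as a much simpler illustration of what temporal segmentation means, the sketch below (our own heuristic, not the authors' method; the threshold eps is an assumption) labels each frame of a one-dimensional AU intensity signal by the sign of its frame-to-frame derivative:

import numpy as np

def label_phases(intensity, eps=0.05):
    """Assign a temporal phase to each frame of an AU intensity signal.

    Illustrative heuristic only: rising intensity -> onset, falling -> offset,
    high and flat -> apex, low and flat -> neutral. eps is an assumed
    threshold on the frame-to-frame change.
    """
    d = np.gradient(np.asarray(intensity, dtype=float))
    phases = []
    for x, dx in zip(intensity, d):
        if dx > eps:
            phases.append("onset")
        elif dx < -eps:
            phases.append("offset")
        elif x > eps:
            phases.append("apex")
        else:
            phases.append("neutral")
    return phases

# Toy signal: an AU activates, peaks, and releases.
signal = [0, 0, 0.3, 0.7, 1.0, 1.0, 1.0, 0.6, 0.2, 0, 0]
print(label_phases(signal))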
FACIAL POINT DETECTION
The first step in any facial information extraction process is face detection, i.e., the identification of all regions in the scene that contain a human face. The second step in facial expression analysis is to extract geometric features (facial points and shapes of facial components) and/or appearance features (descriptions of the texture of the face such as wrinkles and furrows).
FACE DETECTION
The most commonly employed face detector in automatic facial expression analysis is the real-time Viola-Jones face detector, which consists of a cascade of classifiers. Each classifier employs integral-image filters, which can be computed very quickly at any location and scale; this is essential to the speed of the detector. For each stage in the cascade, a subset of features is chosen using a feature selection procedure. The C++ implementation of the face detector runs at about 500 Hz on a 3.2-GHz Pentium 4.
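The paper does not prescribe a particular implementation, but OpenCV ships a pretrained Haar-cascade (Viola-Jones) frontal-face model, so a minimal detection sketch looks as follows (face.jpg is a hypothetical input image):

import cv2

# Load OpenCV's pretrained Viola-Jones (Haar cascade) frontal-face model.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

image = cv2.imread("face.jpg")  # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Run the cascade over the image at multiple scales; each detection is
# an (x, y, width, height) face rectangle.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("face_detected.jpg", image)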
CHARACTERISTIC FACIAL POINT DETECTION
Methods for facial feature point detection can be classified as either texture-based (modeling the local texture around a given facial point) or shape-based (regarding all facial points together as a shape that is learned from a set of labeled faces). A typical texture-based method uses log-Gabor filters. A typical combined texture- and shape-based method employs AdaBoost to estimate, for each pixel in an input image, the likelihood that it is a facial feature point, and then uses a shape model as a filter to select the most probable positions of the feature points.
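As an illustration of the texture-based idea, the following sketch (our own, not the paper's exact detector) computes a small bank of Gabor-magnitude responses around a candidate pixel; a boosted classifier such as AdaBoost or GentleBoost would then score such feature vectors. Note that OpenCV's cv2.getGaborKernel provides ordinary rather than log-Gabor kernels, and the filter-bank parameters here are arbitrary:

import cv2
import numpy as np

def gabor_features(gray, x, y, size=32):
    """Gabor-magnitude features for the patch centred on candidate point (x, y).

    Illustrative only: real point detectors tune the filter bank (often using
    log-Gabor filters) and feed such vectors to a boosted classifier. Assumes
    (x, y) lies at least size/2 pixels from the image border.
    """
    half = size // 2
    patch = gray[y - half:y + half, x - half:x + half].astype(np.float32)
    features = []
    for theta in np.arange(0, np.pi, np.pi / 4):  # 4 orientations
        for lam in (4.0, 8.0):                    # 2 wavelengths
            kernel = cv2.getGaborKernel((15, 15), sigma=4.0, theta=theta,
                                        lambd=lam, gamma=0.5)
            response = cv2.filter2D(patch, cv2.CV_32F, kernel)
            features.append(np.abs(response).mean())
    return np.array(features)  # 8-dimensional descriptor

# Usage: score every candidate pixel's feature vector with a trained
# classifier, then filter the top candidates through a learned shape model.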