25-08-2017, 09:32 PM
REAL TIME FACIAL EXPRESSION RECOGNITION
USING A NOVEL METHOD
REAL TIME FACIAL EXPRESSION.pdf (Size: 613.44 KB / Downloads: 54)
ABSTRACT
This paper discusses a novel method for a Facial Expression Recognition System which performs facial
expression analysis in near real time from a live webcam feed. The primary objectives were to obtain results
in near real time in a light-invariant, person-independent and pose-invariant way. The system is composed of
two different entities: a trainer and an evaluator. Each frame of the video feed is passed through a series of
steps, including Haar classifiers, skin detection, feature extraction, feature-point tracking and classification
with a learned Support Vector Machine model, to achieve a trade-off between accuracy and result rate.
A processing time of 100-120 ms per 10 frames was achieved with an accuracy of around 60%. We measure
our accuracy across a variety of interaction and classification scenarios. We conclude by discussing the
relevance of our work to human-computer interaction and exploring further measures that can be taken.
KEYWORDS
Haar Classifier, SVM, Shi Tomasi Corner Detection, facial expression analysis, affective user interface
INTRODUCTION
A major component of human communication is facial expressions, which constitute around 55
percent of the total communicated message [1]. We use facial expressions not only to express our
emotions, but also to provide important communicative cues during social interaction, such as our
level of interest, our desire to take a speaking turn, and continuous feedback signaling
understanding of the information conveyed.
There has been a global rush toward facial expression recognition over the last few years. A number
of methods have been proposed, but no single method that is efficient in terms of both memory
and time complexity has yet been found.
In 1978, Paul Ekman and Wallace V. Friesen published the Facial Action Coding System
(FACS) [14], which, 30 years later, is still the most widely used method available. Through
observational and electromyographic study of facial behavior, they determined how the
contraction of each facial muscle, both singly and in unison with other muscles, changes the
appearance of the face. The system describes six basic emotions (anger, disgust, fear, joy, sorrow,
surprise). FACS codes expressions using a combination of 44 facial movements called action
units (AUs). While a lot of work on FACS has been done, and FACS is an efficient,
objective method to describe facial expressions, it is not without its drawbacks. Coding a
subject's video is a time- and labor-intensive process that must be performed frame by frame. A
trained, certified FACS coder takes on average 2 hours to code 2 minutes of video. In situations
where real-time feedback is desired and necessary, manual FACS coding is not a viable option.
The International Journal of Multimedia & Its Applications (IJMA) Vol.4,
This gives rise to our problem statement, i.e. to be able to infer emotions in real time from a live
video feed, with the following challenges:
• To be able to provide results in real time
• To be able to build a person-independent model
• To be efficient in classifying the emotions into one of the following: anger, smile,
surprise, neutral
• To be light invariant
• To be pose invariant
Pantic and Rothkrantz [2] suggested three basic problems for expression analysis, the three steps
that can be identified as three separate problems in the whole process: face detection in a
facial image or image sequence, facial expression data extraction, and facial expression
classification. Face detection methods that have been around mostly assume a frontal face view in
the image or image sequence being analyzed. Viola & Jones [3] provide competitive face detection
in real time, using the AdaBoost algorithm to exhaustively pass a search window over the image at
various scales for rapid detection. Essa & Pentland [4] perform spatial and temporal filtering
together with thresholding to identify the motion blobs in an image sequence; the eigenface
method [5] is then used to detect the face. The PersonSpotter system [6] tracks the bounding box of the head
in video using spatio-temporal filtering and pixel disparity, thereby selecting the ROIs, which are
then passed through a skin detector and a convex region detector to then check for the face.
The second step involves face data extraction. Littlewort et al. [7] used a bank of 40 Gabor filters at
different scales and orientations to convolve the image, and the complex-valued
response was recorded as in [8]. Essa & Pentland [4] locate
facial feature points using a set of sample images via FFT and local energy computation. Cohn et
al. [9] first normalize the feature points in the first frame of the image sequence and then use optical
flow to track them; the displacement vectors for each landmark between the initial and peak
frame represent the extracted information. Other methods include AAMs: T. F. Cootes et al. [10]
use a statistical model of shape and grey-level appearance which can
generalize to almost any valid example. A match is obtained in a few iterations by changing
parameters in accordance with the residual between the trained and current image.
The third step involves classification of emotions based on a training model constructed over
properties related to the feature points; for this purpose we have created our learning model using
support vector machines [11].
IMPLEMENTATION
The overall system has been developed in C/C++ with the support of the OpenCV and libSVM [11]
libraries. The overall process can be shown as follows:
Figure 1. The overall process for facial expression recognition
Detection of Haar Like Features (Viola & Jones [3])
Viola and Jones in 2001 published their breakthrough work, which allowed appearance-based
methods to run in real time while keeping the same or improved accuracy.
Rectangle Features
The sums of the pixels which lie within the white rectangles are subtracted from the sum of pixels
in the grey rectangles.
Rectangle features can be computed very rapidly using an intermediate representation
called the integral image. The value of the integral image at point (x, y) is the sum of all the
pixels above and to the left.