05-10-2016, 12:19 PM
Outline
Iʼll start with an overview of facial animation and its history; along the way Iʼll discuss some common approaches. After that Iʼll talk about some notable recent papers and finally offer a few thoughts about the future.
1 Defining the problem
2 Historical highlights
3 Some recent papers
4 Thoughts on the future
Defining the problem
Weʼll take facial animation to be the process of turning a characterʼs speech and emotional state into facial poses and motion. (This might extend to motion of the whole head.) Included here is the problem of modeling the form and articulation of an expressive head.
There are some related topics that we wonʼt consider today. Autonomous characters require behavioral models to determine what they might feel or say. Hand and body gestures are often used alongside facial animation. Faithful animation of hair and rendering of skin can greatly enhance the animation of a face.
Defining the problem
This is a tightly defined problem, but solving it is difficult.
We are intimately aware of how human faces should look, and sensitive to subtleties in the form and motion.
Lip and mouth shapes donʼt correspond to individual sounds, but are context-dependent. These are further affected by the emotions of the speaker and by the language being spoken.
Many different parts of the face and head work together to convey meaning.
Facial anatomy is both structurally and physically complex: there are many layers of different kinds of material (skin, fat, muscle, bones).
Defining the problem
Here are some sub-questions to consider.
How should the motion be produced? That is, how is the face model defined and what are its capabilities?
How do the constituent parts of speech correspond to facial motion?
What non-verbal expressions are produced during speech, and why?
How does a characterʼs emotional state affect his or her face?
Platt and Badler
In their 1981 SIGGRAPH paper, Platt and Badler describe how to construct expressions using a muscle-based facial model.
Animating Facial Expressions
Platt and Badler, SIGGRAPH 1981
This work used the Facial Action Coding System (FACS), a model from psychology, to determine which muscles to activate in the underlying model.
Sidebar: FACS
FACS describes the face in terms of “Action Units”. These may be combined to describe any facial expression.
For example, AU 23 is “Lip Tightener”; AU 19 is “Tongue Out”.
Some of these correspond directly to actions of facial muscles; others involve things like the movement of the tongue or air filling the cheeks.
Facial Action Coding System
Ekman and Friesen, 1978
(The system has subsequently been revised several times.)
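To make the idea concrete, here is a small sketch (not from the talk) of how a FACS-coded expression might be represented in software: an expression is just a set of Action Units, each with an intensity. The AU numbers and names below are real FACS codes; the dictionary, the 1–5 intensity mapping, and the `describe_expression` helper are illustrative choices of mine.

```python
# A few real FACS Action Units (names per Ekman and Friesen).
# FACS grades intensity on an A-E scale; here it is mapped to 1-5
# for simplicity -- an illustrative choice, not part of FACS itself.
ACTION_UNITS = {
    1: "Inner Brow Raiser",
    4: "Brow Lowerer",
    12: "Lip Corner Puller",
    19: "Tongue Out",
    23: "Lip Tightener",
}

def describe_expression(aus):
    """Turn {AU number: intensity 1-5} into a readable description."""
    parts = []
    for au, intensity in sorted(aus.items()):
        name = ACTION_UNITS.get(au, f"AU {au}")
        parts.append(f"AU {au} ({name}) at intensity {intensity}")
    return "; ".join(parts)

# A smile-like expression coded as AU 12 plus a slight brow raise.
smile = {12: 4, 1: 2}
print(describe_expression(smile))
# -> AU 1 (Inner Brow Raiser) at intensity 2; AU 12 (Lip Corner Puller) at intensity 4
```

In a muscle-based system like Platt and Badlerʼs, each AU would then map onto activations of the underlying muscle model rather than a text description.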
Sidebar: Phonemes
Phonemes are logical parts of words. For example, the first phoneme in the word “rip” is /r/; the first phoneme in “fun” is /f/, which is also the first phoneme in “physics”.
Note that the same phoneme might have several slightly different sounds (or phones) due to context.
Phonemes are language-specific.
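In code, this usually takes the form of a pronunciation dictionary mapping words to phoneme sequences. The tiny dictionary below is a hypothetical sketch (real systems use resources like CMUdict), and the /slash/ spellings of the phonemes are loose illustrative notation, but it shows the point above: “fun” and “physics” begin with the same phoneme despite different spellings.

```python
# Hypothetical mini pronunciation dictionary; real lip-sync systems
# use full resources such as the CMU Pronouncing Dictionary.
PRONUNCIATIONS = {
    "rip":     ["/r/", "/ih/", "/p/"],
    "fun":     ["/f/", "/ah/", "/n/"],
    "physics": ["/f/", "/ih/", "/z/", "/ih/", "/k/", "/s/"],
}

def first_phoneme(word):
    """Look up a word and return its first phoneme."""
    return PRONUNCIATIONS[word][0]

# Same initial phoneme, different spelling.
assert first_phoneme("fun") == first_phoneme("physics") == "/f/"
```

A lip-sync pipeline would map each phoneme in such a sequence to a mouth shape (viseme), then blend between them over time.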
Cohen and Massaro
In 1990, Cohen and Massaro also produced a lip-sync system built on a parametric face model, and studied it in the context of speech perception. They later extended the model to include a tongue and to model coarticulation effects.
Synthesis of Visible Speech
Cohen and Massaro, 1990
Perception of synthesized audible and visible speech
Cohen and Massaro, 1990
Modeling coarticulation in synthetic visual speech
Cohen and Massaro, 1993
Sidebar: Coarticulation
Coarticulation refers to the way the articulation of a speech segment, and hence its visible appearance, changes depending on the surrounding segments.
Cohen and Massaro (1993) give two examples: the final consonant is articulated differently in “boot” than in “beet” because of the preceding vowel (backward coarticulation), and the lips round at the beginning of “stew” in anticipation of the rounded vowel (forward coarticulation).
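Cohen and Massaroʼs approach models this with dominance functions: each segment has a target value for each facial control parameter, plus a dominance that peaks at the segmentʼs center and decays with distance in time; the realized trajectory is the dominance-weighted average of the targets. The sketch below follows that spirit, but every numeric value (targets, decay rate, timings) is made up for illustration and is not from their papers.

```python
import math

# Dominance-function blending in the spirit of Cohen and Massaro (1993).
# The control parameter here is "lip rounding" in [0, 1]; all numbers
# are invented for illustration.

def dominance(t, center, magnitude=1.0, rate=10.0):
    """Dominance of a segment at time t, peaking at its center
    and decaying exponentially with temporal distance."""
    return magnitude * math.exp(-rate * abs(t - center))

def blended_value(t, segments):
    """Dominance-weighted average of segment targets at time t.
    segments: list of (center_time, target_value) pairs."""
    num = sum(dominance(t, c) * target for c, target in segments)
    den = sum(dominance(t, c) for c, _ in segments)
    return num / den

# "stew": the rounded vowel's dominance reaches back over /s/ and /t/,
# so the lips begin rounding before the vowel itself starts.
segments = [(0.05, 0.1),   # /s/: little rounding of its own
            (0.15, 0.1),   # /t/: little rounding of its own
            (0.30, 0.9)]   # vowel: strongly rounded
print(blended_value(0.10, segments))  # already above 0.1 during /s/-/t/
print(blended_value(0.30, segments))  # near the vowel's rounded target
```

Because the weighted average never snaps to a single segmentʼs target, the trajectory is smooth, and anticipatory (forward) coarticulation falls out of the overlap of the dominance curves.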