Real-Time Speech-Driven Face Animation
Introduction
Speech-driven face animation takes advantage of the correlation between speech and facial coarticulation. It takes a speech stream as input and outputs the corresponding face animation sequence. Therefore, speech-driven face animation requires only very low bandwidth for "face-to-face" communications. The audio-to-visual mapping (AVM) is the main research issue of speech-driven face animation.
Motion Units – The Visual Representation
The MPEG-4 Face Animation (FA) standard defines 68 facial animation parameters (FAPs). Among them, two are high-level parameters, which specify visemes and expressions. The others are low-level parameters that describe the movements of sparse feature points defined on the head, tongue, eyes, mouth, and ears. MPEG-4 FAPs do not specify detailed spatial information about facial deformation.
MUPs and MPEG-4 FAPs
It can be shown that the conversion between the MUPs and the low-level MPEG-4 FAPs is linear. If the values of the MUPs are known, the facial deformation can be calculated using eq. (1). Consequently, the movements of the facial features in the lower face used by the MPEG-4 FAPs can be calculated, because the MUs cover the feature points in the lower face defined by the MPEG-4 standard. It is then straightforward to calculate the values of the MPEG-4 FAPs.
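Eq. (1) itself is not reproduced in this excerpt. A standard formulation of the MU model that is consistent with the surrounding text (the symbol names are ours) is:

% Eq. (1): facial deformation as the mean shape plus MUP-weighted motion units.
% d: stacked feature-point displacements; \bar{m}: mean deformation;
% m_i: the i-th MU; c_i: the i-th MUP; M = [m_1 \; \cdots \; m_n].
\[
  d = \bar{m} + \sum_{i=1}^{n} c_i\, m_i = \bar{m} + M\mathbf{c}
\]
% Low-level FAPs measure displacements of feature points contained in d,
% so they are an affine function of d and hence linear in the MUPs:
\[
  \mathbf{f} = A\,d + \mathbf{b} = (AM)\,\mathbf{c} + (A\bar{m} + \mathbf{b})
\]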
Experimental Results
We videotape the front view of the same subject as the one in Section 2 while he reads a text corpus. The corpus consists of one hundred sentences selected from the text corpus of the DARPA TIMIT speech database. Both audio and video are digitized; the video rate is 30 frames per second and the audio sampling rate is 44.1 kHz.
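At these rates, each video frame corresponds to 44100 / 30 = 1470 audio samples. A minimal sketch of slicing the audio stream into per-frame windows for audio-visual alignment (the function and constant names are hypothetical, not from the chapter):

import numpy as np

AUDIO_RATE = 44_100                           # Hz, audio sampling rate
VIDEO_FPS = 30                                # video frames per second
SAMPLES_PER_FRAME = AUDIO_RATE // VIDEO_FPS   # 1470 audio samples per video frame

def frame_windows(audio: np.ndarray) -> np.ndarray:
    """Slice a mono audio signal into windows aligned with video frames."""
    n_frames = len(audio) // SAMPLES_PER_FRAME
    return audio[: n_frames * SAMPLES_PER_FRAME].reshape(n_frames, SAMPLES_PER_FRAME)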
The iFACE System
We developed a face modeling and animation system, the iFACE system [8]. The system provides functionality for customizing a generic face model for an individual, text-driven face animation, and off-line speech-driven face animation. Using the method presented in this chapter, we developed a real-time speech-driven face animation function for the iFACE system. First, a set of basic facial deformations is carefully designed by hand for the face model of the iFACE system.
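A high-level sketch of how such a real-time function could be organized is shown below. All of the component names are ours; the chapter does not publish the iFACE API, so this is only one plausible structure under those assumptions.

def realtime_loop(mic, feature_extractor, audio_to_mup, face_model):
    """Hypothetical real-time loop: audio features -> MUPs -> blended deformation."""
    for window in mic.stream(samples_per_window=1470):  # one video frame of audio
        feats = feature_extractor(window)   # e.g., spectral features per window
        mups = audio_to_mup(feats)          # the learned audio-to-visual mapping
        face_model.deform(mups)             # blend the designed basic deformations
        face_model.render()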
Conclusions
This chapter presents an approach for building a real-time speech-driven face animation system. We first learn MUs that characterize real facial deformations from a set of labeled face deformation data. A facial deformation can then be approximated by combining the MUs weighted by the corresponding MUPs.
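In code, the approximation described above is simply a weighted sum of deformation bases. A minimal sketch with illustrative names, matching the linear model given earlier:

import numpy as np

def deform(mean_shape, mus, mups):
    """Approximate a facial deformation as mean_shape + sum_i mups[i] * mus[i].

    mean_shape: (3*N,) stacked rest positions of N feature points
    mus:        (K, 3*N) motion units learned from labeled deformation data
    mups:       (K,) motion unit parameters (the weights)
    """
    return mean_shape + mups @ mus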