Mind Reading Machines: Automated Inference of Cognitive Mental States from Video
Abstract
Mind reading encompasses our ability to attribute mental
states to others, and is essential for operating in a complex
social environment. The goal in building mind reading
machines is to enable computer technologies to understand
and react to people’s emotions and mental states. This
paper describes a system for the automated inference of
cognitive mental states from observed facial expressions
and head gestures in video. The system is based on a multi-level
dynamic Bayesian network classifier which models
cognitive mental states as a number of interacting facial
and head displays. Experimental results yield an average
recognition rate of 87.4% for 6 mental state groups: agreement,
concentrating, disagreement, interested, thinking and
unsure. Real-time performance, unobtrusiveness and lack
of preprocessing make our system particularly suitable for
user-independent human computer interaction.
Introduction
People mind read or attribute mental states to others all the
time, effortlessly, and mostly subconsciously. Mind reading
allows us to make sense of other people’s behavior, predict
what they might do next, and anticipate how they might feel. While
subtle and somewhat elusive, the ability to mind read is
essential to the social functions we take for granted. A lack
of or impairment in mind reading abilities is thought to be
the primary inhibitor of emotion and social understanding
in people diagnosed with autism (e.g. Baron-Cohen et al.
[2]).
People employ a variety of nonverbal communication
cues to infer underlying mental states, including voice,
posture and the face. The human face in particular provides
one of the most powerful, versatile and natural means
of communicating a wide array of mental states. One
subset comprises cognitive mental states such as thinking,
deciding and being confused, which involve both an affective and
an intellectual component [4].
Extracting head action units
Natural human head motion typically ranges between 70–90°
of downward pitch, 55° of upward pitch, 70° of
yaw (turn), and 55° of roll (tilt), and usually occurs as
a combination of all three rotations [16]. The output
positions of the localized feature points are sufficiently
accurate to permit the use of efficient, image-based head
pose estimation. Expression invariant points such as the
nose tip, root, nostrils, inner and outer eye corners are used
to estimate the pose. Head yaw is given by the ratio of left
to right eye widths. Head roll is given by the orientation
angle of the line connecting the two inner eye corners. The computation of both
head yaw and roll is invariant to scale variations that arise
from moving toward or away from the camera. Head pitch
is determined from the vertical displacement of the nose
tip normalized against the distance between the two eye
corners to account for scale variations. The system supports
up to 50°, 30° and 50° of yaw, roll and pitch respectively.
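As a rough illustration of the geometry described above, the following Python sketch computes the three pose estimates from already-localized 2D feature points; the function name, the point labels and the neutral-frame reference used for pitch are illustrative assumptions, not the paper's exact formulation.

import math

def estimate_head_pose(points, neutral):
    # points / neutral: dicts mapping 'nose_tip', 'left_eye_inner',
    # 'left_eye_outer', 'right_eye_inner', 'right_eye_outer' to (x, y)
    # pixel coordinates; neutral holds the same points for a frontal
    # reference frame (an assumed calibration step).
    def eye_width(p, side):
        inner, outer = p[side + '_eye_inner'], p[side + '_eye_outer']
        return math.hypot(outer[0] - inner[0], outer[1] - inner[1])

    # Yaw: ratio of left to right eye widths (scale-invariant).
    yaw_ratio = eye_width(points, 'left') / eye_width(points, 'right')

    # Roll: orientation angle of the line connecting the inner eye corners.
    li, ri = points['left_eye_inner'], points['right_eye_inner']
    roll_deg = math.degrees(math.atan2(ri[1] - li[1], ri[0] - li[0]))

    # Pitch: vertical nose-tip displacement from the neutral frame,
    # normalized by the inter-eye-corner distance to cancel scale changes.
    inter_eye = math.hypot(ri[0] - li[0], ri[1] - li[1])
    pitch_norm = (points['nose_tip'][1] - neutral['nose_tip'][1]) / inter_eye

    return yaw_ratio, roll_deg, pitch_norm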
Pose estimates across consecutive frames are then used to
identify head action units. For example, a pitch of 20°
at time t followed by 15° at time t + 1 indicates a
downward head action, which is AU54 in the FACS coding
[10].
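A minimal sketch of turning consecutive pitch estimates into head action units follows; the 2° threshold and the decision rule are assumptions made for illustration, while the AU53 (head up) and AU54 (head down) labels come from FACS.

def head_pitch_action(pitch_prev_deg, pitch_curr_deg, threshold_deg=2.0):
    # Classify the frame-to-frame pitch change as an up/down head action.
    delta = pitch_curr_deg - pitch_prev_deg
    if delta <= -threshold_deg:
        return 'AU54'  # downward head action (e.g. 20 deg followed by 15 deg)
    if delta >= threshold_deg:
        return 'AU53'  # upward head action
    return None        # no significant pitch action between these frames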
Extracting facial action units
Facial actions are identified from component-based facial
features (e.g. mouth) comprised of motion, shape and
colour descriptors. Motion- and shape-based analysis are
particularly suitable for a real-time video system, in which
motion is inherent and time constraints place a strict upper
bound on the computational complexity of the methods used.
Colour-based analysis is computationally
efficient, and is invariant to the scale or viewpoint of the
face, especially when combined with feature localization
(i.e. limited to regions already defined by feature point
tracking).
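To make the component-based idea concrete, here is a hedged Python sketch of simple shape and colour descriptors for one region (the mouth), assuming tracked lip feature points and an OpenCV BGR frame; the specific descriptors (aperture ratio, mean saturation) are illustrative stand-ins rather than the paper's actual features.

import cv2
import numpy as np

def mouth_descriptors(frame_bgr, left_corner, right_corner, top_lip, bottom_lip):
    # Shape: mouth aperture normalized by mouth width (scale-invariant).
    width = np.hypot(right_corner[0] - left_corner[0],
                     right_corner[1] - left_corner[1])
    aperture = abs(bottom_lip[1] - top_lip[1]) / max(width, 1e-6)

    # Colour: mean saturation inside the bounding box of the four points,
    # i.e. restricted to the region already defined by feature-point tracking.
    xs = [p[0] for p in (left_corner, right_corner, top_lip, bottom_lip)]
    ys = [p[1] for p in (left_corner, right_corner, top_lip, bottom_lip)]
    roi = frame_bgr[int(min(ys)):int(max(ys)) + 1, int(min(xs)):int(max(xs)) + 1]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mean_saturation = float(hsv[..., 1].mean())

    return {'aperture': aperture, 'mean_saturation': mean_saturation}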
Head and facial display recognition
Facial and head actions are quantized and input into left-to-right
HMM classifiers to identify facial expressions and
head gestures. Each is modelled as a temporal sequence of
action units (e.g. a head nod is a series of alternating up and
down movement of the head). In contrast to static classifiers
which classify single frames into an emotion class, HMMs
model dynamic systems spatio-temporally, and deal with
the time warping problem. In addition, the recognition
computation can run in real time, a desirable
property for automated facial expression recognition systems
in human computer interaction.
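As an illustration of how a quantized action sequence might be scored against per-display HMMs, here is a minimal numpy sketch of the forward algorithm in log space; the toy head-nod parameters are assumptions for demonstration, whereas in the paper the HMMs are trained on labelled display sequences.

import numpy as np

def log_likelihood(obs, start, trans, emit):
    # Forward algorithm in log space for a discrete-observation HMM.
    # obs: sequence of symbol indices; start: (n_states,);
    # trans: (n_states, n_states), upper-triangular for left-to-right HMMs;
    # emit: (n_states, n_symbols).
    log_alpha = np.log(start + 1e-12) + np.log(emit[:, obs[0]] + 1e-12)
    for o in obs[1:]:
        log_alpha = (np.logaddexp.reduce(
            log_alpha[:, None] + np.log(trans + 1e-12), axis=0)
            + np.log(emit[:, o] + 1e-12))
    return np.logaddexp.reduce(log_alpha)

# Symbols: 0 = head up (AU53), 1 = head down (AU54), 2 = no action.
nod_hmm = dict(
    start=np.array([1.0, 0.0]),
    trans=np.array([[0.6, 0.4],
                    [0.0, 1.0]]),                        # left-to-right structure
    emit=np.array([[0.1, 0.8, 0.1],                      # state 0: mostly 'down'
                   [0.8, 0.1, 0.1]]))                    # state 1: mostly 'up'

observed = [1, 0, 1, 0, 1]  # alternating down/up actions over five frames
print(log_likelihood(observed, **nod_hmm))               # compare across display HMMs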
Applications and conclusion
The principal contribution of this paper is a multi-level
DBN classifier for inferring cognitive mental states from
videos of facial expressions and head gestures in real time.
The strengths of the system include being fully automated,
user-independent, and supporting purposeful head displays
while decoupling them from facial display recognition. We
reported promising results for 6 cognitive mental states on a
medium-sized posed dataset of labelled videos. Our current
research directions include:
1. testing the generalization power of the system by
evaluating it on a larger and more natural dataset
2. exploring the within-class and between-class variation
among the various mental state classes, perhaps by
utilizing cluster analysis and/or unsupervised classification