Emotional speech recognition: Resources, features, and methods
Abstract
In this paper we overview emotional speech recognition having in mind three goals. The first goal is to provide an up-to-
date record of the available emotional speech data collections. The number of emotional states, the language, the number
of speakers, and the kind of speech are briefly addressed. The second goal is to present the most frequent acoustic features
used for emotional speech recognition and to assess how the emotion affects them. Typical features are the pitch, the
formants, the vocal tract cross-section areas, the mel-frequency cepstral coefficients, the Teager energy operator-based fea-
tures, the intensity of the speech signal, and the speech rate. The third goal is to review appropriate techniques in order to
classify speech into emotional states. We distinguish classification techniques that exploit timing information
from those that ignore it. Classification techniques based on hidden Markov models, artificial neural networks, linear
discriminant analysis, k-nearest neighbors, and support vector machines are reviewed.
© 2006 Elsevier B.V. All rights reserved.
Introduction
Emotional speech recognition aims at automati-
cally identifying the emotional or physical state of
a human being from his or her voice. The emotional
and physical states of a speaker are known as
emotional aspects of speech and are included in
the so-called paralinguistic aspects. Although the
emotional state does not alter the linguistic content,
it is an important factor in human communication,
because it provides useful feedback in many
applications, as outlined next.
Data collections
A record of emotional speech data collections is
undoubtedly useful for researchers interested in
emotional speech analysis. An overview of 64 emo-
tional speech data collections is presented in Table
1. For each data collection additional information
is also described such as the speech language, the
number and the profession of the subjects, other
physiological signals possibly recorded simulta-
neously with speech, the data collection purpose
(emotional speech recognition, expressive synthe-
sis), the emotional states recorded, and the kind of
the emotions (natural, simulated, elicited).
Teager energy operator
Another useful feature for emotion recognition is
the number of harmonics due to the non-linear air
flow in the vocal tract that produces the speech
signal. In the emotional state of anger, or under
stressed speech, the fast air flow causes vortices
near the false vocal folds that provide additional
excitation signals beyond the pitch (Teager
and Teager, 1990; Zhou et al., 2001). The additional
excitation signals are apparent in the spectrum as
harmonics and cross-harmonics. In the following,
a procedure to calculate the number of harmonics
in the speech signal is described.
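The detection of these extra excitation components is commonly based on the discrete Teager energy operator, Ψ[x](n) = x(n)² − x(n−1)·x(n+1). As a minimal illustrative sketch (the function name and frame handling are ours, not the paper's):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator:
    Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).
    Returns an array two samples shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone A*cos(Omega*n), the operator yields the constant
# value A^2 * sin(Omega)^2; additional harmonics and cross-harmonics
# therefore show up as deviations from a flat profile.
n = np.arange(200)
tone = 0.5 * np.cos(0.3 * n)
psi = teager_energy(tone)
```

In practice the operator would be applied frame by frame, typically after band-pass filtering the signal into sub-bands; those details are beyond this sketch.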
Emotion classification techniques
The output of emotion classification techniques is
a prediction value (label) about the emotional state
of an utterance. An utterance u_n is a speech segment
corresponding to a word or a phrase. Let u_n,
n ∈ {1, 2, . . . , N}, be an utterance of the data collec-
tion. In order to evaluate the performance of a clas-
sification technique, the cross-validation method is
used. According to this method, the utterances of
the whole data collection are divided into the design
set Ds containing N_Ds utterances and the test set Ts
comprised of N_Ts utterances. The classifiers are
trained using the design set and the classification
error is estimated on the test set. The design and
the test set are chosen randomly. This procedure is
repeated a number of times defined by the user, and
the estimated classification error is the average
classification error over all repetitions (Efron and
Tibshirani, 1993).
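The repeated random-split procedure above can be sketched as follows. This is an illustrative implementation under our own naming (train_fn, error_fn, the design fraction, and the repetition count are assumptions, not values from the paper):

```python
import random

def repeated_holdout_error(utterances, labels, train_fn, error_fn,
                           design_fraction=0.7, repetitions=10, seed=0):
    """Estimate classification error by repeated random splits:
    divide the collection into a design set and a test set, train on
    the design set, measure error on the test set, and average the
    error over all repetitions."""
    rng = random.Random(seed)
    n = len(utterances)
    n_design = int(design_fraction * n)
    errors = []
    for _ in range(repetitions):
        idx = list(range(n))
        rng.shuffle(idx)                       # random design/test split
        design, test = idx[:n_design], idx[n_design:]
        clf = train_fn([utterances[i] for i in design],
                       [labels[i] for i in design])
        errors.append(error_fn(clf,
                               [utterances[i] for i in test],
                               [labels[i] for i in test]))
    return sum(errors) / len(errors)           # average over repetitions
```

Here train_fn stands in for any of the classifiers reviewed (HMM, ANN, LDA, k-NN, SVM), and error_fn for the chosen misclassification measure.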
Concluding remarks
In this paper, several topics have been addressed.
First, a list of data collections was provided includ-
ing all available information about the databases
such as the kinds of emotions, the language, etc.
Nevertheless, there are still some copyright prob-
lems since the material from radio or TV is held
under a limited agreement with broadcasters.
Furthermore, there is a need for adopting protocols
such as those in (Douglas-Cowie et al., 2003;
Scherer, 2003; Schröder, 2005) that address issues
related to data collection. Links with standardiza-
tion activities like MPEG-4 and MPEG-7 concern-
ing the emotion states and features should be
established. It is recommended that the data be
distributed by organizations (such as LDC or ELRA),
rather than by individual research groups or pro-
ject initiatives, for a reasonable fee, so that the
experiments reported on the specific data collec-
tions can be repeated. This is not the case with
the majority of the databases reviewed in this paper,
whose terms of distribution are rather unclear.