21-09-2010, 12:22 PM
AN ANALOG INTEGRATED-CIRCUIT VOCAL TRACT
PRESENTED BY:
NIEL V JOSEPH
S7 AEI
ROLL NO-46
COLLEGE OF ENGINEERING, TRIVANDRUM
2007-11 BATCH
CONTENTS
Introduction
Human vocal tract
Concept of speech-locked loop
Circuit model of vocal tract
Two-port π-section
Linear and nonlinear resistor modeling
Driving the vocal tract
Conclusion
INTRODUCTION
First experimental analog integrated-circuit vocal tract
16-stage cascade of two-port π-elements
Analysis by synthesis
Speech-locked loop
HUMAN VOCAL TRACT
Its function is filtering (spectral shaping) of sound
Consists of the laryngeal cavity, pharynx, nasal cavity and oral cavity
Typical length is 16.9 cm in adult males and 14.1 cm in adult females
The larynx produces sound in mammals
The lungs act as the power supply
Controlled variations in the cross-sectional area of the vocal tract produce speech
The two sources of excitation are:
Periodic source at the glottis
Turbulent noise source at some point along the tube
Vocal-fold vibration periodically interrupts the airflow, producing the periodic excitation
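A uniform tube closed at the glottis and open at the lips is a quarter-wave resonator, so the lengths quoted above give a rough check on formant positions: resonances fall at odd multiples of c/(4L). The sketch below assumes c = 343 m/s and ignores losses and the real, non-uniform shape of the tract, so its numbers are only first-order estimates; the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def uniform_tube_formants(length_m, n_formants=3, c=SPEED_OF_SOUND):
    """Resonances of a uniform tube closed at the glottis and open at the lips.

    Quarter-wave resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


male = uniform_tube_formants(0.169)    # 16.9 cm tract
female = uniform_tube_formants(0.141)  # 14.1 cm tract
print([round(f) for f in male])        # [507, 1522, 2537]
print([round(f) for f in female])      # [608, 1824, 3041]
```

The shorter female tract shifts every resonance up by the length ratio 16.9/14.1, roughly 20%.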
CONCEPT OF SPEECH-LOCKED LOOP
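Since the introduction pairs the speech-locked loop with analysis by synthesis, the loop can be read as negative feedback: synthesize, compare with the incoming speech, and adjust the model until the two match. The toy below is only an illustration of that idea under heavy assumptions, not the chip's actual loop: it "locks" a single parameter, the length of a uniform tube, onto a target first formant using the quarter-wave relation F1 = c/(4L). All names, the gain and c = 343 m/s are hypothetical choices.

```python
import math

C_SOUND = 343.0  # m/s, assumed speed of sound


def model_f1(length_m):
    """Toy 'synthesizer': first resonance of a uniform closed-open tube."""
    return C_SOUND / (4.0 * length_m)


def lock_tube_length(target_f1_hz, length_m=0.12, gain=0.5, iterations=60):
    """Toy analysis-by-synthesis loop that adjusts the model until it matches.

    The error is measured on a log-frequency scale. If the synthesized F1 is
    too high the tube is lengthened, and vice versa (negative feedback).
    The log-length error shrinks by (1 - gain) per step, so 0 < gain < 2.
    """
    for _ in range(iterations):
        error = math.log(model_f1(length_m)) - math.log(target_f1_hz)
        length_m *= math.exp(gain * error)
    return length_m


print(lock_tube_length(507.4))  # converges to a length close to 0.169 m
```

A real loop would compare full spectra and adjust many articulatory parameters at once, but the compare-and-correct structure is the same.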
CIRCUIT MODEL OF VOCAL TRACT
The vocal tract is modeled as a non-uniform acoustic tube
Terminated by the vocal cords (glottis) at one end and by the lips/nose at the other
Its varying cross-sectional area is represented by changing the impedances at different points along the model
Wave propagation is approximately one-dimensional
One-dimensional sound propagation in a uniform tube of circular cross section obeys the plane-wave equation
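In the usual notation (assumed here: p(x, t) is acoustic pressure, U(x, t) is volume velocity, A is the cross-sectional area, ρ is the air density and c is the speed of sound), the standard lossless form of these one-dimensional equations is shown below; eliminating U recovers the familiar wave equation for p.

```latex
-\frac{\partial p}{\partial x} = \frac{\rho}{A}\,\frac{\partial U}{\partial t},
\qquad
-\frac{\partial U}{\partial x} = \frac{A}{\rho c^{2}}\,\frac{\partial p}{\partial t}
\quad\Longrightarrow\quad
\frac{\partial^{2} p}{\partial x^{2}} = \frac{1}{c^{2}}\,\frac{\partial^{2} p}{\partial t^{2}}
```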
Acoustic wave propagation in a tube is analogous to plane wave propagation along an electrical transmission line
The acoustic equations can therefore be rewritten in transmission-line form
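Mapping pressure p to voltage V and volume velocity U to current I, a tube section of area A (air density ρ, speed of sound c) obeys the lossless telegrapher's equations, with inductance and capacitance per unit length fixed by the local area. This is the standard lossless form; loss terms (series resistance and shunt conductance) are omitted in this sketch.

```latex
-\frac{\partial V}{\partial x} = \mathcal{L}\,\frac{\partial I}{\partial t},
\qquad
-\frac{\partial I}{\partial x} = \mathcal{C}\,\frac{\partial V}{\partial t},
\qquad
\mathcal{L} = \frac{\rho}{A},
\quad
\mathcal{C} = \frac{A}{\rho c^{2}},
\quad
\frac{1}{\sqrt{\mathcal{L}\,\mathcal{C}}} = c
```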
Transmission-line (TL) model
The TL is a cascade of two-port elements
Current source model
Variable impedance model
Fluid volume velocity is mapped to current
Fluid pressure is mapped to voltage
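With this mapping, each short slice of the tube becomes a series inductance (acoustic mass, ρΔx/A) and a shunt capacitance (acoustic compliance, AΔx/(ρc²)), so a narrower section has larger inductance and smaller capacitance. The sketch below assumes equal-length sections, ρ = 1.2 kg/m³ and c = 343 m/s; the function name is hypothetical, and the values are in acoustic units that a circuit implementation would have to scale.

```python
import math

RHO_AIR = 1.2    # kg/m^3, assumed air density
C_SOUND = 343.0  # m/s, assumed speed of sound


def section_elements(areas_m2, tract_length_m, rho=RHO_AIR, c=C_SOUND):
    """Map an area function (one area per section) to lumped (L, C) pairs.

    Each section of length dx gets series inductance rho*dx/A and
    shunt capacitance A*dx/(rho*c^2).
    """
    dx = tract_length_m / len(areas_m2)
    return [(rho * dx / a, a * dx / (rho * c * c)) for a in areas_m2]


# 16 sections of 4 cm^2 with a 0.5 cm^2 constriction in section 8
areas = [4e-4] * 7 + [0.5e-4] + [4e-4] * 8
elements = section_elements(areas, 0.169)
print(elements[7][0] / elements[0][0])  # constriction: 8x the inductance
```

Note that sqrt(L*C) = Δx/c for every section regardless of area: the area sets the local impedance, not the propagation delay.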
TWO-PORT π-SECTION
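A cascade of π-sections can be checked numerically with chain (ABCD) matrices. The sketch below assumes the usual symmetric lossless π arrangement (shunt C/2, series L, shunt C/2), a current-source drive at the glottis and lips modeled as a short circuit (zero acoustic pressure), so that I_glottis = D · I_lips and the resonances are the zeros of D. Air density 1.2 kg/m³, c = 343 m/s and all function names are assumptions.

```python
import math

RHO_AIR = 1.2    # kg/m^3, assumed air density
C_SOUND = 343.0  # m/s, assumed speed of sound


def pi_section_abcd(omega, inductance, capacitance):
    """ABCD matrix of a lossless pi-section: shunt C/2, series L, shunt C/2."""
    z = 1j * omega * inductance       # series impedance
    y = 1j * omega * capacitance / 2  # admittance of each shunt half
    # shunt(y) x series(z) x shunt(y), multiplied out by hand
    return (1 + z * y, z, 2 * y + z * y * y, 1 + z * y)


def chain_d(omega, areas, length_m):
    """D entry of the cascaded chain matrix, glottis-side section first."""
    dx = length_m / len(areas)
    m11, m12, m21, m22 = 1, 0, 0, 1
    for area in areas:
        inductance = RHO_AIR * dx / area
        capacitance = area * dx / (RHO_AIR * C_SOUND ** 2)
        a, b, c, d = pi_section_abcd(omega, inductance, capacitance)
        m11, m12, m21, m22 = (m11 * a + m12 * c, m11 * b + m12 * d,
                              m21 * a + m22 * c, m21 * b + m22 * d)
    return m22


def formants(areas, length_m, f_max=4000.0):
    """Frequencies (Hz) where D = 0, i.e. peaks of U_lips / U_glottis.

    For a lossless ladder D is real, so its zeros show up as sign changes.
    """
    def d_at(f):
        return chain_d(2 * math.pi * f, areas, length_m).real

    peaks = []
    for k in range(1, int(f_max)):       # 1 Hz grid search
        lo, hi = float(k), float(k + 1)
        d_lo = d_at(lo)
        if d_lo * d_at(hi) < 0:          # sign change brackets a zero
            for _ in range(40):          # refine by bisection
                mid = 0.5 * (lo + hi)
                d_mid = d_at(mid)
                if d_lo * d_mid <= 0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            peaks.append(0.5 * (lo + hi))
    return peaks


uniform = [4e-4] * 16  # 16 sections of 4 cm^2 (area cancels out of D here)
print([round(f, 1) for f in formants(uniform, 0.169)])
```

For a uniform 16.9 cm tube this should report peaks near 507, 1517, 2512 and 3482 Hz. The ideal quarter-wave values are about 507, 1522, 2537 and 3552 Hz, so the 16-section ladder tracks F1 closely but runs about 2% low by the fourth resonance, a discretization effect that grows with frequency.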
LINEAR AND NONLINEAR RESISTOR MODELING
Implemented with MOS transistors
The glottal constriction resistance is a series combination of a linear and a nonlinear resistor
Linear characteristic: I ∝ V
Nonlinear characteristic: I ∝ √V
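Because the two elements are in series, they carry the same current while their voltages add, which makes the combined glottal resistance a simple quadratic in I. The sketch below assumes ideal laws V1 = I·r_lin and I = k_nl·√V2 (parameter and function names are hypothetical) and handles only positive drive. Reading the square-root law as a Bernoulli-type kinetic pressure drop (ΔP ∝ U²) is the usual acoustic interpretation, assumed here.

```python
import math


def glottal_current(v, r_lin, k_nl):
    """Current through a linear resistor in series with a sqrt-law element.

    Linear element:    V1 = I * r_lin        (I proportional to V)
    Nonlinear element: I = k_nl * sqrt(V2)   (I proportional to sqrt(V))
    Series: v = I * r_lin + (I / k_nl)**2, a quadratic in I.
    """
    if v <= 0:
        return 0.0  # sketch handles forward (positive) drive only
    a = 1.0 / (k_nl * k_nl)
    s = math.sqrt(r_lin * r_lin + 4.0 * a * v)
    # Positive root of a*I^2 + r_lin*I - v = 0, written in the numerically
    # stable form 2v / (r_lin + s) to avoid cancellation when a is small.
    return 2.0 * v / (r_lin + s)


print(glottal_current(2.0, 1.0, 1.0))  # 1.0: 1 V across r_lin plus 1 V across the sqrt element
```

At small drive the linear term dominates (I ≈ V/r_lin); at large drive the square-root element dominates (I ≈ k_nl·√V).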
DRIVING THE VOCAL TRACT
The vocal tract can produce all speech sounds
To do so, it must be given the area function, the glottal excitation source and the turbulent noise source
The area function has a large number of degrees of freedom
To reduce the dimensionality, the Maeda articulatory model is used
The Maeda articulatory model describes the vocal tract profile using seven components:
Jaw height
Tongue body position
Tongue body shape
Tongue tip
Lip height
Lip protrusion
Larynx height
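The point of the articulatory model is that seven numbers replace a full area function. A minimal sketch of that dimensionality reduction, assuming a linear model in which the vocal-tract profile is a mean shape plus a weighted sum of one loading vector per parameter: Maeda's model is derived from statistical analysis of measured vocal-tract shapes, but its published loadings and the later profile-to-area conversion are not reproduced here, so the caller must supply them. The parameter labels and function name are hypothetical.

```python
MAEDA_PARAMETERS = (
    "jaw_height", "tongue_body_position", "tongue_body_shape",
    "tongue_tip", "lip_height", "lip_protrusion", "larynx_height",
)


def vocal_tract_profile(mean_profile, loadings, params):
    """Linear articulatory model: profile = mean + sum_i params[i] * loadings[i].

    mean_profile: list of n sample points along the tract.
    loadings: dict mapping each parameter name to a list of n values.
    params: dict mapping parameter names to weights (missing names count as 0).
    """
    profile = list(mean_profile)
    for name in MAEDA_PARAMETERS:
        if len(loadings[name]) != len(profile):
            raise ValueError(f"loading for {name} has the wrong length")
        weight = params.get(name, 0.0)
        for i, value in enumerate(loadings[name]):
            profile[i] += weight * value
    return profile


# Dummy 3-point example (made-up loadings, for illustration only)
mean = [1.0, 2.0, 3.0]
loads = {name: [0.0, 0.0, 0.0] for name in MAEDA_PARAMETERS}
loads["jaw_height"] = [1.0, 0.0, 0.0]
loads["lip_protrusion"] = [0.0, 0.0, 1.0]
print(vocal_tract_profile(mean, loads, {"jaw_height": 0.5, "lip_protrusion": -1.0}))
```

However many points the profile has, the controller only has to drive the seven weights.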
For many speech-synthesis applications, a bandwidth of 5-7 kHz is sufficient
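One way to relate the 16-stage cascade to this bandwidth figure, assuming equal-length stages that each behave like a lossless constant-k LC ladder section of length Δx = ℓ/16: such a ladder passes signals only below f_c = 1/(π√(LC)) = c/(πΔx), and it grows increasingly dispersive as f_c is approached, so the usable band sits well below it. This is a textbook lumped-line estimate with c = 343 m/s assumed, not a figure measured on the chip; the function name is hypothetical.

```python
import math


def ladder_cutoff_hz(tract_length_m, n_sections=16, c=343.0):
    """Cutoff frequency of a lossless LC ladder approximating a uniform tube.

    Per section sqrt(L*C) = dx / c, and a constant-k low-pass ladder cuts off
    at omega_c = 2 / sqrt(L*C), i.e. f_c = c / (pi * dx).
    """
    dx = tract_length_m / n_sections
    return c / (math.pi * dx)


print(round(ladder_cutoff_hz(0.169)))  # 16.9 cm tract, about 10.3 kHz
print(round(ladder_cutoff_hz(0.141)))  # 14.1 cm tract, about 12.4 kHz
```

Doubling the number of sections doubles f_c, so the section count is a direct trade between chip area and bandwidth.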