31-07-2012, 03:55 PM
Automatic Speech Recognition
Automatic Speech Recognition.ppt (Size: 829.5 KB / Downloads: 127)
What is the task?
Getting a computer to understand spoken language
By “understand” we might mean
React appropriately
Convert the input speech into another medium, e.g. text
Several variables impinge on this (see later)
How do humans do it?
Articulation produces sound waves, which the ear conveys to the brain for processing
What’s hard about that?
Digitization
Converting an analogue signal into a digital representation
Signal processing
Separating speech from background noise
Phonetics
Variability in human speech
Phonology
Recognizing individual sound distinctions (similar phonemes)
Lexicology and syntax
Disambiguating homophones
Features of continuous speech
Syntax and pragmatics
Interpreting prosodic features
Pragmatics
Filtering of performance errors (disfluencies)
Digitization
Analogue to digital conversion
Sampling and quantizing
Use filters to measure energy levels at various points on the frequency spectrum
Knowing the relative importance of different frequency bands (for speech) makes this process more efficient
E.g. high-frequency sounds are less informative, so they can be covered with broader filter bands (hence a logarithmic frequency scale)
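The digitization steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not production DSP: the sample rate, bit depth, band count, and band edges are all invented for the example, and the "analogue" input is just a synthetic sine tone.

```python
import numpy as np

SAMPLE_RATE = 8000   # illustrative sample rate (samples per second)
N_BITS = 8           # illustrative quantization depth

def quantize(signal, n_bits=N_BITS):
    """Map a [-1, 1] analogue-style signal onto 2**n_bits discrete levels."""
    levels = 2 ** n_bits
    clipped = np.clip(signal, -1.0, 1.0)
    return np.round((clipped + 1.0) / 2.0 * (levels - 1)).astype(int)

def log_band_energies(signal, sample_rate=SAMPLE_RATE, n_bands=8):
    """Measure spectral energy in logarithmically spaced frequency bands:
    narrow bands at low frequencies, broad bands at high frequencies."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    edges = np.logspace(np.log10(100), np.log10(sample_rate / 2), n_bands + 1)
    return [spectrum[(freqs >= lo) & (freqs < hi)].sum()
            for lo, hi in zip(edges[:-1], edges[1:])]

# A 1 kHz tone "sampled" at 8 kHz for 50 ms.
t = np.arange(0, 0.05, 1.0 / SAMPLE_RATE)
tone = np.sin(2 * np.pi * 1000 * t)
digital = quantize(tone)           # integer levels in [0, 255]
energies = log_band_energies(tone) # energy concentrates in the band containing 1 kHz
```

Note how the logarithmic band edges give the low frequencies, where speech carries most of its distinctions, finer resolution than the high frequencies.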
Variability in individuals’ speech
Variation among speakers due to
Vocal range (f0, and pitch range – see later)
Voice quality (growl, whisper; physiological elements such as nasality, adenoidality, etc.)
ACCENT !!! (especially vowel systems, but also consonants, allophones, etc.)
Variation within speakers due to
Health, emotional state
Ambient conditions
Speech style: formal read speech vs. spontaneous speech