Seminar Topics & Project Ideas On Computer Science Electronics Electrical Mechanical Engineering Civil MBA Medicine Nursing Science Physics Mathematics Chemistry ppt pdf doc presentation downloads and Abstract

Full Version: Speech Recognition PPT
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Speech Recognition



Definition

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words.
The recognised words can be an end in themselves, as in applications such as command and control, data entry, and document preparation.
They can also serve as the input to further linguistic processing in order to achieve speech understanding.

Speech Processing

Signal processing:
Convert the audio wave into a sequence of feature vectors
Speech recognition:
Decode the sequence of feature vectors into a sequence of words
Semantic interpretation:
Determine the meaning of the recognised words
Dialog management:
Correct errors and help get the task done
Response generation:
Choose the words that maximise user understanding
Speech synthesis (text to speech):
Generate synthetic speech from a ‘marked-up’ word string
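
To make the pipeline concrete, here is a minimal Python sketch of the stages above. Every function below is a placeholder stub invented for illustration, not a real library API; real systems substitute actual signal-processing, decoding, understanding, dialog and synthesis components.

[code]
# Hypothetical pipeline skeleton: all function bodies are stubs.

def signal_processing(audio_wave):
    """Convert the audio wave into a sequence of feature vectors."""
    return [[0.0] * 13]  # stub: e.g. one 13-dimensional feature frame

def speech_recognition(feature_vectors):
    """Decode the feature vectors into a sequence of words."""
    return ["what", "time", "is", "it"]  # stub

def semantic_interpretation(words):
    """Determine the meaning of the recognised words."""
    return {"intent": "ask_time"}  # stub

def dialog_management(meaning):
    """Correct errors and decide what to do next."""
    return {"act": "answer", "content": "12:00"}  # stub

def response_generation(dialog_act):
    """Choose the words that maximise user understanding."""
    return "It is twelve o'clock."  # stub

def speech_synthesis(marked_up_text):
    """Generate synthetic speech from a marked-up word string."""
    return b"waveform-bytes"  # stub

def run_pipeline(audio_wave):
    features = signal_processing(audio_wave)
    words = speech_recognition(features)
    meaning = semantic_interpretation(words)
    act = dialog_management(meaning)
    text = response_generation(act)
    return speech_synthesis(text)

print(run_pipeline(b"raw-audio"))
[/code]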

Dialog Management

Goal: determine what to accomplish in response to user utterances, e.g.:
Answer user question
Solicit further information
Confirm/Clarify user utterance
Notify invalid query
Notify invalid query and suggest alternative
Interface between user/language processing components and system knowledge base
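
As a toy illustration of this decision logic, the sketch below dispatches an interpreted user intent to one of the actions listed above. The intent names and canned responses are invented for the example.

[code]
# Toy dialog manager: maps an interpreted intent to an action.

def answer_question(utterance):
    return f"Here is the answer to: {utterance}"

def solicit_information(utterance):
    return "Could you give me a few more details?"

def confirm_utterance(utterance):
    return f"Did you mean: {utterance}?"

def notify_invalid(utterance):
    return "Sorry, I cannot process that query."

def notify_invalid_with_alternative(utterance):
    return "Sorry, I cannot process that query. You could ask about flight times instead."

HANDLERS = {
    "question": answer_question,
    "underspecified": solicit_information,
    "ambiguous": confirm_utterance,
    "invalid": notify_invalid,
    "invalid_with_hint": notify_invalid_with_alternative,
}

def manage(intent, utterance):
    # Fall back to a plain rejection for unknown intents
    return HANDLERS.get(intent, notify_invalid)(utterance)

print(manage("ambiguous", "book a flight to Springfield"))
[/code]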

What you can do with Speech Recognition

Transcription
dictation, information retrieval
Command and control
data entry, device control, navigation, call routing
Information access
airline schedules, stock quotes, directory assistance
Problem solving
travel planning, logistics

Transcription and Dictation

Transcription is the conversion of a stream of human speech into computer-readable form
Medical reports, court proceedings, notes
Indexing (e.g., broadcasts)
Dictation is the interactive composition of text
Report, correspondence, etc.

Speech recognition and understanding

Sphinx system
speaker-independent
continuous speech
large vocabulary
ATIS system
air travel information retrieval
context management
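
Sphinx here refers to the CMU Sphinx family of recognisers. As a present-day illustration (not the original research system), the snippet below uses the third-party Python SpeechRecognition package with a PocketSphinx backend; both packages must be installed, and the file name is a placeholder.

[code]
# Sketch: offline recognition with CMU PocketSphinx via the third-party
# SpeechRecognition package (pip install SpeechRecognition pocketsphinx).
# "utterance.wav" is a placeholder file name.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("utterance.wav") as source:
    audio = recognizer.record(source)  # read the whole file

try:
    print(recognizer.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
[/code]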

Speech Recognition and Call Centres

Automate services, lower payroll
Shorten time on hold
Shorten agent and client call time
Reduce fraud
Improve customer service

Many kinds of Speech Recognition Systems

Speech recognition systems can be characterised by many parameters.
An isolated-word (discrete) speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not.
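
One way to see the difference: an isolated-word system can segment its input simply by finding the pauses. Below is a minimal energy-threshold segmenter, assuming the input is a NumPy array of samples; the frame length and threshold are arbitrary illustrative values.

[code]
# Minimal pause-based segmentation for isolated-word input.
# Frames whose mean energy falls below a threshold count as silence;
# each run of non-silent frames is taken to be one word.
import numpy as np

def segment_words(signal, frame_len=400, threshold=1e-4):
    n_frames = len(signal) // frame_len
    energies = [np.mean(signal[i * frame_len:(i + 1) * frame_len] ** 2)
                for i in range(n_frames)]
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i * frame_len                    # a word begins
        elif e < threshold and start is not None:
            segments.append((start, i * frame_len))  # the word ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments

rng = np.random.default_rng(0)
quiet, loud = rng.normal(0, 0.001, 8000), rng.normal(0, 0.1, 8000)
print(segment_words(np.concatenate([quiet, loud, quiet])))  # one segment
[/code]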

Spontaneous vs Scripted

Spontaneous speech contains disfluencies and periods of pause and restart, and is much more difficult to recognise than speech read from a script.

Large vs small vocabularies

Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large with many similar-sounding words.
When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.
The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly.
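
Such a finite-state network can be written down directly as a table mapping each word to the words permitted to follow it. The tiny vocabulary below is invented for illustration.

[code]
# A toy finite-state language model: each word lists the words that may
# follow it. The vocabulary and transitions are invented examples.
NETWORK = {
    "<s>": {"show", "list"},
    "show": {"me"},
    "me": {"flights"},
    "list": {"flights"},
    "flights": {"to"},
    "to": {"boston", "denver"},
    "boston": {"</s>"},
    "denver": {"</s>"},
}

def is_permissible(sentence):
    """Accept a sentence only if every word transition is in the network."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return all(b in NETWORK.get(a, set()) for a, b in zip(words, words[1:]))

print(is_permissible("show me flights to boston"))  # True
print(is_permissible("show flights to me"))         # False
[/code]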

Perplexity

One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity.
It is loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (Zue, Cole, and Ward, 1995).
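
Concretely, if a bigram model assigns probability p(w_i | w_i-1) to each word, the perplexity of a test sequence of N words is PP = P(w_1 .. w_N) ^ (-1/N), the geometric mean of the inverse word probabilities. A small worked example with invented probabilities:

[code]
# Perplexity of a toy bigram model over one test sentence.
# PP = P(w_1 .. w_N) ** (-1/N). All probabilities here are invented.
import math

BIGRAM_P = {
    ("<s>", "show"): 0.5,
    ("show", "me"): 0.8,
    ("me", "flights"): 0.4,
    ("flights", "</s>"): 0.6,
}

def perplexity(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = sum(math.log(BIGRAM_P[(a, b)]) for a, b in zip(words, words[1:]))
    n = len(words) - 1  # number of predicted words
    return math.exp(-log_prob / n)

print(perplexity("show me flights"))  # roughly 1.8
[/code]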
Finally, some external parameters can affect speech recognition system performance. These include the characteristics of the environmental noise and the type and placement of the microphone.

A Timeline of Speech Recognition

1876: Alexander Graham Bell invented the telephone while trying to develop a speech recognition system for deaf people.
1936: AT&T's Bell Labs produced the first electronic speech synthesizer, the Voder (Dudley, Riesz and Watkins).
This machine was demonstrated at the 1939 World's Fair by operators who used a keyboard and foot pedals to play it and emit speech.
1969: John Pierce of Bell Labs said that automatic speech recognition would not be a reality for several decades because it requires artificial intelligence.

Early 70s

Early 1970s: the hidden Markov model (HMM) approach to speech recognition was invented by Leonard "Lenny" Baum of the Institute for Defense Analyses in Princeton and shared with several ARPA (Advanced Research Projects Agency) contractors, including IBM.
The HMM is a complex mathematical pattern-matching strategy that was eventually adopted by all the leading speech recognition companies, including Dragon Systems, IBM, Philips, AT&T and others.
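
The core HMM computation in recognition is decoding: finding the most likely hidden state sequence for an observed sequence, usually with the Viterbi algorithm. Below is a minimal sketch with invented toy probabilities; a real recogniser scores acoustic feature vectors with far larger models.

[code]
# Minimal Viterbi decoding for a discrete HMM.
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s]: probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # Trace the best path backwards from the most probable final state
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("silence", "speech")
start = {"silence": 0.8, "speech": 0.2}
trans = {"silence": {"silence": 0.7, "speech": 0.3},
         "speech": {"silence": 0.2, "speech": 0.8}}
emit = {"silence": {"low": 0.9, "high": 0.1},
        "speech": {"low": 0.3, "high": 0.7}}

print(viterbi(["low", "high", "high", "low"], states, start, trans, emit))
[/code]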

Signal Variability

Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal.
The acoustic realisations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear.
These phonetic variations are exemplified by the acoustic differences of the phoneme /t/ in two, true, and butter in English.
At word boundaries, contextual variations can be quite dramatic: for example, devo andare sounds like devandare in Italian.

More

Acoustic variability can result from changes in the environment as well as in the position and characteristics of the transducer.
Within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality.
Differences in socio-linguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variability.

What is a speech recognition system?

Speech recognition is generally used as a human-computer interface for other software. When it functions in this role, three primary tasks need to be performed.
Pre-processing: the conversion of spoken input into a form the recogniser can process.
Recognition: the identification of what has been said.
Communication: sending the recognised input to the application that requested it.
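
The pre-processing task typically converts the waveform into feature vectors such as MFCCs. As one possible illustration, the snippet below uses the third-party librosa package (it must be installed; the file name is a placeholder).

[code]
# Sketch of the pre-processing task: turn a recorded utterance into
# MFCC feature vectors using librosa (pip install librosa).
# "utterance.wav" is a placeholder file name.
import librosa

signal, sample_rate = librosa.load("utterance.wav", sr=16000)
mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)
print(mfccs.shape)  # (13, number_of_frames)
[/code]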

Auditory perception: hearing speech

"Phonemes tend to be abstractions that are implicitly defined by the pronunciation of the words in the language. In particular, the acoustic realisation of a phoneme may heavily depend on the acoustic context in which it occurs. This effect is usually called co-articulation", (Ney, 1994).
The way a phoneme is pronounced can be affected by its position in a word, neighbouring phonemes and even the word's position in a sentence. This affect is called the co-articulation effect.
The variability in the speech signal caused by co-articulation and other sources make speech analysis very difficult.