13-11-2012, 06:10 PM
Speech Recognition
Speech Recognition1.ppt (Size: 1.19 MB / Downloads: 37)
Speech recognition is a technology of particular interest because it supports direct communication between humans and computers through a mode of communication that humans commonly use among themselves and at which they are highly skilled.
Types of speech recognition
Isolated words
Connected words
Continuous speech
Spontaneous speech (automatic speech recognition)
Voice verification and identification
Challenges of speech recognition
Ease of use
Robust performance
Automatic learning of new words and sounds
Grammar for spoken language
Control of synthesized voice quality
Integrated learning for speech recognition and synthesis
What is the task?
Getting a computer to understand spoken language
By “understand” we might mean
React appropriately
Convert the input speech into another medium, e.g. text
Several variables impinge on this (see later)
What’s hard about that?
Digitization
Converting analogue signal into digital representation
Signal processing
Separating speech from background noise
Phonetics
Variability in human speech
Phonology
Recognizing individual sound distinctions (similar phonemes)
Lexicology and syntax
Disambiguating homophones
Features of continuous speech
Syntax and pragmatics
Interpreting prosodic features
Pragmatics
Filtering of performance errors (disfluencies)
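One of the signal-processing difficulties above, separating speech from background noise, can be sketched with a toy energy-based voice activity detector. This is only an illustration of the idea (real recognizers use far more sophisticated methods); the frame size, threshold, and test signals are invented for the example.

```python
import math

# Minimal sketch: energy-based voice activity detection (VAD).
# Frames whose mean energy exceeds a threshold are marked as speech;
# quieter frames are treated as background noise.

def frame_energies(samples, frame_size=160):
    """Split the signal into frames and return the mean energy of each."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [sum(s * s for s in f) / len(f) for f in frames]

def is_speech(samples, threshold=0.01, frame_size=160):
    """Mark each frame as speech (True) or background (False)."""
    return [e > threshold for e in frame_energies(samples, frame_size)]

quiet = [0.001 * math.sin(i / 5) for i in range(320)]  # low-energy background
loud = [0.5 * math.sin(i / 5) for i in range(320)]     # louder "speech"
print(is_speech(quiet + loud))  # → [False, False, True, True]
```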
Digitization
Analogue to digital conversion
Sampling and quantizing
Use filters to measure energy levels for various points on the frequency spectrum
Knowing the relative importance of different frequency bands (for speech) makes this process more efficient
E.g. high-frequency sounds are less informative, so they can be sampled using a broader bandwidth (log scale)
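The sampling-and-quantizing step above can be sketched in a few lines: sample a continuous signal at a fixed rate and quantize each sample to a signed 16-bit integer, as in PCM audio. The 440 Hz test tone, 8 kHz rate, and 0.01 s duration are illustrative choices, not values from the slides.

```python
import math

# Minimal sketch of digitization: sampling plus quantization (PCM-style).

SAMPLE_RATE = 8000      # samples per second
DURATION = 0.01         # seconds of audio
FREQ = 440.0            # "analogue" test tone (Hz)

def analogue(t):
    """Stand-in for the continuous input signal."""
    return math.sin(2 * math.pi * FREQ * t)

def digitize(signal, rate=SAMPLE_RATE, duration=DURATION, bits=16):
    """Sample `signal` at `rate` and quantize to signed `bits`-bit integers."""
    max_level = 2 ** (bits - 1) - 1   # 32767 for 16-bit
    n = int(rate * duration)
    return [round(signal(i / rate) * max_level) for i in range(n)]

samples = digitize(analogue)
# 0.01 s at 8000 Hz gives 80 samples, each in [-32767, 32767]
```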
Identifying phonemes
Differences between some phonemes are sometimes very small
May be reflected in the speech signal (e.g. vowels have more or less distinctive F1 and F2 formants)
Often show up in coarticulation effects (transition to next sound)
e.g. aspiration of voiceless stops in English
Allophonic variation
Performance errors
Performance “errors” include
Non-speech sounds
Hesitations
False starts, repetitions
Filtering implies handling at the syntactic level or above
Some disfluencies are deliberate and have pragmatic effect – this is not something we can handle in the near future
Template-based approach
Hard to distinguish very similar templates
And quickly degrades when input differs from templates
Therefore needs techniques to mitigate this degradation:
More subtle matching techniques
Multiple templates which are aggregated
Taken together, these pointed toward the statistics-based approach
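One classic "more subtle matching technique" for templates is dynamic time warping (DTW), which tolerates the speaker stretching or compressing sounds in time. A minimal sketch, with feature vectors simplified to single numbers and toy templates invented for the example:

```python
# Minimal sketch of the template-based approach using dynamic time
# warping (DTW): the input is matched against each stored template,
# allowing frames to be stretched or repeated in time.

def dtw_distance(a, b):
    """DTW distance between two sequences, allowing time warping."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # skip a frame of a
                                 d[i][j - 1],      # skip a frame of b
                                 d[i - 1][j - 1])  # match frames
    return d[len(a)][len(b)]

def recognise(input_seq, templates):
    """Pick the template word whose sequence is closest to the input."""
    return min(templates, key=lambda w: dtw_distance(input_seq, templates[w]))

templates = {"yes": [1, 3, 5, 3, 1], "no": [5, 5, 1, 1, 1]}  # toy templates
print(recognise([1, 3, 3, 5, 3, 1], templates))  # a time-stretched "yes"
```

The warping step is what plain frame-by-frame template comparison lacks: the stretched input still matches the "yes" template with zero cost.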
Statistics-based approach
Collect a large corpus of transcribed speech recordings
Train the computer to learn the correspondences (“machine learning”)
At run time, apply statistical processes to search through the space of all possible solutions, and pick the statistically most likely one
Machine learning
Acoustic and Lexical Models
Analyse training data in terms of relevant features
Learn from large amount of data different possibilities
different phone sequences for a given word
different combinations of elements of the speech signal for a given phone/phoneme
Combine these into a Hidden Markov Model expressing the probabilities
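The acoustic/lexical model described above can be sketched as a tiny Hidden Markov Model for one word: hidden states are phones, observations are coarse acoustic symbols, and the forward algorithm sums over all possible phone sequences. All probabilities, phone labels, and acoustic symbols here are invented for illustration; a real model would be trained from data.

```python
# Minimal sketch of a word HMM: hidden phone states, acoustic-symbol
# observations, and the forward algorithm for P(observations | word).
# All numbers are invented, not trained.

states = ["k", "ae", "t"]                      # phones for the word "cat"
start = {"k": 1.0, "ae": 0.0, "t": 0.0}        # must start at /k/
trans = {                                       # P(next phone | phone)
    "k":  {"k": 0.3, "ae": 0.7, "t": 0.0},
    "ae": {"k": 0.0, "ae": 0.5, "t": 0.5},
    "t":  {"k": 0.0, "ae": 0.0, "t": 1.0},
}
emit = {                                        # P(acoustic symbol | phone)
    "k":  {"burst": 0.8, "vowel": 0.1, "hiss": 0.1},
    "ae": {"burst": 0.1, "vowel": 0.8, "hiss": 0.1},
    "t":  {"burst": 0.4, "vowel": 0.1, "hiss": 0.5},
}

def forward(observations):
    """P(observations | word model), summed over all phone paths."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())

print(forward(["burst", "vowel", "hiss"]))
```

Because the forward algorithm sums over every phone path, it captures exactly the "different combinations of elements of the speech signal for a given phone" idea: no single alignment is assumed.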
The Noisy Channel Model
Use the acoustic model to give a set of likely phone sequences
Use the lexical and language models to judge which of these are likely to result in probable word sequences
The trick is having sophisticated algorithms to juggle the statistics
A bit like the rule-based approach except that it is all learned automatically from data
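The noisy channel decision rule above can be sketched as choosing the word sequence W that maximises P(acoustics | W) x P(W), i.e. acoustic score times language-model score, usually computed in log space. The candidate sequences and all scores below are invented for illustration:

```python
import math

# Minimal sketch of noisy channel decoding: combine the acoustic model
# score and the language model score in log space, then take the argmax.
# All scores are invented.

acoustic = {               # log P(observed audio | word sequence)
    "recognise speech":   math.log(0.20),
    "wreck a nice beach": math.log(0.25),   # acoustically slightly better
}
language = {               # log P(word sequence) from the language model
    "recognise speech":   math.log(0.010),
    "wreck a nice beach": math.log(0.001),  # far less probable English
}

def decode(candidates):
    """Return the candidate with the highest combined log score."""
    return max(candidates, key=lambda w: acoustic[w] + language[w])

print(decode(list(acoustic)))  # → recognise speech
```

The language model overrides the small acoustic edge of the mis-hearing, which is the point of combining the two models rather than trusting the acoustics alone.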