06-10-2012, 05:12 PM
Speech Recognition REPORT
A speech recognizer is a speech engine that converts speech to text. The javax.speech.recognition package defines the Recognizer interface to support speech recognition plus a set of supporting classes and interfaces. The basic functional capabilities of speech recognizers, some of the uses of speech recognition and some of the limitations of speech recognizers are described in Section 2.2.
As a type of speech engine, much of the functionality of a Recognizer is inherited from the Engine interface in the javax.speech package and from other classes and interfaces in that package. The javax.speech package and generic speech engine functionality are described in Chapter 4.
The Java Speech API is designed to keep simple speech applications simple, and to make advanced speech applications possible for non-specialist developers. This chapter covers both the simple and advanced capabilities of the javax.speech.recognition package. Where appropriate, some of the more advanced sections are marked so that you can choose to skip them.
Recognizer as an Engine
The basic functionality provided by a Recognizer includes grammar management and the production of results when a user says things that match active grammars. The Recognizer interface extends the Engine interface to provide this functionality.
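The two core capabilities named above can be sketched in a few lines of JSAPI code. The following is a minimal illustration, not a complete application: it assumes a JSAPI 1.0 implementation is installed and registered with Central, and the grammar file name "hello.gram" is a placeholder.

```java
import java.io.FileReader;
import javax.speech.Central;
import javax.speech.recognition.*;

public class HelloRecognizer {
    public static void main(String[] args) throws Exception {
        // Create and allocate a recognizer with default properties
        // (requires an installed JSAPI speech engine).
        Recognizer rec = Central.createRecognizer(new RecognizerModeDesc());
        rec.allocate();

        // Grammar management: load a JSGF grammar and enable it.
        RuleGrammar grammar = rec.loadJSGF(new FileReader("hello.gram"));
        grammar.setEnabled(true);

        // Result production: print the best tokens of each accepted result.
        rec.addResultListener(new ResultAdapter() {
            public void resultAccepted(ResultEvent e) {
                Result r = (Result) e.getSource();
                for (ResultToken t : r.getBestTokens())
                    System.out.print(t.getSpokenText() + " ");
                System.out.println();
            }
        });

        rec.commitChanges();   // apply the pending grammar changes
        rec.requestFocus();    // request recognition focus
        rec.resume();          // start listening for matching speech
    }
}
```

The commitChanges, requestFocus, and resume calls at the end reflect the state systems discussed later in this chapter: grammar changes take effect only once committed, and an engine must hold focus and be resumed before it listens.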
The following list describes the functionality that the javax.speech.recognition package inherits from the javax.speech package and outlines some of the ways in which that functionality is specialized.
• The properties of a speech engine defined by the EngineModeDesc class apply to recognizers. The RecognizerModeDesc class adds information about dictation capabilities of a recognizer and about users who have trained the engine. Both EngineModeDesc and RecognizerModeDesc are described in Section 4.2.
• Recognizers are searched, selected and created through the Central class in the javax.speech package as described in Section 4.3. That section explains default creation of a recognizer, recognizer selection according to defined properties, and advanced selection and creation mechanisms.
• Recognizers inherit the basic state systems of an engine from the Engine interface, including the four allocation states, the pause and resume state, the state monitoring methods and the state update events. The engine state systems are described in Section 4.4. The two state systems added by recognizers are described in Section 6.3.
• Recognizers produce all the standard engine events (see Section 4.5). The javax.speech.recognition package also extends the EngineListener interface as RecognizerListener to provide events that are specific to recognizers.
• Other functionality inherited from the Engine interface includes the runtime properties (see Section 4.6.1 and Section 6.8), audio management (see Section 4.6.2) and vocabulary management (see Section 4.6.3).
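The selection mechanism mentioned in the list above can be illustrated with RecognizerModeDesc. This is a hedged sketch assuming at least one matching engine is installed; the (Locale, Boolean) constructor sets the locale and the dictation-grammar-supported property, leaving all other properties unconstrained.

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RecognizerModeDesc;

public class SelectRecognizer {
    public static void main(String[] args) throws Exception {
        // Required properties: US English with dictation support.
        RecognizerModeDesc required =
            new RecognizerModeDesc(Locale.US, Boolean.TRUE);

        // List every installed recognizer matching those properties...
        EngineList matches = Central.availableRecognizers(required);
        System.out.println("Matching recognizers: " + matches.size());

        // ...or let Central select and create one directly.
        Recognizer rec = Central.createRecognizer(required);
        rec.allocate();
        System.out.println("Allocated: "
            + rec.getEngineModeDesc().getEngineName());
        rec.deallocate();
    }
}
```

Passing null to createRecognizer instead of a mode descriptor requests the default recognizer for the default locale, as described in Section 4.3.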
Recognition States
The most important (and most complex) state system of a recognizer represents the current recognition activity of the recognizer. An ALLOCATED Recognizer is always in one of the following three states:
• LISTENING state: The Recognizer is listening to incoming audio for speech that may match an active grammar but has not detected speech yet. A recognizer remains in this state while listening to silence and when audio input runs out because the engine is paused.
• PROCESSING state: The Recognizer is processing incoming speech that may match an active grammar. While in this state, the recognizer is producing a result.
• SUSPENDED state: The Recognizer is temporarily suspended while grammars are updated. While suspended, audio input is buffered for processing once the recognizer returns to the LISTENING and PROCESSING states.
This sub-state system is shown in Figure 6-1. The typical state cycle of a recognizer is triggered by user speech. The recognizer starts in the LISTENING state, moves to the PROCESSING state while a user speaks, moves to the SUSPENDED state once recognition of that speech is completed and while grammars are updated in response to user input, and finally returns to the LISTENING state.
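The cycle above is typically driven from a result listener: when a result is finalized the recognizer is SUSPENDED, the application updates its grammars, and commitChanges returns the recognizer to the LISTENING state. The following is a hypothetical listener sketch; the class name, field names, and the specific grammar change are illustrative, not part of the API.

```java
import javax.speech.recognition.*;

// Hypothetical sketch: update grammars while the recognizer is
// SUSPENDED, then commit the changes so it returns to LISTENING.
public class CommandListener extends ResultAdapter {
    private final Recognizer rec;
    private final RuleGrammar commands;

    public CommandListener(Recognizer rec, RuleGrammar commands) {
        this.rec = rec;
        this.commands = commands;
    }

    public void resultAccepted(ResultEvent e) {
        try {
            // Once this result is finalized, the recognizer is in the
            // SUSPENDED state and buffers incoming audio while we make
            // grammar changes (e.g. disabling the old command set).
            commands.setEnabled(false);

            // commitChanges applies all pending grammar updates; the
            // recognizer then returns to LISTENING and processes any
            // audio buffered while it was SUSPENDED.
            rec.commitChanges();
        } catch (GrammarException ge) {
            ge.printStackTrace();
        }
    }
}
```

Because audio is buffered during SUSPENDED, speech spoken while grammars are being updated is not lost; it is recognized against the newly committed grammars once the recognizer returns to LISTENING.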