15-02-2013, 10:02 AM
Voice Browsers
What is a Voice Browser?
Expanding access to the Web
Will allow any telephone to be used to access appropriately designed Web-based services
Server-based
Voice portals
Interaction via keypads, spoken commands, listening to prerecorded speech, synthetic speech and music.
An advantage to people with visual impairment
Web access while keeping hands and eyes free for other things (e.g. driving).
Motivation
Far more people today have access to a telephone than have access to a computer with an Internet connection.
Many of us have already or soon will have a mobile phone within reach wherever we go.
Easy to use for people who know nothing about computers or are afraid of them.
Voice interaction can escape the physical limitations on keypads and displays as mobile devices become ever smaller.
Possible Applications
Accessing business information:
The corporate "front desk" which asks callers who or what they want
Automated telephone ordering services
Support desks
Order tracking
Airline arrival and departure information
Cinema and theater booking services
Home banking services
Advancing Towards Voice
Until now, speech recognition and synthesis technologies had to be handcrafted into applications.
Voice browsers instead aim to integrate voice technologies directly into Web servers.
This requires either transforming Web content into formats better suited to voice browsing, or authoring content directly for voice browsers.
Speech Synthesis
The specification defines a markup language for prompting users via a combination of prerecorded speech, synthetic speech and music. You can select voice characteristics (name, gender and age) and the speed, volume, pitch, and emphasis. There is also provision for overriding the synthesis engine's default pronunciation.
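The markup described above was standardised by the W3C as the Speech Synthesis Markup Language (SSML). As a minimal sketch, the Python below uses the standard library's ElementTree to build an SSML 1.0 prompt that mixes a prerecorded clip with synthetic speech, selects voice characteristics, adjusts prosody and emphasis, and overrides one pronunciation. The function name, audio URL, caller name and the particular voice and prosody values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 Recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_prompt(intro_audio_url: str, caller_name: str) -> str:
    """Return an SSML document mixing prerecorded audio with synthetic speech.

    intro_audio_url and caller_name are illustrative inputs, not part of SSML.
    """
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"})

    # Prerecorded speech or music; the enclosed text is spoken if the file cannot be played.
    audio = ET.SubElement(speak, "audio", {"src": intro_audio_url})
    audio.text = "Welcome."

    # Select voice characteristics (gender and age) for the synthetic part.
    voice = ET.SubElement(speak, "voice", {"gender": "female", "age": "30"})

    # Adjust speed, volume and pitch.
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "volume": "loud", "pitch": "high"})
    prosody.text = "Hello, "

    # Emphasise the caller's name.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = caller_name
    emphasis.tail = "."

    # Override the engine's default pronunciation with an explicit IPA transcription.
    sentence = ET.SubElement(voice, "s")
    sentence.text = "Say "
    phoneme = ET.SubElement(sentence, "phoneme", {"alphabet": "ipa", "ph": "təˈmɑːtoʊ"})
    phoneme.text = "tomato"
    phoneme.tail = " to continue."

    return ET.tostring(speak, encoding="unicode")
```

Calling build_prompt("http://example.com/intro.wav", "Ada") returns the document as a string, ready to be handed to an SSML-capable synthesis engine.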
DTMF Grammars
Touch tone input is often used as an alternative to speech recognition.
Especially useful in noisy conditions or when the social context makes it awkward to speak.
The W3C DTMF grammar format allows authors to specify the expected sequence of digits and to bind them to the appropriate results.
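In the W3C Speech Recognition Grammar Specification (SRGS), DTMF grammars are written in the same format as speech grammars, with the grammar's mode set to "dtmf". The Python below does not parse that format. It is a minimal sketch, assuming a hypothetical rule table in which each rule is a regular expression over the keys 0-9, * and #, to show how an expected key sequence is matched and bound to a result. The menu choices and result names are invented for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical, simplified DTMF rules: (regular expression over keys, result).
# Real W3C grammars are XML or ABNF documents; this only mimics the idea.
DTMF_RULES = [
    (r"1", "balance"),
    (r"2", "transfer"),
    (r"0", "operator"),
    (r"\*9", "main_menu"),
    (r"([0-9]{4})#", "pin"),  # four digits terminated by the pound key
]

_COMPILED = [(re.compile(pattern), result) for pattern, result in DTMF_RULES]


def interpret_dtmf(keys: str) -> Optional[Tuple[str, str]]:
    """Return (result, value) for the first rule matching the whole key sequence.

    The value is the first capture group if the rule defines one, otherwise
    the keys themselves. Returns None when no rule matches.
    """
    for pattern, result in _COMPILED:
        match = pattern.fullmatch(keys)
        if match:
            value = match.group(1) if pattern.groups else match.group(0)
            return result, value
    return None
```

For example, interpret_dtmf("4321#") binds the input to the "pin" result with the value "4321". A regular expression is enough for short, fixed key sequences like these.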
Speech Grammars
In most cases, user prompts are carefully designed to encourage users to answer in a form that matches context-free grammar rules.
Speech grammars allow authors to specify rules covering the sequences of words that users are expected to say in particular contexts. These contextual clues let the recognition engine focus on likely utterances, improving the chances of a correct match.
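To make the idea concrete, here is a minimal Python sketch of a small context-free grammar for a hypothetical cinema booking prompt, with a backtracking matcher that checks whether a word sequence can be derived from the start rule. The dictionary format and rule names are assumptions for illustration, not the W3C grammar syntax, and a real recognizer applies the grammar while decoding audio rather than checking text afterwards.

```python
from typing import Dict, Iterator, List

# Each rule maps to a list of alternatives; each alternative is a sequence of
# literal words or references to other rules (names starting with "$").
Grammar = Dict[str, List[List[str]]]

# Hypothetical grammar for a prompt such as
# "How many tickets would you like, and for which show?"
BOOKING_GRAMMAR: Grammar = {
    "$booking": [["$polite", "$count", "tickets", "for", "$show"]],
    "$polite": [["i", "want"], ["i'd", "like"], []],  # [] makes the phrase optional
    "$count": [["two"], ["three"], ["four"]],
    "$show": [["the", "matinee"], ["the", "late", "show"]],
}


def _match(grammar: Grammar, symbol: str, words: List[str], pos: int) -> Iterator[int]:
    """Yield every position at which `symbol` can finish matching words[pos:]."""
    if not symbol.startswith("$"):
        # A literal word must equal the next token.
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for alternative in grammar[symbol]:
        yield from _match_sequence(grammar, alternative, words, pos)


def _match_sequence(grammar: Grammar, sequence: List[str], words: List[str], pos: int) -> Iterator[int]:
    """Yield every end position for matching the symbols in `sequence` in order."""
    if not sequence:
        yield pos
        return
    first, rest = sequence[0], sequence[1:]
    for mid in _match(grammar, first, words, pos):
        yield from _match_sequence(grammar, rest, words, mid)


def accepts(grammar: Grammar, start: str, utterance: str) -> bool:
    """True if the whole utterance can be derived from the start rule."""
    words = utterance.lower().split()
    return any(end == len(words) for end in _match(grammar, start, words, 0))
```

The matcher tries every alternative, so it copes with ambiguity, but like other simple top-down parsers it would loop forever on a left-recursive rule.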
Semantic Interpretation
The recognition process matches an utterance to a speech grammar, building a parse tree as a byproduct.
There are two approaches to harvesting semantic results from the parse tree:
1. Annotating grammar rules with semantic interpretation tags (ECMAScript).
2. Representing the result in XML.
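The two approaches can be sketched together in Python, assuming a parse tree for "two tickets for the late show" has already been produced by the recognizer. Plain Python functions stand in for the ECMAScript tags (in the W3C Semantic Interpretation for Speech Recognition specification these are script snippets attached to grammar rules), and the result, tickets and show XML element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List, Tuple, Union

# A parse tree node is (rule name, children); leaves are plain words.
Node = Union[str, Tuple[str, list]]

NUMBERS = {"two": 2, "three": 3, "four": 4}

# Hypothetical semantic tags: one function per grammar rule, standing in for
# ECMAScript tags. Each receives the values already computed for its children.
TAGS: Dict[str, Callable[[List[Any]], Any]] = {
    "count": lambda kids: NUMBERS[kids[0]],
    "show": lambda kids: " ".join(kids[1:]),  # drop the leading "the"
    # Children are [count value, "tickets", "for", show value].
    "booking": lambda kids: {"tickets": kids[0], "show": kids[3]},
}


def interpret(node: Node) -> Any:
    """Approach 1: evaluate the semantic tags bottom-up over the parse tree."""
    if isinstance(node, str):
        return node  # a word's value is the word itself
    rule, children = node
    values = [interpret(child) for child in children]
    return TAGS[rule](values)


def to_xml(result: Dict[str, Any]) -> str:
    """Approach 2: represent the semantic result as XML."""
    root = ET.Element("result")
    for key, value in result.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Parse tree for "two tickets for the late show".
TREE: Node = ("booking", [
    ("count", ["two"]),
    "tickets",
    "for",
    ("show", ["the", "late", "show"]),
])
```

Here interpret(TREE) yields {"tickets": 2, "show": "late show"}, which to_xml turns into <result><tickets>2</tickets><show>late show</show></result>.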
Call Control
Fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform.
Will enable application developers to use markup to perform call screening, whisper call waiting, call transfer, and more.
Can be used to transfer a user from one voice browser to another on a completely different machine.
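The W3C markup for this is Call Control XML (CCXML), in which a document reacts to telephony events. The Python below is only a sketch of that event-driven style under stated assumptions: the event names are modeled on CCXML's, while the Call class, its action strings, the allow list used for call screening and the transfer target are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical allow list used for call screening.
ALLOWED_CALLERS = {"5550100"}


@dataclass
class Call:
    """Illustrative call state machine; not a real platform API."""
    state: str = "idle"
    actions: List[str] = field(default_factory=list)

    def handle(self, event: str, **data: str) -> None:
        """Advance the state machine in response to one platform event."""
        if self.state == "idle" and event == "connection.alerting":
            # Call screening: accept only callers on the allow list.
            if data.get("caller") in ALLOWED_CALLERS:
                self.actions.append("accept")
                self.state = "ringing"
            else:
                self.actions.append("reject")
                self.state = "done"
        elif self.state == "ringing" and event == "connection.connected":
            # Hand the connected caller to a VoiceXML dialog.
            self.actions.append("start dialog welcome.vxml")
            self.state = "in_dialog"
        elif self.state == "in_dialog" and event == "dialog.exit":
            target = data.get("transfer_to")
            if target:
                # Move the caller to another voice browser, possibly on another machine.
                self.actions.append(f"redirect to {target}")
                self.state = "transferred"
            else:
                self.actions.append("disconnect")
                self.state = "done"
        elif event == "connection.disconnected":
            self.state = "done"
        # Any other event in the current state is ignored.
```

Keeping telephony decisions in a separate event handler like this, rather than inside the dialog, is what lets one document screen, connect and transfer calls regardless of which voice browser runs the dialog.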
Voice Browser Interoperation (2)
Finally, the user could be transferred from a VoiceXML application to a customer service agent.
The agent needs to be able to view, on their console, the information about the customer that was collected during the preceding VoiceXML application. Transferring a session identifier with the call lets the console retrieve this information from the customer database.
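The session-identifier handoff can be sketched in a few lines of Python. A dictionary stands in for the customer database, and the function names are hypothetical; how the identifier actually travels with the transferred call depends on the telephony platform and is not shown.

```python
import uuid
from typing import Dict

# Hypothetical stand-in for the customer database shared by the voice
# application and the agent's console.
CUSTOMER_DB: Dict[str, Dict[str, str]] = {}


def finish_voice_session(collected: Dict[str, str]) -> str:
    """Store what the VoiceXML dialog collected; return the identifier that
    would accompany the transferred call."""
    session_id = str(uuid.uuid4())
    CUSTOMER_DB[session_id] = dict(collected)
    return session_id


def agent_console_lookup(session_id: str) -> Dict[str, str]:
    """On the agent's side, use the transferred identifier to fetch the caller's data."""
    return CUSTOMER_DB.get(session_id, {})
```

Passing only an opaque identifier keeps the transfer itself small; the customer details stay in the database and are fetched by whichever console receives the call.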