PRESENTED BY
SANTHOSH

VoiceXML
Introduction

• VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer.
• It allows voice applications to be developed and deployed in a way analogous to HTML for visual applications.
• Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser.
• A common architecture is to deploy banks of voice browsers attached to the Public Switched Telephone Network (PSTN) so that users can interact with voice applications over the telephone.
History

• AT&T, IBM, Lucent, and Motorola formed the VoiceXML Forum in March 1999 in order to develop a standard markup language for specifying voice dialogs.
• In March 2000 they published VoiceXML 1.0.
• The W3C produced several intermediate versions of VoiceXML 2.0, which reached the final "Recommendation" stage in March 2004.
Usage:
• Many commercial VoiceXML applications have been deployed, processing millions of telephone calls per day.
• These applications include order inquiry, package tracking, driving directions, emergency notification, flight tracking, voice access to email, audio news magazines, and voice dialing.
• VoiceXML has tags that instruct the voice browser to provide speech synthesis, automatic speech recognition, dialog management, and audio playback.
The following is an example of a VoiceXML document:
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>Hello world!</prompt>
    </block>
  </form>
</vxml>
When interpreted by a VoiceXML interpreter, this outputs "Hello world!" in synthesized speech.
Typically, HTTP is used as the transport protocol for fetching VoiceXML pages. Some applications use static VoiceXML pages, while others rely on dynamic VoiceXML page generation using an application server such as Tomcat, WebLogic, IIS, or WebSphere.
Voice XML Inputs
Voice XML accepts two kinds of input (see the sketch after this list):

• Touch-tone (DTMF) keys
• Speech

There is a difference between voice recognition and Voice XML:
• Personal voice recognition systems (e.g., Dragon NaturallySpeaking or IBM ViaVoice) allow a wide grammar but restrict the number of users.
• Voice XML restricts the grammar but allows for a wide number of users.
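A minimal sketch of a field that accepts both kinds of input, using the builtin "boolean" field type from the VoiceXML 2.0 specification (which accepts spoken yes/no as well as the DTMF keys 1 and 2):

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <!-- The builtin boolean type listens for "yes"/"no" speech
         and for the touch-tone keys 1 (yes) and 2 (no). -->
    <field name="answer" type="boolean">
      <prompt>Say yes or no, or press 1 for yes and 2 for no.</prompt>
      <filled>
        <prompt>You said <value expr="answer"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>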
Advantages of Voice XML
• Allows for the easy implementation of voice interfaces
• Removes the restrictions imposed by tools designed for touch-tone systems
• Runs on existing web infrastructure
Future versions of the standard
• VoiceXML 3.0 will be the next major release of VoiceXML, introducing major new features. It includes a new XML state-chart description language called SCXML.
VoiceXML:


Overview:

The origins of VoiceXML began in 1995 as an XML-based dialog design language intended to simplify the speech recognition application development process within an AT&T project called Phone Markup Language (PML). As AT&T reorganized, teams at AT&T, Lucent and Motorola continued working on their own PML-like languages.
In 1998, W3C hosted a conference on voice browsers. By this time, AT&T and Lucent had different variants of their original PML, Motorola had developed VoxML, and IBM was developing its own SpeechML. Many other attendees at the conference were also developing similar languages for dialog design; for example, HP's TalkML and PipeBeach's VoiceHTML.
The VoiceXML Forum was then formed by AT&T, IBM, Lucent, and Motorola to pool their efforts. The mission of the VoiceXML Forum was to define a standard dialog design language that developers could use to build conversational applications. They chose XML as the basis for this effort because it was clear to them that this was the direction technology was going.
In 2000, the VoiceXML Forum released VoiceXML 1.0 to the public. Shortly thereafter, VoiceXML 1.0 was submitted to the W3C as the basis for the creation of a new international standard. VoiceXML 2.0 is the result of this work based on input from W3C Member companies, other W3C Working Groups, and the public.

Goals of VoiceXML:

VoiceXML's main goal is to bring the full power of Web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm. A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform. The dialogs are provided by document servers, which may be external to the implementation platform. Document servers maintain overall service logic, perform database and legacy system operations, and produce dialogs. A VoiceXML document specifies each interaction dialog to be conducted by a VoiceXML interpreter. User input affects dialog interpretation and is collected into requests submitted to a document server. The document server replies with another VoiceXML document to continue the user's session with other dialogs.
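As a minimal sketch of that client-server loop (the server URL and field name here are illustrative, not from the original text): the interpreter collects a value and submits it to the document server, whose reply is the next VoiceXML document.

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="lookup">
    <!-- The builtin digits type collects a spoken or keyed number. -->
    <field name="order_number" type="digits">
      <prompt>Please say or key in your order number.</prompt>
    </field>
    <block>
      <!-- The server's response is another VoiceXML document
           that continues the session with further dialogs. -->
      <submit next="http://example.com/orders/status.vxml"
              namelist="order_number" method="get"/>
    </block>
  </form>
</vxml>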

Scope of VoiceXML:

The language describes the human-machine interaction provided by voice response systems, which includes:
• Output of synthesized speech (text-to-speech).
• Output of audio files.
• Recognition of spoken input.
• Recognition of DTMF input.
• Recording of spoken input.
• Control of dialog flow.
• Telephony features such as call transfer and disconnect.
The language provides means for collecting character and/or spoken input, assigning the input results to document-defined request variables, and making decisions that affect the interpretation of documents written in the language. A document may be linked to other documents through Uniform Resource Identifiers (URIs).
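A short sketch covering a few items from the list above: recording spoken input, playing synthesized output, and a telephony disconnect (the attribute values are illustrative):

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="feedback">
    <!-- Record up to 30 seconds of caller speech; a DTMF key ends it. -->
    <record name="message" beep="true" maxtime="30s" dtmfterm="true">
      <prompt>Leave your comments after the beep, then press any key.</prompt>
    </record>
    <block>
      <!-- message$.duration is a shadow variable set by <record>. -->
      <prompt>Thanks. Your message lasted
        <value expr="message$.duration"/> milliseconds.</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>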

Principles of Design:

VoiceXML is an XML application

1. The language promotes portability of services through abstraction of platform resources.
2. The language accommodates platform diversity in supported audio file formats, speech grammar formats, and URI schemes. While producers of platforms may support various grammar formats, the language requires a common grammar format, namely the XML form of the W3C Speech Recognition Grammar Specification (SRGS), to facilitate interoperability (see the grammar sketch after this list). Similarly, while various audio formats for playback and recording may be supported, the audio formats described in the specification's appendix must be supported.
3. The language supports ease of authoring for common types of interactions.
4. The language has well-defined semantics that preserve the author's intent regarding the behavior of interactions with the user. Client heuristics are not required to determine document element interpretation.
5. The language recognizes semantic interpretations from grammars and makes this information available to the application.
6. The language has a control flow mechanism.
7. The language enables a separation of service logic from interaction behavior.
8. It is not intended for intensive computation, database operations, or legacy system operations. These are assumed to be handled by resources outside the document interpreter, e.g. a document server.
9. General service logic, state management, dialog generation, and dialog sequencing are assumed to reside outside the document interpreter.
10. The language provides ways to link documents using URIs, and also to submit data to server scripts using URIs.
11. VoiceXML provides ways to identify exactly which data to submit to the server, and which HTTP method (GET or POST) to use in the submittal.
12. The language does not require document authors to explicitly allocate and deallocate dialog resources, or deal with concurrency. Resource allocation and concurrent threads of control are to be handled by the implementation platform.
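As a sketch of the common grammar format referenced in item 2, here is a small grammar in the XML form of SRGS; the rule name and word list are illustrative:

<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" mode="voice" root="drink" xml:lang="en-US">
  <!-- Matches exactly one of three words. -->
  <rule id="drink">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
      <item>juice</item>
    </one-of>
  </rule>
</grammar>

A VoiceXML field can reference such a grammar with, for example, <grammar src="drink.grxml" type="application/srgs+xml"/>.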

Concepts

A VoiceXML document (or a set of related documents called an application) forms a conversational finite state machine. The user is always in one conversational state, or dialog, at a time. Each dialog determines the next dialog to transition to. Transitions are specified using URIs, which define the next document and dialog to use. If a URI does not refer to a document, the current document is assumed. If it does not refer to a dialog, the first dialog in the document is assumed. Execution is terminated when a dialog does not specify a successor, or if it has an element that explicitly exits the conversation.
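A minimal sketch of these state transitions (document names are illustrative): a fragment URI such as "#mainmenu" targets a dialog in the current document, while a relative or absolute URI fetches another document.

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <prompt>Welcome.</prompt>
      <!-- Transition to another dialog in this same document. -->
      <goto next="#mainmenu"/>
    </block>
  </form>
  <menu id="mainmenu">
    <prompt>Say news or weather.</prompt>
    <!-- Each choice names the next document (and dialog) by URI. -->
    <choice next="news.vxml">news</choice>
    <choice next="weather.vxml">weather</choice>
  </menu>
</vxml>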

Dialogs and Subdialogs:

There are two kinds of dialogs: forms and menus. Forms define an interaction that collects values for a set of form item variables. Each field may specify a grammar that defines the allowable inputs for that field. If a form-level grammar is present, it can be used to fill several fields from one utterance. A menu presents the user with a choice of options and then transitions to another dialog based on that choice.
A subdialog is like a function call, in that it provides a mechanism for invoking a new interaction, and returning to the original form. Variable instances, grammars, and state information are saved and are available upon returning to the calling document. Subdialogs can be used, for example, to create a confirmation sequence that may require a database query; to create a set of components that may be shared among documents in a single application; or to create a reusable library of dialogs shared among many applications.
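A sketch of the function-call analogy (file and variable names are illustrative). The calling form invokes confirm.vxml; the subdialog returns its result, which the caller reads as confirmation.answer:

<!-- Calling document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <subdialog name="confirmation" src="confirm.vxml">
      <filled>
        <if cond="confirmation.answer">
          <prompt>Your order has been placed.</prompt>
        <else/>
          <prompt>Your order has been cancelled.</prompt>
        </if>
      </filled>
    </subdialog>
  </form>
</vxml>

<!-- confirm.vxml: returns a value to its caller -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <field name="answer" type="boolean">
      <prompt>Do you confirm your order?</prompt>
      <filled>
        <return namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>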

Applications:

An application is a set of documents sharing the same application root document. Whenever the user interacts with a document in an application, its application root document is also loaded. The application root document remains loaded while the user is transitioning between other documents in the same application, and it is unloaded when the user transitions to a document that is not in the application. While it is loaded, the application root document's variables are available to the other documents as application variables, and its grammars remain active for the duration of the application, subject to the grammar activation rules.
In other words, the documents (D) of an application transition among one another while sharing a common application root document (root).
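A sketch of the root mechanism (file and variable names are illustrative). A leaf document names its root with the application attribute and reads the shared variable through the application scope:

<!-- app-root.vxml: loaded for the lifetime of the application -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Visible to every document in the application as application.username -->
  <var name="username" expr="'anonymous'"/>
</vxml>

<!-- leaf.vxml: one of the documents D sharing the root -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="app-root.vxml">
  <form>
    <block>
      <prompt>Hello <value expr="application.username"/>.</prompt>
    </block>
  </form>
</vxml>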

Grammars:

Each dialog has one or more speech and/or DTMF grammars associated with it. In machine directed applications, each dialog's grammars are active only when the user is in that dialog. In mixed initiative applications, where the user and the machine alternate in determining what to do next, some of the dialogs are flagged to make their grammars active (i.e., listened for) even when the user is in another dialog in the same document, or on another loaded document in the same application. In this situation, if the user says something matching another dialog's active grammars, execution transitions to that other dialog, with the user's utterance treated as if it were said in that dialog. Mixed initiative adds flexibility and power to voice applications.
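One way to get this behavior, sketched here with illustrative names, is a document-level <link>: its grammar stays active in every dialog of the document, so saying the matching phrase from any dialog transitions to the linked one.

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Active everywhere in this document: saying "operator"
       jumps to the operator dialog, whatever the current dialog is. -->
  <link next="#operator">
    <grammar version="1.0" mode="voice" root="op">
      <rule id="op"><item>operator</item></rule>
    </grammar>
  </link>
  <form id="main">
    <field name="account" type="digits">
      <prompt>Please say your account number.</prompt>
    </field>
  </form>
  <form id="operator">
    <block>
      <prompt>Transferring you to an operator.</prompt>
    </block>
  </form>
</vxml>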

Events:

Events are thrown by the platform under a variety of circumstances, such as when the user does not respond, doesn't respond intelligibly, requests help, etc. The interpreter also throws events if it finds a semantic error in a VoiceXML document. Events are caught by catch elements or their syntactic shorthand. Each element in which an event can occur may specify catch elements. Furthermore, catch elements are also inherited from enclosing elements "as if by copy". In this way, common event handling behavior can be specified at any level, and it applies to all lower levels.
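A sketch of event handling on a single field, using the shorthand catch elements (<noinput>, <nomatch>, <help>) and a general <catch>; the prompts and count are illustrative:

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pin">
    <field name="pin" type="digits">
      <prompt>Please enter your four digit PIN.</prompt>
      <noinput>I didn't hear anything. <reprompt/></noinput>
      <nomatch>I didn't understand that. <reprompt/></nomatch>
      <help>Enter the four digits printed on your card. <reprompt/></help>
      <!-- Fires on the third noinput or nomatch event. -->
      <catch event="noinput nomatch" count="3">
        <prompt>Sorry, we seem to be having trouble. Goodbye.</prompt>
        <exit/>
      </catch>
    </field>
  </form>
</vxml>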

Mixed Initiative Forms:

The last section talked about forms implementing rigid, computer-directed conversations. To make a form mixed initiative, where both the computer and the human direct the conversation, it must have one or more form-level grammars. The dialog may be written in several ways. One common authoring style combines an <initial> element that prompts for a general response with <field> elements that prompt for specific information. This is illustrated in the sketch below. More complex techniques, such as using the 'cond' attribute on <field> elements, may achieve a similar effect.
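A sketch of that authoring style (grammar files, field names, and the server URL are illustrative). The form-level grammar can fill both fields from one utterance such as "from Boston to Denver"; if the open prompt fails repeatedly, assigning a value to the <initial> item bypasses it and the machine falls back to directed, field-by-field prompting.

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="travel">
    <!-- Form-level grammar: may fill "from" and "to" in one utterance. -->
    <grammar src="citypair.grxml" type="application/srgs+xml"/>
    <initial name="start">
      <prompt>Where do you want to travel from and to?</prompt>
      <nomatch count="2">
        <!-- Give up on the open question; visit the fields one by one. -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>
    <field name="from">
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <prompt>Which city are you leaving from?</prompt>
    </field>
    <field name="to">
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <prompt>Which city are you traveling to?</prompt>
    </field>
    <block>
      <submit next="http://example.com/flights.vxml" namelist="from to"/>
    </block>
  </form>
</vxml>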