27-06-2012, 03:35 PM
Eye-like Algorithm to Produce Voice Web Pages
Abstract
Nowadays, internet web sites and applications are
efficient tools with which users interact by downloading and
uploading data through a browser. The same functionality is not
available through the phone for all internet web sites and
applications. In this article, we propose a solution through
which callers can call a global provider and specify the web
site they wish to surf. A voice application will lead them
through the web site allowing them to receive and enter
data. We aim to avoid techniques that require radical
changes to existing web sites and applications.
INTRODUCTION
Internet users can simply type the URL (Uniform
Resource Locator) of the web site they wish to visit into the
address bar of their browser in order to start interacting
with the site. Then, they can have a look at the returned
page, read some titles or menus, decide what their next
destination is, and click some menu item or a hyperlink.
They keep doing so until they reach the page that includes
the content which they are interested in. Phone users
cannot do the same with all web sites / web applications
available on the internet. This article proposes a solution
through which callers will be able to interact with all web
sites / web applications available on the internet.
RELATED WORK
The IBM WTP (WebSphere Transcoding Publisher)
provides transcoding from HTML (HyperText Markup
Language) to VXML (Voice eXtensible Markup
Language) [7]. This product works well for simple HTML
documents but less well for poorly structured ones. For
example, if a page has no heading tags, the resulting
voice page would have very low usability.
Annotations were suggested as a method of specifying
important sections of pages [1] [4] [11]. We do not
believe this approach is optimal, since it requires
augmenting the original HTML file with annotations, and
keeping the HTML file synchronized with its associated
annotation file is a significant burden. Many other
suggestions and products attempted to offer a solution
(Aurora transcoding system [5], SALT [12],
XHTML+Voice [14], Sisl [2], UIML-to-voice
transcoders [10]). Those solutions require introducing
changes to every single page on the internet. As a
result, we do not think those approaches will be
globally accepted as efficient solutions.
PROPOSED SOLUTION
Several transcoders convert HTML pages to
corresponding VXML ones [7]. The main obstacle is not
the straightforward translation from one language to
another. Instead, it lies in how meaningful the output of
such a translation is [11]. To make the output more
meaningful, many researchers decided to use annotations
so that HTML pages can be dynamically analyzed.
Adding annotations to the HTML page, or creating a new
file that describes the different parts of the page or which
parts might be of interest to callers, improves the way
the HTML page can be processed by the voice
application. However, this means that each HTML page
needs to be changed to meet those new specifications. If
each web site has a corresponding voice application,
companies will need to spend more money in order to
develop their voice-based web sites and keep both sites
synchronized. This article aims to propose a solution
that does not require such radical changes.
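
To illustrate why the translation step itself is the easy part, here is
a minimal Python sketch of a naive HTML-to-VXML mapping. It is not the
transcoder of [7] and not the solution proposed in this article. The
names LinkCollector and html_to_vxml are hypothetical, and mapping every
hyperlink to one menu choice is an assumption made only for the example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class LinkCollector(HTMLParser):
    """Collect the page title and (text, href) pairs for every hyperlink."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []      # list of (link text, href)
        self._in_title = False
        self._href = None    # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._text).split())
            if text:
                self.links.append((text, self._href))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)


def html_to_vxml(html):
    """Naively map a page to a VoiceXML menu: one <choice> per hyperlink.

    Assumption: the link targets are emitted unchanged. A real system
    would have to route them back through the transcoder.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">',
        "  <menu>",
        f"    <prompt>{escape(parser.title.strip())}. <enumerate/></prompt>",
    ]
    for text, href in parser.links:
        lines.append(f"    <choice next={quoteattr(href)}>{escape(text)}</choice>")
    lines += ["  </menu>", "</vxml>"]
    return "\n".join(lines)
```

Calling html_to_vxml on a page with fifty navigation links produces a
syntactically valid menu with fifty choices read out in document order,
which shows the problem described above: the translation succeeds, but
the result tells the caller nothing about which parts of the page matter.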
ANALYZING DYNAMIC PAGES
We believe that it is not fruitful to focus on the
HTML file alone. Previous studies explored
annotated HTML pages [1] [11], aiming to find
content that might be of interest to phone callers.
Analyzing HTML pages is not an easy task because
they can be messy and disorganized. When the
structure of the HTML page is known, there is a good
chance of presenting the page to the caller in an efficient
way. The main drawback of the annotation approach is
that it requires changing the web pages or adding new
ones. Our proposal is to analyze the source page and the
resulting HTML page in order to discover the structure.
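
As a rough illustration of what knowing the structure buys, the Python
sketch below groups the visible text of a resulting HTML page under its
heading tags. It is not the algorithm proposed in this article, which
also analyzes the source page. The names OutlineBuilder and outline are
hypothetical, and using heading tags as the only structural signal is an
assumption made for the example.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Group visible text under the nearest preceding heading."""

    def __init__(self):
        super().__init__()
        # Each section is [heading text or None, list of text chunks].
        self.sections = [[None, []]]
        self._in_heading = False
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._in_heading = True
            self.sections.append(["", []])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or self._skip:
            return
        if self._in_heading:
            current = self.sections[-1][0]
            self.sections[-1][0] = (current + " " + text).strip()
        else:
            self.sections[-1][1].append(text)


def outline(html):
    """Return a list of (heading, body text) pairs in document order."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    # Drop the leading untitled section when it holds no text.
    return [(heading, " ".join(body))
            for heading, body in builder.sections
            if heading is not None or body]
```

For a page marked up with headings, outline yields one titled section per
heading, which a voice application could offer to the caller as a menu.
For a page built only from generic containers, it yields a single
untitled block, so the caller would have to listen to the whole page.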