A MULTILINGUAL SCREEN READER IN INDIAN LANGUAGES


ABSTRACT

A screen reader is a form of assistive technology that helps visually
impaired people use or access the computer and the Internet.
So far, it has remained expensive and within the domain
of English (and some foreign) language computing. For Indian
languages this development is limited by: the availability of
Text-to-Speech (TTS) systems in Indian languages, support for
reading glyph-based font-encoded text, text normalization for
converting non-standard words into standard words, and support
for multiple languages. In this paper we discuss how to handle
these issues in building a multilingual screen reader
for Indian languages.

INTRODUCTION

Screen reading software speaks out the contents of the current
screen (display) of a computer system. Such software
helps a visually challenged user access the computer
and the Internet. Typically, screen readers have remained expensive
and within the domain of English.
Some of the limitations observed in existing screen readers
[1] [2] [3] include: 1) Professional English screen readers
fail to provide good support for Indian languages; the
voices have US/UK accents, which native Indian speakers
find hard to comprehend. 2) Screen readers in Indian languages
support only one or two major languages. At the
same time, they often support only Unicode formats and thus
ignore popular local websites such as Eenadu, Vaartha, and
Jagran, which use ASCII-based fonts. 3) Some screen readers
do not make use of recent advances in text-to-speech technologies
and thus use robotic synthetic voices.
Our mission is to develop a multilingual screen reader
(for the visually impaired) that can read content in all
official languages of India, such as Hindi, Telugu, and Tamil,
including Indian English, and provide support for different
computer applications (email, Internet, office software) using
intelligible, human-sounding synthetic speech.

SYSTEM ARCHITECTURE

The system architecture shown in Figure 1 is modular and
consists of three sub-systems: (i) the Text Extractor sub-system,
(ii) the Text Preprocessing sub-system, and (iii) the Text-to-Speech
sub-system. The sub-systems are further divided
into several modules, as described below.
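
The overall flow can be pictured as a simple pipeline. The Python sketch below is purely illustrative: the class and method names are hypothetical placeholders for the sub-systems of Figure 1, not our actual implementation.

    # Hypothetical orchestration of the three sub-systems (all names are placeholders).
    class ScreenReader:
        def __init__(self, extractor, preprocessor, tts):
            self.extractor = extractor          # Text Extractor sub-system
            self.preprocessor = preprocessor    # Text Preprocessing sub-system
            self.tts = tts                      # Text-to-Speech sub-system

        def read_screen(self):
            raw_text = self.extractor.current_text()          # text on the current screen
            it3_text, language = self.preprocessor(raw_text)  # IT3 text plus language tag
            return self.tts.speak(it3_text, language)         # synthesized audio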

Text Processing

The text preprocessing sub-system is functionally divided into:
(i) a text encoding or font-type identifier for determining whether the
extracted text is encoded in Unicode, ISCII, or an ASCII-type
font; (ii) a text converter for converting these differently encoded
texts into the IT3 phonetic transliteration notation; (iii) a text
normalizer for converting Non-Standard Words (NSWs)
(such as currency, time, and titles) into standard pronounceable/
readable words; and (iv) a language identifier for identifying
the constituent language so that the appropriate TTS
system can be selected.
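
As a rough illustration, the four modules can be chained as below. This minimal Python sketch uses placeholder logic; the function names and heuristics are our own assumptions for illustration, not the actual converters.

    # Placeholder implementations of the four text-preprocessing stages.
    def identify_encoding(text: str) -> str:
        """Guess whether the text is Unicode or glyph-based font text (toy heuristic)."""
        if any(0x0900 <= ord(ch) <= 0x0D7F for ch in text):   # Indic Unicode blocks
            return "unicode"
        return "ascii-font"                                   # needs a font converter

    def convert_to_it3(text: str, encoding: str) -> str:
        """Convert to the IT3 phonetic notation (passthrough placeholder)."""
        return text                     # a real converter applies per-font glyph maps

    def normalize_nsw(text: str) -> str:
        """Expand a couple of non-standard words (illustrative only)."""
        return text.replace("Rs.", "rupees").replace("Dr.", "doctor")

    def identify_language(text: str) -> str:
        """Pick a TTS language for the text (toy heuristic)."""
        return "hindi" if any(0x0900 <= ord(ch) <= 0x097F for ch in text) else "english"

    def preprocess(raw_text: str):
        encoding = identify_encoding(raw_text)
        it3_text = normalize_nsw(convert_to_it3(raw_text, encoding))
        return it3_text, identify_language(it3_text)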

Text-to-Speech System

The text-to-speech sub-system is functionally divided into: (i)
a TTS selector for switching between the TTS systems of
different languages; (ii) a voice loader for dynamically loading
the available voice for the current language; (iii) a voice
inventory for maintaining prerecorded speech segments and
language-related features; (iv) a unit selection algorithm for
selecting the optimal speech segments for concatenation; and
(v) speech segment concatenation. The TTS sub-system uses
the techniques explained in [4], [5], [6]. The voices are built on
the Festvox framework [7].
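
A minimal sketch of the TTS selector and voice loader is given below: a registry keyed by language whose voices are loaded lazily on first use. The Voice class and its methods are hypothetical stand-ins for the Festvox-based voices, not the actual system.

    # Hypothetical sketch of the TTS selector with dynamic voice loading.
    class Voice:
        def __init__(self, language: str, voice_path: str):
            self.language = language
            self.voice_path = voice_path        # location of the voice inventory
            self.loaded = False

        def load(self):
            # A real loader would read this voice's prerecorded speech segments
            # and language features into memory.
            self.loaded = True

        def synthesize(self, it3_text: str) -> bytes:
            # Unit selection and concatenation would happen here; stubbed out.
            return b""

    class TTSSelector:
        def __init__(self, voices: dict):
            self.voices = voices                # maps language name -> Voice

        def speak(self, it3_text: str, language: str) -> bytes:
            voice = self.voices[language]       # switch TTS system by language
            if not voice.loaded:
                voice.load()                    # load the voice on first use
            return voice.synthesize(it3_text)

For example, a selector built with Hindi and Telugu voices would load the Hindi voice only when Hindi text is first encountered, keeping start-up memory low.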

COMPONENTS OF SCREEN READER

Developing a full-fledged screen reader for Indian languages
is not only challenging but a daunting task, considering the
many eventualities involved in its design and implementation.
Having addressed the common issues in developing
a screen reader in Section 1, we now look into some intrinsic
aspects such as the internal text storage, identifying the
font-type name, identifying the language, converting and normalizing
the text into a TTS-renderable form (IT3 data), and invoking
the TTS system capable of synthesizing voice for
the identified language text on the screen.

Internal Storage Format of Text

Font Processing

Most Indian language electronic content is available
in a variety of font encoding formats, and processing such content
is a real-world problem for researchers. Here,
processing includes identifying the font encoding and converting
the text into the TTS-renderable input format. To process
font data, the first and foremost job of a screen reader
is to identify the underlying encoding or font. There is
no header information, as in the case of Unicode (UTF-8)
encoding, nor distinct ASCII code ranges for English
and Indian languages, as in ISCII. All we can get is the sequence
of glyph code values (ASCII values). These can therefore
be identified through statistical modeling or machine learning
techniques: statistical models are built over the glyph codes, and
converters or models trained with machine learning algorithms
learn the glyph code order and use these statistical models
to identify the font.
The font identification module uses vector space models
for font-type identification [8], [9].
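
For illustration, a vector space model for font-type identification can be sketched as follows: each known font is represented by a frequency vector of glyph-code bigrams, and an input text is assigned to the font whose vector is closest under cosine similarity. The font names and sample texts are hypothetical; the actual module of [8], [9] is more elaborate.

    # Toy vector space model over glyph-code bigrams.
    from collections import Counter
    from math import sqrt

    def glyph_bigrams(text: str) -> Counter:
        """Frequency vector of consecutive glyph-code pairs."""
        codes = [ord(ch) for ch in text]
        return Counter(zip(codes, codes[1:]))

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def identify_font(text: str, font_models: dict) -> str:
        """font_models maps a font name to a bigram vector built from sample text."""
        vector = glyph_bigrams(text)
        return max(font_models, key=lambda name: cosine(vector, font_models[name]))

    # font_models = {"eenadu-font": glyph_bigrams(sample_1),
    #                "jagran-font": glyph_bigrams(sample_2)}
    # identify_font(unknown_text, font_models)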

Text Preprocessing

From the screen reader's perspective, text processing means converting
the extracted text into an encoding or format compatible
with the TTS system, i.e., font-to-IT3 conversion, Unicode text
conversion, ISCII text conversion, etc. The converted text then needs
to be normalized. Text normalization is the process of
converting non-standard words such as numbers, abbreviations,
titles, and currency into expanded, natural (pronounceable)
words. This can be achieved either by writing a text normalizer for
each and every language or by designing a generic framework
that fits most of the languages. There is one more step,
which is optional: if a third-party TTS system is used,
the text must be converted into the notation supported
by that TTS system, since the input notation for a TTS system is vendor/
developer specific. If we use our own TTS system,
there is no such overhead.
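
A toy normalizer for two NSW classes (currency and time) is sketched below using regular expressions; a generic framework would drive such rules from per-language tables. The patterns and expansions are illustrative assumptions, not the rules of our system.

    # Toy rule-based expansion of two non-standard word classes.
    import re

    def spell_digits(number: str) -> str:
        # Placeholder: a real normalizer spells the number out in the target language.
        return " ".join(number)

    RULES = [
        # currency: "Rs. 250" -> "2 5 0 rupees"
        (re.compile(r"Rs\.?\s*(\d+)"), lambda m: spell_digits(m.group(1)) + " rupees"),
        # time: "10:30" -> "1 0 hours 3 0 minutes"
        (re.compile(r"\b(\d{1,2}):(\d{2})\b"),
         lambda m: spell_digits(m.group(1)) + " hours " + spell_digits(m.group(2)) + " minutes"),
    ]

    def normalize(text: str) -> str:
        for pattern, expand in RULES:
            text = pattern.sub(expand, text)
        return text

    # normalize("Fee Rs. 250, meeting at 10:30")
    # -> "Fee 2 5 0 rupees, meeting at 1 0 hours 3 0 minutes"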

TEXT-TO-SPEECH SYSTEM

A text-to-speech system converts the given text into its corresponding
spoken form. The current state-of-the-art approaches are
concatenative synthesis [10] and statistical parametric synthesis
[11, 12]. The database for unit selection speech synthesis ranges
from a few hundred MB to several GB in size. Such a large database
requires a large amount of memory, slows down computation,
and is a hindrance to download
and install on an ordinary machine. Existing systems use pruned
databases for concatenative synthesis, and hence the quality of
the synthesis is not natural. We therefore plan to integrate
statistical parametric synthesis, whose footprint
is very small and easily downloadable. We have built statistical
parametric synthesis using Artificial Neural Networks
(ANNs). The following sub-sections explain this in detail.
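
As a rough illustration of the ANN-based mapping, the sketch below uses a single hidden layer to map per-frame linguistic features to acoustic parameters, which a vocoder would then turn into speech. The feature and parameter dimensions are assumed purely for illustration and are not those of our system.

    # Toy feed-forward network for statistical parametric synthesis.
    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed sizes: 50 linguistic features per frame in, 25 acoustic parameters out.
    W1, b1 = 0.1 * rng.standard_normal((50, 128)), np.zeros(128)
    W2, b2 = 0.1 * rng.standard_normal((128, 25)), np.zeros(25)

    def predict_acoustic_frames(linguistic_features: np.ndarray) -> np.ndarray:
        """Map (num_frames, 50) linguistic features to (num_frames, 25) parameters."""
        hidden = np.tanh(linguistic_features @ W1 + b1)   # single hidden layer
        return hidden @ W2 + b2                           # linear output for regression

    # frames = predict_acoustic_frames(np.zeros((100, 50)))

The weight matrices in such a network occupy well under a megabyte, which illustrates why the footprint of a parametric synthesizer is small compared with a unit-selection database.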

CONCLUSION

In this paper we have discussed the issues involved in building
a multilingual screen reader, such as extracting text from applications,
converting the extracted text into a form the system can read and
understand, and building a small-footprint synthesizer
for converting text into spoken form. The quality of the synthesis
is evaluated using objective measures.