10-09-2016, 09:21 AM
Bridging the Gap between Disabled People and New Technology in Interactive Web Application with the Help of Voice
Abstract :
Voice is the most natural medium of communication. A major area of development in the field of voice recognition is enabling more natural interaction with web technology. This web application focuses on speech recognition and returns its results as voice. The main objective of this web application is to provide easy interaction between the user and the computer, using voice as the interface between them. The application uses the speech recognition, voice, and text components of the Java Speech API (JSAPI), along with recent advancements in web technology. The speech recognizer used in our proposed work to improve interaction with the web application through voice is open source, and it is capable of real-time, medium-vocabulary speech recognition.
Introduction:
In the modern era, voice and visual technologies are widely used. The Voder was the first speech synthesizer. This application is easily operated on a web server, and its sound level is well controlled. In the past, humans had to adapt to the machine's way of communicating; today, natural human-machine interaction is the main goal, and this motivated the invention of speech recognition. Speech recognition draws on many areas, including mathematics, artificial intelligence, machine learning, statistics, and electronic devices (microphones, processor cards, and sound-card technology). An application of this type is capable of understanding sentences, words, and commands in a specific context; it gives the user the flexibility to provide input by voice, and it also helps control all of the web application text available on the web server, as well as reading web pages aloud. Many implementations of English speech recognition already exist, but they create problems for people with disabilities; this paper therefore focuses on speech recognition that also returns its results as voice.
Voice is the best medium for communicating easily with people, and it is an even more flexible way to interact with the web. Since the early days of man-machine interaction, voice has been studied as the most human way of communicating with the computer, through voice-to-text and text-to-voice conversion. To make this communication more flexible, the transmitted voice should carry not only ideas and concepts but also the attitude, emotion, and individuality of the speaker.
Objectives:
• Individuals use and follow web development technology for better communication and cultural development.
• The increasing use of audio-visual communication tools invites people to adopt modern tools in education.
• Student success depends on adapting modern educational equipment for voice.
• This is a modern period in which teaching based on web technology and media is more effective than purely verbal teaching.
• It raises the level of learning and teaching and also provides solid information.
• It improves students' speaking and listening comprehension skills.
Literature survey:
Discriminative learning. [2] Speech recognition software is available in large quantities on the market. Many renowned packages have been developed by prestigious companies such as IBM and Philips. One interactive speech recognition package developed by Microsoft is Genie. Various voice navigation applications and speech recognizers have also been developed, such as Sphinx, Sonic, and Voice. In the speech recognition process, surrounding noise is the worst problem: it confuses the recognizer's ability to understand the real voice it is supposed to hear. Recognizers of this type are used in robots. One such recognizer has been described for medical robots which, despite inevitable motor noise, lets the robot communicate with people efficiently. This is made possible by using a noise-type-dependent acoustic model corresponding to the motion being performed.
Hybrid SVM/HMM. [4] Speech recognition can be viewed as a pattern recognition problem in which we want each unique sound to be distinguishable from all other sounds. Traditionally, statistical models such as Gaussian mixture models have been used to represent the various modalities of a given speech sound. The parameters of the Gaussians are estimated using a maximum likelihood (ML) criterion. The ML formulation for representing the acoustic space does not necessarily translate to better recognition performance, since most of the optimization effort is spent learning the intricacies of the training distributions. Extensions of the learning paradigm involving discriminative training techniques, such as Maximum Mutual Information and Minimum Classification Error, attempt to estimate parameters using both positive and negative examples. Though they give consistent improvements in recognition performance, these techniques are computationally very expensive and are thus limited to small-vocabulary tasks.
Telecommunications Applications. [6] Optimizations of speech recognition, such as the HP SmartBadge IV embedded system, have been proposed to reduce energy consumption while maintaining the quality of the web application. Another scalable system has been proposed for DSR (Distributed Speech Recognition), combining it with scalable compression and thereby reducing the bandwidth required on the web server.
Many capabilities of current speech recognizers in the field of human-computer interaction are described for applications such as voice banking and directory assistance. Speech recognition allows you to provide input to an application with your voice: just as clicking the mouse, typing on the keyboard, or pressing a key on the phone keypad provides input to an application, speech recognition lets you provide input by talking. On the web, all that is needed to do this is a microphone.
Through the Web Speech API, speech synthesis and speech recognition are added to the process; for example, the x-webkit speech attribute was recently added in Google Chrome. For supporting command-and-control recognition, dictation systems, and speech synthesis, the Java Speech API serves as the application programming interface.
The two core technologies used in this proposed model are:
A. Speech Synthesis
Speech synthesis is used to produce synthetic speech from text generated by applications, applets, or the user. It is basically text-to-speech technology, and it follows these steps to produce speech from text.
1) Structure analysis: Determines where sentences and paragraphs start and end; punctuation and data formatting are also taken into account.
2) Text pre-processing: Handles special constructs of the language, such as English abbreviations, dates, times, numbers, amounts, and email addresses.
The remaining steps convert the processed text to speech:
3) Text-to-phoneme conversion: Converts each word into phonemes, the basic units of sound in a language.
4) Prosody analysis: Determines the appropriate phrasing, intonation, and prosody of the sentences.
5) Waveform production: Produces the waveform for each sentence using the phoneme and prosody information.
Errors are possible in each of the above steps. The Java Speech API Markup Language can be used to improve the output quality of the speech synthesizer.
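The text pre-processing step (2) can be illustrated with a minimal sketch using only the Java standard library; the class name, abbreviation table, and digit expansion below are hypothetical simplifications, not the synthesizer's actual rules:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal illustration of text pre-processing: expand abbreviations and
// single digits into words before text-to-phoneme conversion.
public class TextPreprocessor {
    private static final Map<String, String> ABBREVIATIONS = new LinkedHashMap<>();
    static {
        ABBREVIATIONS.put("Dr.", "Doctor");
        ABBREVIATIONS.put("St.", "Street");
        ABBREVIATIONS.put("no.", "number");
    }
    private static final String[] DIGITS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    public static String expand(String text) {
        // Replace known abbreviations with their spoken form.
        for (Map.Entry<String, String> e : ABBREVIATIONS.entrySet()) {
            text = text.replace(e.getKey(), e.getValue());
        }
        // Spell out single digits.
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isDigit(c)) {
                out.append(DIGITS[c - '0']).append(' ');
            } else {
                out.append(c);
            }
        }
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(expand("Dr. Smith lives at no. 4 Main St."));
        // → Doctor Smith lives at number four Main Street
    }
}
```

A real text pre-processor also handles dates, times, amounts, and email addresses, as listed above.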
B. Speech Recognition
Speech recognition determines what has been said in the spoken language; it converts speech into text. It consists of the following steps:
1) Grammar design: Defines the words and patterns that may appear in the spoken input.
2) Signal processing: Analyzes the frequency characteristics of the incoming audio.
3) Phoneme recognition: Compares the spectrum patterns against phoneme patterns.
4) Word recognition: Compares phoneme sequences against the words specified by the active grammar.
5) Result generation: Provides information about the words detected in the incoming audio.
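The word-recognition step (4) can be sketched as choosing the active-grammar word closest to a decoded candidate. This is a toy illustration with hypothetical names: plain edit distance on spellings stands in for the real phoneme-level acoustic comparison.

```java
import java.util.List;

// Toy illustration of word recognition: pick the grammar word whose
// spelling is closest to the decoded candidate. Edit distance stands in
// for the real phoneme-level comparison.
public class WordRecognizer {
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(
                    Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                    d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }

    // Return the grammar word with the smallest distance to the candidate.
    public static String recognize(String candidate, List<String> grammarWords) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String w : grammarWords) {
            int dist = editDistance(candidate, w);
            if (dist < bestDist) { bestDist = dist; best = w; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> grammar = List.of("open", "close", "read", "stop");
        System.out.println(recognize("opn", grammar)); // → open
    }
}
```

Because only grammar words can ever be returned, the grammar constrains the search space, which is exactly why grammars make recognition faster and more accurate.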
Grammars are an important part of speech recognition because they constrain the recognition process. These constraints make recognition more accurate and faster.
The two basic grammar types supported by the Java Speech API are rule grammars and dictation grammars. They differ in several ways: how results are provided, the kinds of sentences they allow, how the grammar is set up in the application, the amount of computational resources required, and how the application is designed. JSAPI uses the Java Speech Grammar Format (JSGF) to define rule grammars. The Speech API classes and interfaces are grouped into packages.
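A rule grammar in JSGF for a small command-and-control vocabulary might look like the following sketch (the grammar and rule names here are hypothetical, not taken from the proposed system):

```
#JSGF V1.0;
grammar commands;

// A command is an action, an optional article, and an object.
public <command> = <action> [the] <object>;
<action> = open | close | read;
<object> = page | mail | document;
```

An utterance such as "read the mail" matches the public `<command>` rule, so the recognizer can report it as a result; words outside these rules are rejected.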
The three main packages are:
javax.speech: Contains classes and interfaces for a generic speech engine.
javax.speech.synthesis: Contains classes and interfaces for speech synthesis.
javax.speech.recognition: Contains classes and interfaces for speech recognition.
Algorithm:
1: The user gives input to the web application through a microphone.
2: This audio input is passed to the speech recognition engine.
3: The speech recognition engine translates the speech into text.
4: The text is given to the Hindi or English text processor, which processes the Hindi or English words.
5: The processed text is given to the speech synthesizer.
6: The speech synthesizer converts the text into voice.
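The six steps above can be sketched as a pipeline of stages. The recognizer, text processor, and synthesizer below are hypothetical stand-in stubs; in the actual system each stage would delegate to a JSAPI recognition or synthesis engine.

```java
// Sketch of the algorithm as a pipeline. Each stage is a stub that a
// real system would replace with a JSAPI engine call.
public class VoicePipeline {
    // Steps 2-3: speech recognition engine (stub: treats the audio as
    // already-decoded text and normalizes its case).
    static String recognize(String audioAsText) {
        return audioAsText.toLowerCase();
    }

    // Step 4: Hindi/English text processor (stub: trims and collapses spaces).
    static String processText(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // Steps 5-6: speech synthesizer (stub: tags the text as voice output).
    static String synthesize(String text) {
        return "[voice] " + text;
    }

    // Step 1 feeds microphone input into the chained stages.
    public static String run(String micInput) {
        return synthesize(processText(recognize(micInput)));
    }

    public static void main(String[] args) {
        System.out.println(run("  OPEN the   Mail "));
        // → [voice] open the mail
    }
}
```

The point of the sketch is the data flow: audio in, text through recognition and language processing, then voice back out through synthesis.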
Advantages:
• It helps improve people's mental processing of speech.
• The system contains many distinctive features, such as integrated speech recognition and an audio-visually enriched interface, which gives the user better pronunciation practice and word recognition.
• It improves students' speaking and listening comprehension skills.
• It helps people with disabilities accomplish tasks that they otherwise could not do, or could not do easily.
Disadvantages:
• The application will not work for people who are unable to speak.
• The application works only within a certain domain, defined by its grammar.
• It requires extra hardware, e.g. a microphone and speakers.