27-02-2013, 09:20 AM
Voice Recognition car with webcam
Voice Recognition.docx (Size: 99.5 KB / Downloads: 40)
ABSTRACT
Creating a car controlled by the human voice is an innovative concept. In this paper we use a speech recognition algorithm, together with algorithms that act on the user's commands. A switching concept is used initially: the remote control is provided with a button, and the speech recognition process starts only after that button is pressed. When the user then gives a command such as "open window", the speech recognition system processes it and the respective window opens. The other commands are processed in the same way.
INTRODUCTION
In this paper we introduce a new concept of voice recognition in cars, built on a speech recognition algorithm. Both the electrical and mechanical domains are involved, and digital image processing is used as well. Voice recognition is coming to remote controls and car navigation systems. The user gives commands through a microphone installed in the car's remote control. The commands arrive as analogue signals, which must be converted into digital form. The car is fitted with a large database containing the vocabulary [1], composed of all the keywords used for commanding the car. The system is installed with a full computer system; the size of a voice-recognition program's effective vocabulary is directly related to the random access memory capacity of the computer in which it is installed [2].

The car also carries special hardware: a display that shows all the available commands and instructions to the user, making the system user friendly. If the user inputs an incorrect command, the display generates an error message and shows the most closely related commands available in the system's vocabulary. Automatic Speech Recognition (ASR) is a model of voice recognition designed for dictation, and this model is installed in the car for that purpose. Our concept is based entirely on artificial intelligence and robotics.

The paper is organised as follows: section 2 gives general information about the system, section 3 describes how the system works, section 4 describes how speech recognition works, section 5 presents the transformation of PCM digital audio, section 6 describes spoken phonemes, and section 7 describes how to reduce computation and increase accuracy. Section 8 presents context-free grammars, sections 9 and 10 describe continuous dictation and adaptation respectively, and section 11 gives the conclusion.
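The error-message behaviour described above (reject an unknown command and suggest the closest entries in the vocabulary) can be sketched in a few lines. This is only an illustration under assumed names: the command strings in VOCABULARY and the function name are invented for the example, not taken from any real in-car system.

```python
import difflib

# Hypothetical fixed command vocabulary, standing in for the car's
# keyword database described in the text.
VOCABULARY = ["open window", "close window", "play music",
              "stop music", "lock steering", "call home"]

def process_command(spoken: str):
    """Return the matched command, or an error with the closest suggestions."""
    text = spoken.lower().strip()
    if text in VOCABULARY:
        return ("OK", text)
    # On an unknown command, suggest the most similar vocabulary entries,
    # as the display would.
    suggestions = difflib.get_close_matches(text, VOCABULARY, n=3, cutoff=0.4)
    return ("ERROR", suggestions)

print(process_command("open window"))   # recognized command
print(process_command("opn windw"))     # unknown -> closest suggestions
```

A real system would match against recognized speech rather than typed text, but the suggestion logic is the same.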
GENERAL INFORMATION
When used in conjunction with the Multi Function Steering Wheel (available on many recent models), you can also operate all principal functions and accessories. You can access phone functions, including recalling stored numbers and dialing, operate Navigation System functions, or take notes through the built-in memo function. Currently, the non-speaker-dependent vocabulary includes around 30 words, including numbers and commands. Spoken sequences of up to five command words and columns of numbers can be recognized with a high degree of accuracy [3]. You can create a telephone book with up to 40 numbers. Dialing is then simply a matter of speaking a name. Other normal telephone functions, such as repeat dialing and call hang-up, are also voice activated [4].
HOW IT WORKS
Voice recognition uses a neural net to "learn" to recognize your voice. As you speak, the voice recognition software remembers the way you say each word. This customization allows voice recognition even though everyone speaks with varying accents and inflection. The voice commands you use in your car are chosen from a fixed vocabulary and are passed on to the car telephone or navigation system via the telephone interface. The system gives acoustic feedback on everything it recognizes. The system requires no lengthy voice recognition protocol and responds to a simple series of set voice commands that are not sensitive to the accent or dialect of the speaker. The voice control is a finite speech dialog system, which follows a predefined structure. Faulty operation or error recognition can easily be corrected by simply repeating the desired command. The voice recognizer is resistant to stationary environmental noise.
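A finite speech dialog system of the kind described here can be modelled as a small state machine: each state lists the set commands it accepts and the state each one leads to. The state and command names below are invented for illustration; an unrecognized command leaves the state unchanged, matching the repeat-to-correct behaviour in the text.

```python
# Hypothetical dialog structure: state -> {accepted command -> next state}.
DIALOG = {
    "idle":       {"phone": "phone_menu", "navigation": "nav_menu"},
    "phone_menu": {"dial": "dialing", "cancel": "idle"},
    "nav_menu":   {"destination": "enter_dest", "cancel": "idle"},
}

def step(state: str, command: str) -> str:
    """Advance the dialog; an unrecognized command keeps the current state,
    so the user can simply repeat the desired command."""
    return DIALOG.get(state, {}).get(command, state)

state = "idle"
for cmd in ["phone", "dial"]:
    state = step(state, cmd)
print(state)  # -> dialing
```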
TRANSFORM THE PCM DIGITAL AUDIO
The first element of the pipeline converts digital audio coming from the sound card into a format that is more representative of what a person hears. The digital audio is a stream of amplitudes, sampled at about 16,000 times per second. If you visualize the incoming data, it looks just like the output of an oscilloscope: a wavy line that periodically repeats while the user is speaking [5]. In this form, the data is not useful to speech recognition, because it is too difficult to identify any patterns that correlate to what was actually said. To make pattern recognition easier, the PCM digital audio is transformed into the "frequency domain". Transformations are done using a windowed fast Fourier transform [6]. The output is similar to what a spectrograph produces. In the frequency domain, you can identify the frequency components of a sound, and from those components it is possible to approximate how the human ear perceives the sound.

The fast Fourier transform analyzes every 1/100th of a second and converts the audio data into the frequency domain. Each 1/100th of a second results in a graph of the amplitudes of frequency components, describing the sound heard during that 1/100th of a second. The speech recognizer has a database of several thousand such graphs (called a codebook) that identify the different types of sounds the human voice can make. A sound is "identified" by matching it to its closest entry in the codebook, producing a number that describes the sound. This number is called the "feature number". (Actually, several feature numbers are generated for every 1/100th of a second, but the process is easier to explain assuming only one.) The input to the speech recognizer began as a stream of 16,000 PCM values per second. By using fast Fourier transforms and the codebook, it is boiled down to its essential information, producing 100 feature numbers per second.
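The front end described above can be sketched as follows: window the 16 kHz PCM stream into 10 ms frames, take a windowed FFT of each frame, and quantize each spectrum to its nearest codebook entry. The codebook here is random stand-in data rather than a trained one, and its size (256 entries) is illustrative only.

```python
import numpy as np

RATE = 16000                 # samples per second, as in the text
FRAME = RATE // 100          # 1/100 s = 160 samples per frame

rng = np.random.default_rng(0)
# Stand-in codebook: 256 spectrum templates (a real one is trained on speech).
codebook = rng.standard_normal((256, FRAME // 2 + 1))

def feature_numbers(pcm: np.ndarray) -> list[int]:
    """Map raw PCM audio to one feature number per 10 ms frame."""
    out = []
    for start in range(0, len(pcm) - FRAME + 1, FRAME):
        frame = pcm[start:start + FRAME] * np.hanning(FRAME)   # window
        spectrum = np.abs(np.fft.rfft(frame))                  # frequency domain
        dists = np.linalg.norm(codebook - spectrum, axis=1)    # codebook search
        out.append(int(np.argmin(dists)))                      # nearest entry
    return out

one_second = rng.standard_normal(RATE)    # stand-in for microphone input
print(len(feature_numbers(one_second)))   # 100 feature numbers per second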
SPOKEN PHONEMES
I’m going to jump out of order here. To make the recognition process easier to understand, I’ll first explain how the recognizer determines what phonemes were spoken and then explain the grammars.
In an ideal world, you could match each feature number to a phoneme. If a segment of audio resulted in feature #52, it could always mean that the user made an "h" sound. Feature #53 might be an "f" sound, and so on. If this were true, it would be easy to figure out what phonemes the user spoke. Unfortunately, this does not work, for a number of reasons:
- Every time a user speaks a word it sounds different. Users do not produce exactly the same sound for the same phoneme.
- Background noise from the microphone and the user's surroundings sometimes causes the recognizer to hear a different vector than it would if the user were in a quiet room with a high-quality microphone.
- The sound of a phoneme changes depending on the phonemes that surround it. The "t" in "talk" sounds different from the "t" in "attack" and "mist".
- The sound produced by a phoneme changes from its beginning to its end; it is not constant. The beginning of a "t" produces different feature numbers than the end of a "t".
The background-noise and variability problems are solved by allowing a feature number to be used by more than one phoneme, and by using statistical models to figure out which phoneme was spoken. This can be done because a phoneme lasts a relatively long time, 50 to 100 feature numbers, and it is likely that one or more sounds are predominant during that time. Hence, it is possible to predict what phoneme was spoken. In practice, the approximation is a bit more complex than this.
I’ll explain by starting at the origin of the process [7]. For the speech recognizer to learn how a phoneme sounds, a training tool is passed hundreds of recordings of the phoneme. It analyzes each 1/100th of a second of these hundreds of recordings and produces a feature number. From these it learns statistics about how likely it is for a particular feature number to appear in a specific phoneme. Hence, for the phoneme "h", there might be a 55% chance of feature #52 appearing in any 1/100th of a second, a 30% chance of feature #189 showing up, and a 15% chance of feature #53. Every 1/100th of a second of an "f" sound might have a 10% chance of feature #52, a 10% chance of feature #189, and an 80% chance of feature #53 [8].
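The statistics above can be turned into a classifier in a few lines: pick the phoneme whose learned distribution best explains the observed feature numbers, treating the 1/100th-second frames as independent. The probabilities are the illustrative figures from the text, not real trained values, and real recognizers use hidden Markov models rather than this single-state maximum-likelihood sketch.

```python
import math

# Illustrative per-phoneme feature-number probabilities from the text.
P_FEATURE = {
    "h": {52: 0.55, 189: 0.30, 53: 0.15},
    "f": {52: 0.10, 189: 0.10, 53: 0.80},
}

def most_likely_phoneme(features: list[int]) -> str:
    """Pick the phoneme whose distribution best explains the observed
    feature numbers (frames assumed independent; log-likelihoods summed)."""
    best, best_ll = None, float("-inf")
    for phoneme, dist in P_FEATURE.items():
        # Unseen feature numbers get a tiny floor probability.
        ll = sum(math.log(dist.get(f, 1e-6)) for f in features)
        if ll > best_ll:
            best, best_ll = phoneme, ll
    return best

# A stretch of audio dominated by feature #52 is classified as "h".
print(most_likely_phoneme([52, 52, 189, 52, 53]))  # -> h
```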
CONCLUSION
This was a high-level overview of how speech recognition works in cars. Using voice control in automobiles is a complex process, because some applications are difficult to install and use. One can easily open and close the windows using voice recognition. Other possible applications include controlling the music system, commanding the power windows, and steering locking. Voice recognition is an innovative and sensitive concept in the field of automobiles, and it can be made more secure using fingerprint analysis.