Artificial intelligence for Speech Recognition
I. INTRODUCTION
• AI is the study of how to make computers perform tasks that are currently done better by humans.
• AI is an interdisciplinary field where computer science intersects with philosophy, psychology, engineering and other fields.
• The essence of AI is the use of computers to mimic the human learning process; this is known as Artificial Intelligence Integration.
THE TECHNOLOGY
• Artificial intelligence (AI) involves two basic ideas.
First, it involves studying the thought processes of human beings.
Second, it deals with representing those processes via machines (such as computers and robots).
• AI is the behavior of a machine which, if performed by a human being, would be called intelligent. It makes machines smarter and more useful, and is less expensive than natural intelligence.
• Speech recognition allows you to provide input to an application with your voice.
• The speech recognition process is performed by a software component known as the speech recognition engine.
• The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands.
SPEECH RECOGNITION
• The user speaks to the computer through a microphone; the computer in turn identifies the meaning of the words and sends it to a natural language processing (NLP) device for further processing.
• Once recognized, the words can be used in a variety of applications like display, robotics, commands to computers, and dictation.
• The word recognizer is a speech recognition system that identifies individual words.
• Continuous speech recognizers are far more difficult to build than word recognizers.
• With a continuous speech recognizer, you speak complete sentences to the computer. The input is recognized and then processed by NLP.
• Such recognizers employ sophisticated, complex techniques to deal with continuous speech, because when one speaks continuously, most of the words slur together and it is difficult for the system to know where one word ends and the next begins.
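The word-boundary problem can be illustrated with a minimal sketch. It assumes a hypothetical list of per-frame energy values and a simple silence threshold; the function name, threshold and sample data are invented for illustration and are not taken from any particular recognizer. Words separated by pauses are split cleanly, while the same words run together come back as a single segment.

```python
def segment_by_silence(energies, threshold=0.1, min_silence=2):
    """Return (start, end) frame index pairs for runs of speech.

    energies: hypothetical per-frame energy values, already computed.
    A segment is closed only after `min_silence` consecutive quiet frames.
    The end index is exclusive.
    """
    segments = []
    start = None   # frame where the current segment began
    quiet = 0      # consecutive quiet frames seen inside a segment
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Speech ran up to (or near) the end of the input.
        segments.append((start, len(energies) - quiet))
    return segments


# Two words separated by a clear pause.
discrete = [0.0, 0.8, 0.9, 0.7, 0.0, 0.0, 0.0, 0.6, 0.9, 0.0]
# The same two words slurred together, with no pause between them.
continuous = [0.0, 0.8, 0.9, 0.7, 0.5, 0.6, 0.9, 0.0, 0.0, 0.0]

print(segment_by_silence(discrete))    # two segments
print(segment_by_silence(continuous))  # one merged segment
```

Because silence alone cannot separate slurred words, a continuous recognizer has to find boundaries by other means, which is part of why it is harder to build.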
SPEECH RECOGNITION PROCESS
What is a speech recognition system?
• A speech recognition system is a type of software that allows the user to have their spoken words converted into written text in a computer application such as a word processor or spreadsheet.
• The computer can also be controlled by the use of spoken commands.
• Speech recognition software can be installed on a personal computer of appropriate specification.
• The user speaks into a microphone.
• After the training process, the user’s spoken words will produce text; the accuracy of this will improve with further dictation and conscientious use of the correction procedure.
• With a well-trained system, around 95% of the words spoken could be correctly interpreted.
• The system can be trained to identify certain words and phrases and examine the user’s standard documents in order to develop an accurate voice file for the individual.
Terms and Concepts
• Utterances
When the user says something, this is known as an utterance. An utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed.
• Pronunciations
The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like.
• Grammars
A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.
• Accuracy
It is typically a quantitative measurement and can be calculated in several ways. Arguably the most important measurement of accuracy is whether the desired end result occurred.
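As one example of a quantitative measurement, the sketch below computes word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the recognized text into the reference text, divided by the number of reference words. The text above does not name a specific metric, so WER is an assumption chosen because it is widely used; the function name and sample sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


spoken = "book a flight to boston for monday"
recognized = "book a fight to boston monday"

wer = word_error_rate(spoken, recognized)
print(f"WER: {wer:.2%}, word accuracy: {1 - wer:.2%}")
```

Here one word is substituted and one is dropped, giving two errors over seven reference words. A low WER does not by itself guarantee that the desired end result occurred, which is why task-level success is measured separately.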
SPEAKER INDEPENDENCY
• The speech quality varies from person to person.
• A speaker-independent system can be used by anybody and can recognize any voice, even though voice characteristics vary widely from one speaker to another.
• Most of these systems are costly and complex. Also, these have very limited vocabularies.
Speaker Dependence Vs Speaker Independence
• Speaker Dependence describes the degree to which a speech recognition system requires knowledge of a speaker’s individual voice characteristics to successfully process speech.
• Speech recognition systems that do not require a user to train the system are known as speaker-independent systems.
• Speech recognition in the VoiceXML world must be speaker-independent.
WORKING OF THE SYSTEM
• The voice input to the microphone produces an analogue speech signal.
• An analogue-to-digital converter (ADC) converts this speech signal into binary words that are compatible with a digital computer.
• The converted binary version is then stored in the system and compared with previously stored binary representation of words and phrases.
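The steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the analogue signal is represented as floats in the range -1.0 to 1.0, the ADC is modelled as 8-bit quantization, and the comparison is a plain sum of absolute differences against stored templates of the same length. The function names, templates and sample values are hypothetical; practical recognizers compare extracted features rather than raw samples.

```python
def quantize(samples, bits=8):
    """Model an ADC: map analogue values in [-1.0, 1.0] to integer codes."""
    levels = 2 ** bits - 1
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range input
        codes.append(round((s + 1.0) / 2.0 * levels))
    return codes


def distance(a, b):
    """Sum of absolute differences between equal-length code sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(codes, templates):
    """Return the stored word whose template is closest to the input."""
    return min(templates, key=lambda word: distance(codes, templates[word]))


# Previously stored binary representations (hypothetical training data).
templates = {
    "yes": quantize([0.0, 0.5, 1.0, 0.5, 0.0]),
    "no": quantize([0.0, -0.5, -1.0, -0.5, 0.0]),
}

# A new utterance, digitized the same way and compared with the templates.
spoken = quantize([0.1, 0.4, 0.9, 0.6, 0.0])
print(recognize(spoken, templates))
```

The stored templates play the role of the trained voice data: in a speaker-dependent system they would be recorded from the individual user.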
Speaker-Dependent Word Recognizer
What software is available?
• New and improved versions are regularly produced, and older versions are often sold at greatly reduced prices.
• Discrete speech software is an older technology that requires the user to speak one – word – at – a – time.
• Dragon Dictate Classic Version 3 is one example of discrete speech software; it has fewer features, is simple to train and use, and will work on [...]
• Continuous speech software allows the user to dictate normally.
Limitations to This Type of Software
✓ It needs to be completely tailored to the user and trained by the user.
✓ It is often set up on one machine, and so can create difficulties for a user who works from many locations, for example from school and home.
✓ It depends on the user having the desire to produce text and being able to invest the time, training and perseverance necessary to achieve it.
✓ It is most successful for those competent in the art of dictation.
APPLICATION
• One of the main benefits of a speech recognition system is that it lets the user do other work simultaneously. The user can concentrate on observation and manual operations, and still control the machinery by voice input commands.
• Another major application of speech processing is in military operations. Voice control of weapons is an example.
• Voice recognition could also be used on computers for making airline and hotel reservations.
CONCLUSION
• It is important to consider the environment in which the speech system has to work. The grammar used by the speaker and accepted by the system, noise level, noise type, position of the microphone, and speed and manner of the user’s speech are some factors that may affect the quality of speech recognition.
• Since most recognition systems are speaker-dependent, it is necessary to train a system to recognize the dialect of each user.