
Software Requirements Specification



Introduction


During the past few years, the computer industry has seen speech technologies arrive with a flash and quietly fold without a whimper, yet speech processing remains an active area of research. Speech recognition systems have been around for over twenty years, but early systems were very expensive and required powerful computers to run. In recent years, manufacturers have reduced the prices of speech recognition systems, and the technology behind speech input has also changed. Early systems used discrete speech: the user had to speak one word at a time, with a short pause between words. Most systems of the past few years use continuous speech, allowing the user to speak in a more natural way. The main commercial continuous-speech systems currently available for the PC are Dragon NaturallySpeaking and IBM ViaVoice. As technology has advanced, many research projects have developed applications that use speech technology to improve the application user's experience. As pioneers, IBM, Microsoft and Sun Microsystems have done a considerable amount of work in this area. Microsoft has offered the ability to control Windows by voice commands and has embedded dictation capability in applications such as Microsoft Word. Building on these promising achievements, other developers besides Microsoft and Sun Microsystems have since released speech APIs to help application developers.


Abstract


A voice-based web browser is, at its core, a solution for recognizing and interpreting voice. The system will be capable of converting an acoustic signal, captured by a microphone, into a set of words, on the premise that the acoustic wave represents human speech. The recognized words can be used by applications as commands or data entries, or can serve as input to further linguistic processing to achieve speech understanding and produce output that other applications can use. The system consists of two main components: one processes the acoustic signal captured by the microphone, while the other interprets the processed signal and maps it to words. The end product will thus be capable of tracking human speech.


Background


There are two main speech technology concepts in this scenario: speech synthesis and speech recognition. Speech synthesis is the artificial production of human speech. This project is based on speech recognition, where the intention is to develop a system capable of converting human speech to text or commands. Speech recognition systems can be characterized by many parameters, such as speaking model, speaking style, and vocabulary. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies and is much more difficult to recognize than speech read from a script. Some systems require speaker enrollment, in which a user must provide samples of his or her speech before using them, whereas other systems are said to be speaker-independent, in that no enrollment is necessary. A speaker-independent system is developed to operate for any speaker of a particular type (e.g., American English), whereas a speaker-adaptive system is developed to adapt its operation to the characteristics of new speakers. Other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or contain many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combinations of words.


Problem/Requirement

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. The recognized words can be the final results, as in applications such as command and control, data entry, and document preparation. They can also serve as input to further linguistic processing in order to achieve speech understanding. Speech recognition appears as an alternative to typing on a keyboard: the user simply talks to the computer, and the computer captures the user's utterance. Such software has been developed to provide a fast way to communicate with a computer and can help people with a variety of disabilities. It is useful for people with physical disabilities who often find typing difficult, painful or impossible. Voice recognition software can also help those with spelling difficulties, including users with dyslexia, because recognized words are always spelled correctly.


DYNAMIC TIME WARPING (DTW)-BASED SPEECH RECOGNITION


Dynamic time warping is an algorithm for measuring similarity between two sequences, which may vary in time or speed. The basic principle of DTW is to allow a range of "steps" in the space of (time frames in sample, time frames in template) and to find the path through that space that maximizes the local match between the aligned time frames, subject to the constraints implicit in the allowable steps. The total "similarity cost" found by this algorithm is a good indication of how well the sample and template match, and can be used to choose the best-matching template. Any data that can be turned into a linear representation can be analyzed with DTW.
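As a concrete illustration, the template-matching idea above can be sketched as a small dynamic program. This is a toy sketch under simplifying assumptions: each sequence element is a single number standing in for a frame's feature vector, and the local match cost is the absolute difference (a real recognizer would compare feature vectors such as MFCCs).

```python
from math import inf

def dtw_distance(sample, template):
    """Cumulative alignment cost between two 1-D sequences via DTW.

    cost[i][j] holds the best cost of aligning the first i frames of
    the sample with the first j frames of the template.
    """
    n, m = len(sample), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(sample[i - 1] - template[j - 1])
            # allowable steps: diagonal match, insertion, deletion
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    return cost[n][m]
```

To recognize an utterance, one would compute this cost against every stored word template and pick the template with the lowest total cost; note that a time-stretched copy of a sequence (e.g. a repeated frame) still aligns at zero cost, which is exactly the warping the algorithm is named for.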


CONDITIONAL RANDOM FIELD (CRF)

CRF is a framework for building probabilistic models to segment and label sequence data. This approach provides several advantages over HMMs, such as the ability to relax the strong independence assumptions made in HMM-based models. Maximum-entropy Markov models and discriminative Markov models are based on directed graphical models and can therefore be biased toward states with few successor states (the label-bias problem). CRFs avoid these limitations as well.
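To make the sequence-labeling idea concrete, the following toy sketch runs Viterbi decoding over additive, CRF-style scores. The labels, scoring functions and observations are hypothetical; a real linear-chain CRF would learn feature weights from training data rather than use hand-set scores, and would also normalize to obtain probabilities.

```python
def crf_viterbi(obs, labels, emit_score, trans_score):
    """Most-likely label sequence under additive linear-chain scores.

    emit_score(label, obs_t) and trans_score(prev, cur) are caller-supplied
    scoring functions (here: hand-set toys standing in for learned weights).
    """
    # best[t][y] = best score of any label path ending in label y at time t
    best = [{y: emit_score(y, obs[0]) for y in labels}]
    back = []
    for t in range(1, len(obs)):
        scores, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans_score(p, y))
            scores[y] = best[-1][prev] + trans_score(prev, y) + emit_score(y, obs[t])
            ptr[y] = prev
        best.append(scores)
        back.append(ptr)
    # trace the argmax path backwards
    y = max(labels, key=lambda l: best[-1][l])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))
```

For example, with labels "S" (speech) and "N" (non-speech), an emission score rewarding "S" for high-valued observations, and a transition score favoring staying in the same state, the decoder labels a sequence like [0.9, 0.8, 0.1] as ["S", "S", "N"].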


Sound recording and Word detection Component

This component takes its input from the audio recorder, typically a microphone, and identifies the words in the input signal. Word detection is usually done using the short-time energy and the zero-crossing rate of the signal. The output of this component is then sent to the feature extractor module.
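A minimal sketch of the energy and zero-crossing-rate idea might look as follows. The frame length and thresholds here are illustrative assumptions, not values from the project; in practice they would be tuned to the sample rate and background noise level.

```python
def detect_speech_frames(signal, frame_len=160,
                         energy_thresh=0.01, zcr_thresh=0.25):
    """Flag fixed-size frames that likely contain speech.

    Uses two classic cues: short-time energy (high for voiced speech)
    and zero-crossing rate (high for unvoiced fricatives like /s/).
    Thresholds are illustrative, not calibrated values.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        zcr = crossings / (frame_len - 1)
        flags.append(energy > energy_thresh or zcr > zcr_thresh)
    return flags
```

Runs of consecutive True frames would then be treated as candidate words and passed on to the feature extractor.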


Conclusion

In my opinion, the software does an admirable job of addressing the usability needs of a wide audience. It is necessary to ensure that designers develop software products with universal usability in mind. The designers created a generally solid software product that almost anyone can use with success. The software recorded and analyzed each test subject's voice successfully. Afterwards, each user could dictate, and the PC transcribed the user's dictation with relative accuracy. The voice-to-text transcription application is a proven feature of the Fonix Embedded Speech DDK. However, using this software application as a communication device for automobiles is as yet unproven, at least in my opinion. First, I have not figured out a way to input external files into a program that needs storage for its data. As an example, in Samples Four and Five, the programs need places to store the origins and destinations entered by the users so that the program can use them later for directions. Because of this missing feature, I had to be creative and find another alternative to make the Samples work the way I wanted them to. Perhaps I have not yet figured out the best features of the software product, and the features I looked for are still hidden within the package somewhere. The built-in microphone on my notebook did not work well with the software product in noisy environments. The product needed assistance from the Array Microphone made by Andrea Electronics. Despite its size, the Array Microphone performs extremely well: it reduces noise to minimal levels and helps the software product considerably, especially when users want to use it in a very noisy environment such as a train station, a freeway, near a hospital, or on busy commuter roads.

A suggestion for further research is to select users who would actually like to use speech recognition for their own needs. This way, researchers could get more accurate results and build a better device for these people. Because the testers in this study did not intend to use speech recognition for any of their own purposes, at least for the time being, their opinions on the software product were very general and broad. Addressing this would bring considerable improvement if a device were later built based on the results of this document.