10-05-2013, 01:01 PM
SPEECH TO TEXT CONVERSION FOR SECURITY SYSTEMS
ABSTRACT
My 2012 summer project on 'Lattice based keyword spotting in speech' (DRDO, Centre for Artificial Intelligence and Robotics, Bangalore) got me interested in speech recognition, and I would like to work in this area.
I would like to implement 'Speech to text conversion for security systems'. The reason for my interest in this experiment: with speech-to-text conversion, we can find out whether desired words are spoken (i.e. whether they are present in the speech files). This can tell us whether the spoken data contains threat messages. Listening to speech directly is cumbersome. After converting speech to text, we can apply text search algorithms to locate particular words of interest. Thus, speech recognition has applications in security systems. I have also seen it applied in voice mail retrieval.
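The search over converted text described above can be sketched in a few lines. This is only a minimal illustration in Python (the project itself proposes MATLAB); the function name `find_keywords`, the sample transcript, and the keyword list are all made up for the example.

```python
import re

def find_keywords(transcript, keywords):
    """Return {keyword: [word positions]} for each keyword found in the transcript."""
    # Lowercase and split into word tokens so matching ignores case and punctuation.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    wanted = {k.lower() for k in keywords}
    hits = {}
    for position, token in enumerate(tokens):
        if token in wanted:
            # Record the 0-based word index of every occurrence.
            hits.setdefault(token, []).append(position)
    return hits

# Hypothetical recognizer output and watch list, for illustration only.
transcript = "Leave the package at the station. The package must arrive by noon."
print(find_keywords(transcript, ["package", "station", "airport"]))
```

Keywords that never occur (here "airport") are simply absent from the result, so an empty result means none of the words of interest were spoken.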
After an extensive literature survey of different sources, I have deduced the following steps for carrying out the experiment:
1) The human voice is first converted to digital format using an ADC (analog-to-digital converter).
2) The resulting digital signal is converted to vectors using MATLAB.
3) A vocabulary is defined in an HMM (Hidden Markov Model) tool.
4) The HMM is trained for the vocabulary.
5) The vectors formed in MATLAB are fed to the HMM tool, which compares them with its vocabulary.
6) The set of words obtained from the comparison is decoded to generate a text file.
7) Search algorithms written in MATLAB are applied to the resulting text file to search for threat messages.
For the above experiment, the BEEP dictionary for phonetic vocabulary (which contains 46 phonemes) would be best. We can also use a word dictionary. With a phonetic vocabulary, an effectively unlimited number of words can be formed. With a word vocabulary, the dictionary is limited to a few words and the trained HMM cannot be used to identify arbitrary text: if words present in the speech are not in the dictionary, the model would fail to identify them.
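To give a rough idea of what step 2 (turning the digitized signal into vectors) involves, here is a minimal sketch in Python rather than MATLAB. It cuts the samples into overlapping frames and computes one log-energy value per frame. Real recognizers typically use richer features such as MFCCs; the function name, the 16 kHz sample rate, the 400-sample frame length, and the 160-sample hop are assumptions chosen for the example, not values from the project.

```python
import math

def frame_log_energy(samples, frame_len=400, hop=160):
    """Split samples into overlapping frames and return one log-energy value per frame."""
    features = []
    # Slide a window of frame_len samples forward by hop samples each time.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A small constant avoids log(0) on silent frames.
        features.append(math.log(energy + 1e-10))
    return features

# Synthetic input (assumed): 0.1 s of silence, then 0.1 s of a 440 Hz tone at 16 kHz.
rate = 16000
silence = [0.0] * 1600
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(1600)]
features = frame_log_energy(silence + tone)
print(len(features), features[0], features[-1])
```

The silent frames give a very low log energy and the tone frames a much higher one, which is the kind of per-frame description that would then be passed to the HMM tool in step 5.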