REPORT ON VOICE CONTROLLED ROBOT

INTRODUCTION

When we say voice control, the first term to consider is speech recognition, i.e.
making the system understand the human voice. Speech recognition is a technology in which
the system identifies the words (not their meaning) given through speech. Speech is an ideal method for robotic control and communication. The speech-recognition
circuit we will outline functions independently of the robot’s main
intelligence [central processing unit (CPU)]. This is an advantage because word recognition does not take
any of the robot’s main CPU processing power. The CPU must
merely poll the speech circuit’s recognition lines occasionally to check whether a command has
been issued to the robot. We can improve on this by connecting the recognition
line to one of the robot’s CPU interrupt lines. A recognized word would then
cause an interrupt, letting the CPU know that a recognized word had been spoken. The
advantage of using an interrupt is that polling the circuit’s recognition line
is no longer necessary, further reducing CPU overhead.
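
The difference between polling and interrupt-driven operation can be illustrated with a small software simulation. This is only a sketch: the SpeechCircuit class, its method names, and the word-to-command table are hypothetical stand-ins for the real hardware lines, which the report does not specify.

```python
from typing import Callable, Optional


class SpeechCircuit:
    """Hypothetical software stand-in for the stand-alone recognition circuit."""

    def __init__(self) -> None:
        self._pending: Optional[int] = None  # recognized word not yet read by the CPU
        self._handler: Optional[Callable[[int], None]] = None

    def attach_interrupt(self, handler: Callable[[int], None]) -> None:
        """Register a handler, as if wiring the recognition line to a CPU interrupt."""
        self._handler = handler

    def recognize(self, word_id: int) -> None:
        """Simulate the circuit recognizing a spoken word."""
        if self._handler is not None:
            self._handler(word_id)   # interrupt style: the CPU is notified at once
        else:
            self._pending = word_id  # polling style: hold the result until asked

    def poll(self) -> Optional[int]:
        """Return the pending word id (clearing it), or None if there is none."""
        word_id, self._pending = self._pending, None
        return word_id


COMMANDS = {1: "forward", 2: "stop"}  # assumed mapping of word ids to commands

# Polling: the main loop has to keep asking, even when nothing was said.
polled_circuit = SpeechCircuit()
polled_circuit.recognize(1)
polled_commands = []
for _ in range(3):  # three passes of a main loop
    word = polled_circuit.poll()
    if word is not None:
        polled_commands.append(COMMANDS[word])

# Interrupt: the handler runs only when a word is actually recognized.
interrupt_commands = []
interrupt_circuit = SpeechCircuit()
interrupt_circuit.attach_interrupt(lambda w: interrupt_commands.append(COMMANDS[w]))
interrupt_circuit.recognize(2)
```

In the polling version two of the three loop passes do no useful work, which is the overhead that the interrupt line removes.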

Why build robots?

Robots are indispensable in many manufacturing industries. The reason is that the cost
per hour to operate a robot is a fraction of the cost of the human labor needed to perform
the same function. More than this, once programmed, robots repeatedly perform
functions with a high accuracy that surpasses that of the most experienced human
operator. Human operators are, however, far more versatile. Humans can switch job tasks
easily. Robots are built and programmed to be job specific. You wouldn’t be able to
program a welding robot to start counting parts in a bin. Today’s most advanced
industrial robots will soon become “dinosaurs.” Robots are in the infancy stage of their
evolution. As robots evolve, they will become more versatile, emulating the human
capacity and ability to switch job tasks easily. While the personal computer has made an
indelible mark on society, the personal robot hasn’t made an appearance. Obviously
there’s more to a personal robot than a personal computer. Robots require a combination
of elements to be effective: sophistication of intelligence, movement, mobility,
navigation, and purpose.

Approaches to Statistical Speech Recognition

Hidden Markov model (HMM)-based speech recognition

Modern general-purpose speech recognition systems are generally based on hidden
Markov models (HMMs). An HMM is a statistical model that outputs a sequence of symbols
or quantities.
One reason HMMs are used in speech recognition is that a speech signal
can be viewed as a piecewise stationary or short-time stationary signal. That
is, over a short interval on the order of 10 milliseconds, speech can be
approximated as a stationary process. Speech can thus be thought of as a Markov model
whose states each correspond to one such short, approximately stationary segment.
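
The short-time view can be made concrete by cutting a sampled signal into 10-millisecond frames, each of which is then treated as approximately stationary. The following is a minimal sketch; the function name, the 8 kHz sampling rate, the 440 Hz test tone, and the use of non-overlapping frames are assumptions made for the example, not details from the report.

```python
import math
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], sample_rate: int,
                      frame_ms: float = 10.0) -> List[List[float]]:
    """Cut a signal into consecutive, non-overlapping frames of frame_ms milliseconds.

    A trailing partial frame is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len])
            for i in range(0, last_start + 1, frame_len)]


SAMPLE_RATE = 8000  # assumed sampling rate in Hz
# 100 ms of a 440 Hz tone standing in for recorded speech
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
          for t in range(SAMPLE_RATE // 10)]
frames = split_into_frames(signal, SAMPLE_RATE)
```

At 8 kHz each 10 ms frame holds 80 samples. Practical systems usually let neighboring frames overlap, which this sketch leaves out for simplicity.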
Another reason HMMs are popular is that they can be trained automatically and
are simple and computationally feasible to use. In the very
simplest speech recognition setup, the hidden Markov model outputs a sequence of n-dimensional
real-valued vectors (with n around, say, 13), one every 10
milliseconds. The vectors, again in the very simplest case, consist of cepstral
coefficients, which are obtained by taking a Fourier transform of a short-time window of
speech, de-correlating the spectrum using a cosine transform, and keeping the first
(most significant) coefficients. Each state of the hidden Markov model typically has
a statistical distribution, a mixture of diagonal-covariance Gaussians, which
gives a likelihood for each observed vector. Each word or (in more general speech
recognition systems) each phoneme has a different output distribution; a hidden
Markov model for a sequence of words or phonemes is made by concatenating the
individual trained hidden Markov models for the separate words and phonemes.
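
The cepstral-coefficient recipe described above (Fourier transform of a short window, cosine transform of the spectrum, keep the first coefficients) can be sketched in plain Python. Several details here are assumptions rather than part of the report: the Hamming window, taking the logarithm of the magnitude spectrum before the cosine transform, the unnormalized DCT-II, and the 8 kHz test tone. Production front ends normally add a mel filter bank, which is omitted here.

```python
import cmath
import math
from typing import List, Sequence


def cepstral_coefficients(frame: Sequence[float], n_coeffs: int = 13) -> List[float]:
    """Return the first n_coeffs cepstral coefficients of one short-time frame."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")

    # Taper the frame edges with a Hamming window (assumed choice of window).
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]

    # Discrete Fourier transform, non-negative frequency bins only,
    # followed by the log of the magnitude (small offset avoids log(0)).
    log_spectrum = []
    for k in range(n // 2 + 1):
        bin_value = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
        log_spectrum.append(math.log(abs(bin_value) + 1e-10))

    m = len(log_spectrum)
    if n_coeffs > m:
        raise ValueError("n_coeffs exceeds the number of spectrum bins")

    # Unnormalized DCT-II de-correlates the log spectrum; keep the first coefficients.
    return [sum(log_spectrum[j] * math.cos(math.pi * c * (j + 0.5) / m)
                for j in range(m))
            for c in range(n_coeffs)]


SAMPLE_RATE = 8000  # assumed sampling rate in Hz
# One 10 ms frame (80 samples) of a 440 Hz tone standing in for speech
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(80)]
features = cepstral_coefficients(frame)  # one 13-dimensional observation vector
```

Each 10 ms frame yields one 13-dimensional vector of this kind, and the sequence of such vectors is what the per-state Gaussian mixtures would score.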

Neural network-based speech recognition

Another approach to acoustic modeling is the use of neural networks. They are capable of
solving much more complicated recognition tasks, but do not scale as well as HMMs
when it comes to large vocabularies. Rather than being used in general-purpose speech
recognition applications, they are typically applied where their ability to handle low-quality, noisy data and speaker
independence matters most. Such systems can achieve greater accuracy than HMM-based systems, as
long as there is enough training data and the vocabulary is limited. A more general approach
using neural networks is phoneme recognition. This is an active field of research, and
the reported results are generally better than for HMMs. There are also NN-HMM hybrid systems
that use the neural network part for phoneme recognition and the hidden Markov model
part for language modeling.

Dynamic time warping (DTW)-based speech recognition

Dynamic time warping is an algorithm for measuring similarity between two sequences
which may vary in time or speed. For instance, similarities in walking patterns would be
detected even if in one video the person was walking slowly and in another they were
walking more quickly, or even if there were accelerations and decelerations during the
course of one observation. DTW has been applied to video, audio, and graphics; indeed,
any data that can be turned into a linear representation can be analyzed with DTW.
A well-known application is automatic speech recognition, where it is used to cope with different
speaking speeds. In general, it is a method that allows a computer to find an optimal
match between two given sequences (e.g. time series) under certain restrictions, i.e. the
sequences are "warped" non-linearly to match each other. This sequence alignment
method is often used in the context of hidden Markov models.
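
The warping idea can be shown with the classic dynamic-programming formulation of DTW. This is a minimal sketch for one-dimensional sequences, assuming absolute difference as the local cost and no window constraint; the function name and the sample sequences are made up for the example.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the minimum cumulative alignment cost between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


slow = [0, 0, 1, 1, 2, 2, 3, 3]  # the same "word" spoken at half speed
fast = [0, 1, 2, 3]
reverse = [3, 2, 1, 0]           # a different pattern

same_word_cost = dtw_distance(slow, fast)        # 0.0: identical shape once warped
other_word_cost = dtw_distance(fast, reverse)    # greater than zero
```

Because the slow sequence is only a time-stretched copy of the fast one, the warped alignment cost is zero, whereas the reversed pattern cannot be aligned without cost. In a recognizer the sequence elements would be feature vectors rather than single numbers, with a vector distance as the local cost.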