07-11-2016, 09:52 AM
1465740942-techpaper.docx (Size: 1.18 MB)
Robotics is the branch of engineering that deals with the design, construction, operation, and application of robots. A robot's main power source is typically a battery. All robots contain some level of computer programming code; a program is how a robot decides when and how to do something.
Robots can perform many useful tasks and communicate with human beings. Robotic systems are able to perform operations based on speech and text commands. Since most human communication is non-verbal in nature, robots could also interact with humans through gestures. As robots interact closely with users, it is important to find a natural user interface for interacting with them in various situations.
The main objective of this paper is to control a robotic vehicle, moving it to a desired position remotely through the user's voice commands, by attaching a speech-recognition module to the microcontroller unit and using RF communication. Here we consider only the basic motions of the robot.
In this paper, we aim to make human-robot interaction possible through verbal communication, that is, speech. Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. Speech is an ideal method for robotic control and communication. The speech recognition circuit forms the robot's main intelligence (CPU).
Speech is the most prominent and primary mode of communication among human beings, and the most natural and efficient way of exchanging information. Communication between a human and a computer is handled by the human-computer interface, and speech has the potential to be an important mode of interaction with computers.
Speech recognition can be defined as the process of converting a speech signal into a sequence of words. Speech processing is one of the most exciting areas of signal processing; the goal of speech recognition research is to develop techniques and systems that accept speech input to machines.
2.1. SPEECH PRODUCTION IN HUMAN BEING
For speech to occur, air must be forced up out of the lungs, up the trachea, and into the vocal tract.
The components of speech production are
• the lungs
• the larynx ("voice box") containing the vocal folds and the glottis
• the vocal tract with the nasal and oral cavities
2.1.1. Lungs
Speech requires an air source. We produce the majority of speech sounds by forcing air upwards from the lungs, the same action used in normal breathing. To produce a speech sound, the outward-moving airstream must be modified by manipulation of the larynx and of the articulators in the oral and nasal cavities. The central organs involved in the production of speech sounds are the lungs, the larynx, and the vocal tract (the oral cavity, nasal cavity, and pharynx).
2.1.2. Larynx
The larynx, more commonly known as the voice box, is crucial in the production and differentiation of speech sounds. The larynx is located at exactly the point where the throat divides between the trachea (the windpipe), which leads to the lungs, and the esophagus (the tube that carries food or drink to the stomach).
There are two thin sheets of tissue that stretch in a V-shaped fashion from the front to the back of the larynx. These are called the vocal folds. The space between the vocal folds is known as the glottis. The vocal folds can be positioned in different ways to create speech sounds.
2.1.3. Vocal tract with nasal and oral cavities
When air passes up through the vocal folds, it is expelled through the mouth (oral cavity). The tongue, lips, teeth, and various regions of the mouth constitute the points of articulation in the oral cavity. In oral sounds, most air is expelled via the oral cavity (mouth).
Speech recognition allows the user to perform parallel tasks (i.e., when hands and eyes are busy elsewhere) while continuing to work with the computer. A speech-controlled robot is a mobile robot whose motions can be controlled by the user through specific voice commands. The speech recognition software running on a PC is capable of identifying the voice commands issued by a particular user. After processing the speech, the necessary motion instructions are given to the mobile platform via an RF link.
SPEECH RECOGNITION TECHNOLOGY
Speech recognition is a special case of pattern recognition. There are two phases in supervised pattern recognition: training and testing. The extraction of features relevant for classification is common to both phases.
During the training phase, the parameters of the classification model are estimated using a large number of class examples (training data). During the testing or recognition phase, the features of a test pattern (test speech data) are matched against the trained model of each class, and the test pattern is declared to belong to the class whose model matches it best.
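As a concrete illustration of the two phases (not from the paper; here each class model is simply a mean feature vector), the train/test cycle can be sketched as:

```python
import numpy as np

class NearestMeanClassifier:
    """Minimal supervised pattern recognizer: training estimates one
    model per class (here just a mean feature vector); testing matches
    a test pattern against every class model and picks the best."""

    def train(self, examples):
        # examples: dict mapping class label -> array of shape (n_examples, n_features)
        self.means = {c: x.mean(axis=0) for c, x in examples.items()}

    def classify(self, pattern):
        # declare the pattern to belong to the closest class model
        return min(self.means, key=lambda c: np.linalg.norm(pattern - self.means[c]))
```

A real recognizer would use a richer model (e.g. HMMs or template matching over MFCC sequences), but the training-then-matching structure is the same.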
The goal of speech recognition is for a machine to be able to "hear," "understand," and "act upon" spoken information. The goal of automatic speaker recognition is to analyze, extract, characterize, and recognize information about the speaker's identity. A speaker recognition system may be viewed as working in four stages:
1. Analysis
2. Feature extraction
3. Modeling
4. Matching and Testing
2.3.1. Speech analysis technique
Speech data contain different types of information that reveal a speaker's identity, including speaker-specific information due to the vocal tract, the excitation source, and behavioural features. Information about behavioural features is also embedded in the signal and can be used for speaker recognition. The speech analysis stage selects a suitable frame size for segmenting the speech signal for further analysis and feature extraction.
2.3.2. Feature Extraction Technique
Speech feature extraction in a categorization problem is about reducing the dimensionality of the input vector while maintaining the discriminating power of the signal. As we know from the fundamental formulation of speaker identification and verification systems, the number of training and test vectors needed for the classification problem grows with the dimension of the input, so feature extraction of the speech signal is necessary.
2.3.3 Modeling Technique
The objective of the modeling technique is to generate speaker models using speaker-specific feature vectors. Speaker modeling techniques are divided into two classes:
1) speaker recognition
2) speaker identification.
The speaker identification technique automatically identifies who is speaking on the basis of individual information embedded in the speech signal.
Speaker recognition is further divided into two modes: speaker-dependent and speaker-independent. In the speaker-independent mode of speech recognition, the computer should ignore the speaker-specific characteristics of the speech signal and extract only the intended message.
2.3.4. Matching Techniques
Speech-recognition systems match a detected word to a known word using one of the following techniques:
1. Whole-word matching: the system compares the incoming digital-audio signal against a prerecorded template of the word. This technique takes much less processing than sub-word matching, but requires a stored template for every word to be recognized.
2. Sub-word matching: the system looks for sub-words (usually phonemes) and then performs further pattern recognition on those. This technique takes more processing than whole-word matching, but requires much less storage.
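The paper does not name a specific matching algorithm; dynamic time warping (DTW) is a classic way to do whole-word template matching, since it tolerates the same word being spoken at different speeds. A minimal sketch over feature sequences:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance.
    a, b: arrays of shape (time, features)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_feats, templates):
    # templates: dict mapping command word -> prerecorded feature template
    return min(templates, key=lambda w: dtw_distance(test_feats, templates[w]))
```

With a handful of command words, comparing against every template is cheap; the cost of whole-word matching comes from storing and maintaining a template per word.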
PROPOSED SYSTEM
In this project we develop a robotic system centered on speech recognition. This speech-control system demonstrates the ability to apply speech recognition techniques to a control application. The robot can understand control commands spoken in a natural way and execute the corresponding action, and the method proves efficient enough for real-time operation. Such a mobile robot has potential for application wherever voice communication plays a crucial role. The speech recognition centers on recognizing speech commands stored in a MATLAB database, which are matched against the incoming voice command of the speaker. The Mel-frequency cepstral coefficient (MFCC) and Linear Predictive Coding (LPC) algorithms are used to recognize the speech and to extract its features. An RF transceiver is used for wireless communication.
Advantages of Speech controlled robot
1. Speech is a very natural way to interact; it is not necessary to work with a remote control.
2. Hands-free control.
3. Fast data input.
4. Provides more accuracy than remote control.
5. Makes the system more user-friendly.
The speech-controlled robotic system consists of software and hardware modules. The software module acts as the speech recognition system, while the hardware module forms the control system. Initially the system is trained with the user's speech samples. The user's speech command is input through a microphone attached to the PC. This command is analyzed and its features are extracted using the MFCC and LPC algorithms implemented in MATLAB. These features are compared with the trained data set, and the recognized command is transmitted to the robot via the RF transmitter.
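Once a command is recognized, sending it to the RF transmitter reduces to a short serial write. A sketch of that step (the byte codes and the port object are assumptions, not from the paper; `port` could be e.g. a pyserial `Serial` instance opened on the transmitter's port):

```python
# Hypothetical single-byte codes the robot-side sketch would check for;
# the actual codes used in the project are not specified.
COMMAND_CODES = {
    "forward": b"f", "backward": b"b",
    "left": b"l", "right": b"r", "stop": b"s",
}

def send_command(port, word):
    """Write the code for a recognized command word to a serial port.

    port: any object with a write(bytes) method, e.g. serial.Serial."""
    code = COMMAND_CODES.get(word)
    if code is None:
        raise ValueError("unknown command: " + word)
    port.write(code)
```

On the Arduino side, Serial.read() then returns the matching character, which is mapped to motor driver inputs.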
On the receiving side, the RF receiver receives the command and feeds it to the Arduino microcontroller. A motor driver IC is used as an interface between the microcontroller and the motors. The microcontroller processes the signal and drives the motors attached to it to perform the desired motion.
MODULE 1 – ROBOT PROGRAMMING
We have successfully implemented our first module, 'Robot Programming'. In this module, the command indicating the robot's direction of motion is received from the PC with the help of an RF receiver on the robot side. It is obtained from the serial buffer after checking whether data is available in it. The values for the motors attached to the robot chassis are identified by the corresponding Arduino code for the received command, and the motors rotate according to the values supplied through the motor driver.
Arduino board and IDE: To implement this module we use the Arduino platform. Code is developed in the Arduino IDE and stored in the microcontroller via USB. The microcontroller used here is the ATmega328. Arduino is an open-source platform that provides an IDE for developing programs, which can be uploaded to the microcontroller; once uploaded, the Arduino board can work independently. The compiler used for Arduino is the AVR GCC compiler, and programming in Arduino is very simple.
ATmega328 microcontroller: We use a microcontroller in this project instead of a microprocessor because a microcontroller has peripherals on board; serial communication and I/O facilities are embedded in the chip itself. The ATmega328 is an 8-bit microcontroller with 28 pins: 20 I/O pins (14 digital and 6 analog) and 8 control pins. It has a built-in ADC, so no external converter is required, and a UART module that facilitates serial communication. The clock frequency is 16 MHz and the operating voltage is 5 V.
Serial Communication
Communication can be either serial or parallel. We use serial communication for the wireless link between the PC and the robot in this project because it suffers less from interference. In serial communication, data transmission takes place bit by bit. The ATmega328 microcontroller includes a UART module for effective serial communication; the baud rate must be specified explicitly. Serial communication usually follows the RS-232 standard, which has to be converted to TTL levels due to voltage constraints; on the ATmega328 board this conversion takes place implicitly. In the Arduino IDE, a serial monitor is available to view the serial traffic. The functions used here are:
Serial.begin(baudrate);   // set the baud rate
Serial.print(" ");        // print a value
x = Serial.read();        // read a value from the serial buffer (x is of type char)
Serial.available();       // check whether data is available in the serial buffer
5.1.4. Motor Interfacing with ATmega328
To control the motion of a wheeled robot, DC motors can be interfaced with the ATmega328 microcontroller. The motors can be rotated in the forward or reverse direction, providing the basic motion of the robot.
When a motor is connected directly to the microcontroller, an I/O pin can supply only about 40 mA, while driving the motor requires at least around 100 mA. So a driver circuit is connected between the microcontroller and the motor; in our project we used the L293D motor driver IC.
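The L293D contains two H-bridges, so each motor's direction is set by a pair of input pins. The command-to-input mapping for a two-motor differential drive can be sketched as follows (the pin assignments and logic levels are illustrative, not taken from the paper):

```python
# L293D input pattern per motion for a two-motor differential drive.
# Tuples are (IN1, IN2, IN3, IN4) logic levels; IN1/IN2 drive the left
# motor's H-bridge, IN3/IN4 the right one. Pin mapping is an assumption.
DRIVE_TABLE = {
    "forward":  (1, 0, 1, 0),  # both motors forward
    "backward": (0, 1, 0, 1),  # both motors reverse
    "left":     (0, 1, 1, 0),  # left motor reverse, right forward -> turn left
    "right":    (1, 0, 0, 1),  # left motor forward, right reverse -> turn right
    "stop":     (0, 0, 0, 0),  # both H-bridge inputs low -> motors stop
}

def motor_inputs(command):
    # unknown commands fall back to "stop" as a safe default
    return DRIVE_TABLE.get(command, DRIVE_TABLE["stop"])
```

In the Arduino sketch, the same table becomes a set of digitalWrite() calls on the pins wired to the L293D inputs.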
Wireless Communication
The speech command, when identified, has to give a command to the robot. So, a serial communication can be used in wireless mode.
An RF module is used to transmit and receive radio signals between two devices wirelessly. RF communication incorporates a transmitter and a receiver. Digital data is represented as variations in the amplitude of the carrier wave (amplitude-shift keying). It does not require line of sight, its operating frequency is 434 MHz, and transmission occurs at a rate of 1 kbps to 10 kbps.
An RF transmitter-receiver pair, collectively called an RF module, is used for wireless communication up to a distance of about 10 m. The transmitter is connected to the PC used for command transmission. The receiver is connected to the Arduino board; it receives commands from the PC (laptop) through the serial buffer and lets the robot act according to the speech command.
Playback Module
The playback module is a non-volatile voice storage device with up to five segments for recording and playback. It provides a jumper-selectable recording mode, three switches for the record, play, and reset options, and an on-board microphone for recording. Its audio output is capable of driving a speaker.
5.1.7. IR Obstacle Sensor
The IR obstacle sensor is analogous to the human sense of vision and is used to detect obstacles. The transmitter emits an IR signal, which bounces off the surface of an object and is picked up by the IR receiver. It can detect obstacles within a range of 50 cm from the sensor.
Implementation of the module
Here, command-based motion is performed by the robot. Through the wireless serial link, commands like 'forward', 'left', 'right', 'stop' and 'backward' arrive in the receiver buffer and are interpreted to find the corresponding code, which determines the motor driver's input values and thus the direction of each motor.
5.2. MODULE 2 - SPEECH RECOGNITION
In this project, the speech recognition algorithm has been developed and implemented in MATLAB. Speech recognition centres on recognizing speech commands stored in a MATLAB database, matched against the incoming voice command of the speaker. The LPC and MFCC algorithms are used to extract the features of speech.
Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters and is relatively efficient to compute. LPC is a method for signal-source modelling in speech signal processing, often used by linguists as a formant extraction tool, and it has wide application in other areas. LPC analysis is usually most appropriate for modeling vowels, which are periodic (except nasalized vowels). LPC is based on the source-filter model of the speech signal.
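A minimal sketch of LPC analysis by the autocorrelation (Yule-Walker) method, written in Python with NumPy/SciPy as a stand-in for the MATLAB implementation; the model order is illustrative:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order):
    """Estimate LPC coefficients a[1..order] such that
    x[n] ~ a[1]*x[n-1] + ... + a[order]*x[n-order]."""
    n = len(frame)
    # autocorrelation of the frame at lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # solve the symmetric Toeplitz normal equations R a = r[1:]
    return solve_toeplitz((r[:-1], r[:-1]), r[1:])
```

The resulting coefficients describe the vocal-tract filter of the source-filter model; for speech, orders around 10-16 are typical at 8-16 kHz sampling rates.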
For speech recognition, the most commonly used acoustic feature is the Mel-frequency cepstral coefficient (MFCC). This technique is often used to create a fingerprint of sound files. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency; filters spaced linearly at low frequencies and logarithmically at high frequencies are used to capture the important characteristics of speech. Studies have shown that human perception of the frequency content of speech signals does not follow a linear scale. Thus, for each tone with an actual frequency f measured in Hz, a subjective pitch is measured on a scale called the mel scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz.
Steps involved in MFCC Algorithm
1. Pre-emphasis: the signal is passed through a filter that emphasizes higher frequencies, increasing the energy of the signal at high frequency.
2. Framing: the speech samples obtained from analog-to-digital conversion (ADC) are segmented into small frames with lengths in the range of 20 to 40 ms; the voice signal is divided into frames of N samples.
3. Windowing: to reduce the discontinuities of the speech signal at the edges of each frame, a tapered window is applied to it. A commonly used window is the Hamming window.
4. Fast Fourier Transform: each frame of N samples is converted from the time domain into the frequency domain.
5. Mel filter bank processing: the range of frequencies in the FFT is very wide, and the voice signal does not follow a linear scale, so a bank of filters spaced according to the mel scale is used to smooth the spectrum.
6. Discrete Cosine Transform: the log mel spectrum is converted back into the time domain using the Discrete Cosine Transform (DCT). The result of this conversion is the set of Mel-frequency cepstral coefficients.
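The steps above can be sketched in Python with NumPy/SciPy (a stand-in for the MATLAB code; the frame length, hop, FFT size, and filter count are common choices, not values from the paper):

```python
import numpy as np
from scipy.fft import dct, rfft

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    # 1. Pre-emphasis: boost high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Framing: 25 ms frames with 10 ms hop at 16 kHz
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # 3. Windowing: Hamming window tapers the frame edges
    frames *= np.hamming(frame_len)
    # 4. FFT: power spectrum of each frame (512-point, zero-padded)
    power = np.abs(rfft(frames, n=512)) ** 2 / 512
    # 5. Mel filter bank: triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((512 + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, 512 // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    log_energy = np.log(power @ fbank.T + 1e-10)
    # 6. DCT of the log filter-bank energies -> cepstral coefficients
    return dct(log_energy, type=2, norm="ortho")[:, :n_ceps]
```

Each row of the result is the MFCC vector for one frame; stacking them gives the feature sequence that is matched against the stored command templates.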
Features of the speech are extracted using MFCC and LPC. These features are compared with the database, and the candidate with the minimum error is selected. The recognized command is transmitted over the RF module, and finally the robot performs the appropriate action.
5.2.1. Matlab
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
RESULTS AND DISCUSSIONS
We have successfully completed both modules, 'Robot Programming' and 'Speech Recognition'. The robot is controlled using the user's five voice commands: LEFT, RIGHT, FORWARD, BACKWARD and STOP. The user gives voice commands through a microphone; they are processed in the speech recognition module, and the processed command is sent as a signal through the RF transmitter and received by the RF receiver incorporated in the robot body. The robot then makes the corresponding movements according to the given command.