05-09-2016, 03:19 PM
1452949793-TEseminar.docx (Size: 397.17 KB / Downloads: 3)
Abstract
Virtual environments have always been considered a means for more visceral and efficient human-computer interaction across a diversified range of applications. The spectrum of applications includes analysis of complex scientific data, medical training, military simulation, phobia therapy and virtual prototyping. Current user interaction approaches based on keyboard, mouse and pen are not sufficient for the still-widening spectrum of human-computer interaction and the evolution of ubiquitous computing. Gloves and sensor-based trackers are unwieldy, constraining and uncomfortable to use. Due to the limitations of these devices, the usable command set of such applications is also limited.
Direct use of the hands as an input device is an innovative method for providing natural human-computer interaction, whose lineage runs from text-based interfaces through 2D graphical interfaces and multimedia-supported interfaces to full-fledged multi-participant Virtual Environment (VE) systems. One can conceive of a future era of human-computer interaction with 3D applications in which the user moves and rotates objects simply by moving and rotating his hand, all without the help of any input device. This research effort centres on implementing an application that employs computer vision algorithms and gesture recognition techniques, which in turn results in a low-cost interface device for interacting with objects in a virtual environment using hand gestures.
Introduction
Gesture recognition has gained a lot of importance in recent years. Various applications can be controlled using gestures; hand gestures are used in applications like gaming, mouse control etc. In an application like robot control using hand gestures, the robot responds to hand gestures given by the human. This hand sign is visually observed by the robot through a camera, and the algorithm that enables the robot to identify the hand gesture from the image is of interest. Each gesture corresponds to a particular command, and the command that is identified is used to make the robot perform a certain action or execute a certain task. Different gestures have different meanings associated with them. For example, a count of one could mean stop, two move forward, and three, four and five turn right, turn left and reverse respectively.
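The count-to-command scheme above can be sketched as a simple lookup table; the command names here are illustrative, not taken from any particular robot API:

```python
# Hypothetical mapping from a recognized finger count to a robot command,
# following the example scheme described above.
COMMANDS = {
    1: "stop",
    2: "move_forward",
    3: "turn_right",
    4: "turn_left",
    5: "reverse",
}

def command_for(finger_count: int) -> str:
    """Return the command for a recognized finger count, or 'idle' if the
    count does not correspond to any defined gesture."""
    return COMMANDS.get(finger_count, "idle")
```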
Some hand gesture recognition systems make use of gloves, markers etc. Though the use of gestures increases the interactivity between man and machine, the use of such gloves or markers increases the cost of the system. Some applications require the use of two cameras to obtain a 3D view of the hand, from which the gesture is recognized. Two types of hand gestures are used: static and dynamic. Static gestures are made by keeping the hand stable. For example, by showing a finger without moving the hand, the system would perform the specified function.
Dynamic gestures are those that involve movement of the hand. For instance, in a media player controlled by hand gestures, moving the hand to the right side may indicate increasing the volume. For some applications, a hand gesture recognition system may need to store images in a database. Executing these applications may require a complex algorithm that compares images already stored in the database with images taken from the camera and then performs the necessary task. For such applications, the gestures must be known prior to use, as they are already stored in the database.
Our approach is a hand gesture recognition system that controls devices using the distance transform method, which does not require storage of images in a database. The system uses both static and dynamic gestures for appliance control. Images are captured through the webcam and segmented to recognize the hand region; a skin color detection algorithm is used for hand region detection. The resulting binary image is given to the distance transform method, which calculates the centroid of the hand and, using this, the number of active fingers or the motion of the hand. The physical device is controlled accordingly.
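The centroid step above picks the hand pixel farthest from every boundary pixel. A minimal sketch of that idea on a binary mask follows; it uses a brute-force nearest-boundary search for clarity, whereas a real system would use a linear-time distance transform such as OpenCV's `cv2.distanceTransform`:

```python
import numpy as np

def distance_transform_centroid(mask: np.ndarray):
    """Return the (row, col) of the hand-interior pixel farthest from every
    boundary, i.e. the distance-transform maximum. mask: binary array with
    1 = hand, 0 = background. Brute-force sketch for illustration only."""
    fg = np.argwhere(mask == 1)
    bg = np.argwhere(mask == 0)
    # distance of every foreground pixel to its nearest background pixel
    d = np.min(np.linalg.norm(fg[:, None, :] - bg[None, :, :], axis=2), axis=1)
    r, c = fg[np.argmax(d)]
    return int(r), int(c)

# toy 5x5 hand mask: the centre pixel is the farthest from the background
mask = np.zeros((5, 5), dtype=int)
mask[1:4, 1:4] = 1
print(distance_transform_centroid(mask))  # (2, 2)
```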
1.1 Objectives of the Project
The system uses dynamic gestures for appliance control. Images are captured through the webcam and segmented to recognize the hand region; a skin color detection algorithm is used for hand region detection. A binary image is then generated and subsequently the motion of the hand is calculated. The aim of this project is therefore to develop a system that helps humans control various physical devices simply by using dynamic hand gestures.
Literature Survey
Much previous work has been done in this field by different researchers. Many approaches have been followed, such as vision-based methods, data gloves, artificial neural networks, fuzzy logic, genetic algorithms, hidden Markov models, support vector machines etc. Some of the previous works are given below.
Principal component analysis (PCA)[1] has been implemented in several works. Lamar[2] used PCA for extracting features from the input image, in which the mean, covariance, eigenvalues and eigenvectors were found. The features used were the mean, describing the position of the finger; the eigenvalues, describing the shape of the finger; and the eigenvectors, showing the direction of the image. Chang[3] used PCA as a classifier, where an eigenface was extracted from the image to be tested and the Euclidean distance between classes was then found.
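The PCA quantities named above (mean, covariance, eigenvalues, eigenvectors) can be computed directly with numpy; this is a generic sketch, not the referenced authors' exact feature pipeline:

```python
import numpy as np

def pca_features(points: np.ndarray):
    """Compute the PCA quantities used as gesture features above for a set of
    2-D contour points: the mean (position), the eigenvalues of the covariance
    matrix (shape/spread) and the eigenvectors (direction)."""
    mean = points.mean(axis=0)
    cov = np.cov(points - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return mean, eigvals, eigvecs
```

For points lying on a line, one eigenvalue is (near) zero and the dominant eigenvector points along the line, which is exactly the "direction" feature described above.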
An Artificial Neural Network (ANN)[4] based system was proposed for recognizing gestures. It was used because of advantageous factors like generality, adaptive learning and real-time operation. Kortum used an ANN to recognize American Sign Language. First, the skin-coloured regions were extracted, after which the moment invariants were obtained. The network had 58 neurons, of which 34 were input neurons, 20 hidden neurons and 4 output neurons, and the data set comprised 270 feature vectors. After training, presenting an input vector activates a single output neuron giving the desired recognition.
Fuzzy logic is a problem-solving approach based on degrees of truth rather than the usual true or false, that is, 1 or 0. It includes 0 and 1 as extreme cases of truth and also the various states of truth in between. For example, intelligence cannot be measured with a plain 1 or 0; it has to be compared with other intelligence, and the result can be 0, 1 or something in between. Fuzzy logic[5] has been implemented for hand gesture recognition. In this paper Verma states that the motion of the hand can be detected by finite state machine (FSM) states. These states are modelled as clusters, which are formed by fuzzy c-means clustering. The centroid of each cluster is then found mathematically, hence the state of the FSM is determined and finally the gesture is recognized.
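The clustering step referenced above can be illustrated with one update iteration of fuzzy c-means: centroids are weighted by membership degrees, then memberships are recomputed from distances. This is a textbook sketch of the algorithm, not the cited paper's implementation; the fuzzifier `m = 2` is the common default:

```python
import numpy as np

def fcm_step(X, U, m=2.0):
    """One fuzzy c-means iteration. X: (n, d) data points; U: (c, n)
    membership matrix whose columns sum to 1. Returns updated centroids
    and memberships."""
    W = U ** m
    # membership-weighted centroids
    centroids = (W @ X) / W.sum(axis=1, keepdims=True)
    # distances of every point to every centroid, shape (c, n)
    d = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2) + 1e-12
    # standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
    U_new = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
    return centroids, U_new
```

With two well-separated groups of points, a single step already drives the memberships close to 0 or 1, giving the crisp cluster centroids from which the FSM states are derived.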
The K-L Transform[6] is used to translate and rotate the axes, and a new coordinate system is established according to the variance of the data. The K-L transformation is also known as the principal component transformation, eigenvector transformation or Hotelling transformation. Its advantages are that it eliminates correlated data, reduces dimensionality while keeping the average square error minimal, and gives good cluster characteristics. The K-L Transform gives very good energy compaction. It establishes a new coordinate system whose origin is at the centre of the object and whose axes are parallel to the directions of the eigenvectors. It is often used to remove random noise.
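A minimal sketch of the transform as described: recentre the points at the object's centre and rotate onto the eigenvector axes, which decorrelates the coordinates:

```python
import numpy as np

def kl_transform(points: np.ndarray) -> np.ndarray:
    """Project points into the K-L coordinate system: origin at the centroid,
    axes along the eigenvectors of the covariance matrix. The transformed
    coordinates are decorrelated (diagonal covariance)."""
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return centered @ eigvecs
```

After the transform, the off-diagonal covariance terms vanish, which is the "eliminates correlated data" property noted above.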
Project Requirements
The system will include a low-resolution webcam for capturing the hand gestures and an algorithm that processes the acquired images and then classifies the hand gesture correctly. The work mainly emphasizes extracting features from the hand gestures and using those features in the recognition algorithms. Initially, the system will run a setup procedure in which the algorithm is trained on a given training set, based on significant features extracted for different hand gestures. Once setup is completed successfully, the system will be able to classify a given hand gesture based on the knowledge acquired during the training phase. The design of the hand gesture recognition system is broadly divided into two phases: the first is the preprocessing phase and the second is the classification phase.
Required Hardware & Software:
Software: Windows XP/7 onwards, .NET Framework 3.5 or 4.0, C# programming
Hardware: PC with the following configuration: Pentium IV processor or higher at 2.66 GHz or faster, minimum 10 GB disk space, 512 MB RAM or higher; Microsoft Kinect device, 640x480 pixels
Image Acquisition:
The user makes gestures by positioning hand parallel to webcam. Images are con-tinuously captured and then given as input for further processing of gesture recog-nition.
Image Processing:
Image processing consists of six main steps: frame extraction, filtering, colour thresholding, template generation, template matching and action mapping.
Frame Extraction:
The main motivation for extracting the content of information is the accessibility problem. The system compares and calculates the similarity of each video frame to decide whether there is a change in scenery or not. The basic idea of colour indexing in frame extraction is to compare the similarity of two frames, mainly via the change from RGB image to HSV image, in terms of their shade and brightness. Frame extraction is therefore a process that involves two difficult tasks: deciding what is relevant and extracting it.
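The frame-similarity comparison above can be sketched with colour-histogram intersection, a standard colour-indexing score; the bin count and change threshold here are illustrative choices, not values from the text:

```python
import numpy as np

def frames_differ(f1, f2, bins=8, threshold=0.25):
    """Compare two frames by 3-D colour-histogram intersection and report
    whether the scene changed. f1, f2: (H, W, 3) uint8 images. An
    intersection of 1.0 means identical colour distributions."""
    h1, _ = np.histogramdd(f1.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
    h2, _ = np.histogramdd(f2.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
    n = f1.shape[0] * f1.shape[1]
    intersection = np.minimum(h1, h2).sum() / n
    return intersection < threshold
```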
Image pre-processing lters:
The primary goal of the pre-processing stage is to ensure uniform input to the classification network. Image pre-processing is used for the detection of the hand and is applied in different fields. The first step involves reading the gesture colour image and converting the input RGB image to HSV colour space. This is done because the HSV model is less sensitive to changes in lighting conditions. The resulting image is then filtered and smoothed, and finally a gray-scale image is obtained.
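The RGB-to-HSV conversion step can be shown per pixel with the standard library; in practice a vectorised routine such as OpenCV's `cv2.cvtColor(img, cv2.COLOR_RGB2HSV)` would be applied to the whole image:

```python
import colorsys

def rgb_to_hsv_pixel(r: int, g: int, b: int):
    """Convert one RGB pixel (components 0-255) to HSV, with h, s, v each
    in [0, 1]. HSV separates chroma (h, s) from intensity (v), which is why
    it copes better with lighting changes."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

print(rgb_to_hsv_pixel(255, 0, 0))  # pure red -> hue 0
```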
Colour Thresholding:
Colour thresholding is the simplest method of image segmentation. From a gray-scale image, thresholding can be used to create binary images. The purpose of thresholding is to extract those pixels from an image which represent an object. The key value of this method is the threshold value: using it, the gray-scale image is converted into a binary image. The important step of this method is the selection of the threshold value. Several methods are used, including Otsu's method (maximum variance), k-means clustering and the maximum entropy method.
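Otsu's method, named above, selects the threshold that maximises the between-class variance of the grey-level histogram. A compact numpy sketch (OpenCV provides this built in via the `cv2.THRESH_OTSU` flag):

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold for a uint8 grey-scale image: the level
    that maximises between-class variance of the histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    w0 = np.cumsum(p)                   # class-0 probability up to each level
    mu = np.cumsum(p * np.arange(256))  # cumulative mean up to each level
    mu_t = mu[-1]                       # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    sigma_b[~np.isfinite(sigma_b)] = 0.0  # empty classes get zero score
    return int(np.argmax(sigma_b))
```

Pixels above the returned threshold are then set to the object value (e.g. 255) to form the binary image.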
Template Generation:
After performing the above image processing steps, a template of the image is generated. This template is then used for template matching, and accordingly the required action is taken for hand gesture recognition.
Template Matching:
Template matching is a technique in image processing for finding small parts of an image which match a template image. It can also be used as a way to detect edges in images. At region boundaries there is a sharp adjustment in intensity, which is why edges and region boundaries are closely related. Edge detection techniques are used as the base of another segmentation technique. The edges identified by an edge detection method are often disconnected, but closed region boundaries are needed to segment an object from an image, so a segmentation method can be applied to the edges obtained from the edge detector. After application of the edge filter to the image, pixels are classified as edge or non-edge.
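The "find small parts of an image that match a template" idea can be sketched with a sliding-window sum-of-squared-differences search; real systems use an optimised routine such as OpenCV's `cv2.matchTemplate`, and the SSD score here is one of several common choices:

```python
import numpy as np

def match_template(image: np.ndarray, template: np.ndarray):
    """Slide the template over the image and return the (row, col) of the
    top-left corner with the smallest sum of squared differences."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = None, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = image[y:y + th, x:x + tw].astype(float)
            ssd = np.sum((window - template) ** 2)
            if best_score is None or ssd < best_score:
                best_score, best_pos = ssd, (y, x)
    return best_pos
```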
Action Mapping:
After pre-processing, a code is generated which is then sent to the IC (a MAX232 converter). The input code to the IC is converted to binary. The IC is connected to the CPU through a serial port at 9600 bps. The binary code is input to the micro-controller, where it is compared and the corresponding action is executed.
Motion Recognition:
After image pre-processing, the image is given to the distance transform method, which detects the motion of the hand. The pixel that is farthest from every boundary is chosen as the centroid. Using this centroid, the active fingers are counted; if the hand moves, this is detected by the motion of the centroid from its original position across a set of continuously captured images, and the appliance is controlled, that is, the device is switched on or off after the gesture is recognized.
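The motion-from-centroid idea above can be sketched by tracking the centroid's net horizontal displacement across consecutive frames; the pixel threshold and direction labels are illustrative assumptions:

```python
def hand_motion(centroids, min_shift=5.0):
    """Classify hand motion from the centroid track of consecutive frames.
    centroids: list of (x, y) positions, one per captured frame. Returns
    'left', 'right' or 'still' based on net horizontal displacement;
    min_shift (pixels) filters out small jitter."""
    dx = centroids[-1][0] - centroids[0][0]
    if dx > min_shift:
        return "right"
    if dx < -min_shift:
        return "left"
    return "still"
```

A full system would extend this to vertical motion and map each direction to an appliance command (e.g. on/off).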
Haar Cascade Detection: We use the Viola-Jones face and upper-body detector to find the face and upper-body positions of the subject. The detector is based on Haar cascade classifiers. Each classifier uses rectangular Haar features to classify a region of the image as a positive or negative match.
The Stick Skeleton Model: We represent the human skeleton model by 7 body parts involving 8 points, and we use anthropometric data from the NASA Anthropometric Data Source Book to estimate the size of the body parts. We fix the head and neck points as the centroid and the midpoint of the base line of the face detection rectangle.
Skin Segmentation: We perform skin color segmentation on the foreground-segmented RGB image obtained from the Kinect to aid our arm fitting process. For skin segmentation, we project the input RGB image into HSV color space, and the pixels whose values lie between the two thresholds hsvmax and hsvmin are set to 255 while the rest are set to 0.
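The two-threshold masking step just described is a simple per-channel range test; the threshold values in the usage example are placeholders, since the text does not give concrete hsvmin/hsvmax values:

```python
import numpy as np

def skin_mask(hsv: np.ndarray, hsv_min, hsv_max) -> np.ndarray:
    """Set pixels whose H, S and V components all lie between hsv_min and
    hsv_max to 255, and everything else to 0. hsv: (H, W, 3) array."""
    inside = np.all((hsv >= hsv_min) & (hsv <= hsv_max), axis=2)
    return np.where(inside, 255, 0).astype(np.uint8)

# usage with made-up skin-tone thresholds
hsv = np.array([[[10, 100, 100], [200, 100, 100]]])
print(skin_mask(hsv, (0, 50, 50), (25, 255, 255)))  # [[255 0]]
```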
Arm Fitting: In order to fit the arms, we make use of the Extended Distance Transform (EDT) and the skin segmentation mask computed in the previous two steps. We initiate an angular search around the pivot point (the shoulder point for elbow point estimation and the elbow point for wrist point estimation) at a fixed sampling frequency and compute the summed EDT values along those lines.
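The angular search can be sketched as follows: sample candidate lines around the pivot at a fixed angular step, sum the distance-transform values along each line, and keep the angle with the largest sum. The limb length and number of angles are illustrative parameters, not values from the text:

```python
import numpy as np

def best_limb_angle(edt: np.ndarray, pivot, length=20, n_angles=36):
    """Angular search around a pivot point (e.g. shoulder or elbow):
    for each candidate angle, sum the EDT values sampled along a line of
    the given length and return the angle with the largest sum."""
    h, w = edt.shape
    best_angle, best_score = 0.0, -1.0
    for angle in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False):
        dy, dx = np.sin(angle), np.cos(angle)
        score = 0.0
        for r in range(length):
            y = int(round(pivot[0] + r * dy))
            x = int(round(pivot[1] + r * dx))
            if 0 <= y < h and 0 <= x < w:
                score += edt[y, x]
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Because the EDT is large along the limb's medial axis, the winning angle points along the arm segment being fitted.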