04-12-2012, 06:39 PM
Hand Gesture Recognition Using Computer Vision
Hand Gesture Recognition.pdf (Size: 335.4 KB / Downloads: 79)
Abstract
The proposed algorithm takes a vision-based approach to hand gesture recognition, dealing with
real-time tracking of the hand and simultaneous recognition of the gesture being made. Accurate tracking
of the hand is done using Haar-like features, while gesture recognition is done by computing orientation
histograms of video frames. The system is trained on a well-defined vocabulary of hand gestures, and the
algorithm outputs the current position of the hand and the gesture being made. The AdaBoost algorithm
is used to boost the process of Haar training. The use of orientation histograms provides translational
invariance and also makes the system adaptable to variations in illumination. The algorithm has been
demonstrated on a real-time simplified drawing application that uses hand gestures as a means of
human-computer interaction (HCI). The algorithm and its application have been implemented on an
ARM-based processor, in order to demonstrate their effectiveness on embedded devices.
Introduction
In this paper, we describe a method of implementing, on a computer, one of the most sought-after
means of communication: gestures. Human-computer interaction has, in recent times, been approached
in extremely diverse ways, with computer vision being one of the forerunners in the field. Everyday
human activities involve interaction with machines, and with gesture recognition these complicated
interfaces can be simplified to create a more user-friendly environment. Integrating computers with
hand gestures makes it easier to issue commands. Devices such as game controllers, security systems
and television remotes require the user to either push a button or touch a screen to function. Over
the years, these devices have been steadily improved and enhanced by advances in technology, making
them easier to operate from the user's point of view. In the past decade, many devices that were
previously button-operated have been successfully replaced with touch technology, in which the user
issues a command by simply touching a screen. Today, we recognize a definite possibility of reducing
these human-machine interactions to a simple hand gesture. With a little imagination, hand gesture
recognition can have sundry applications in today's world.
The Proposed Methodology
This paper proposes an algorithm for the real-time dynamic recognition of hand gestures for HCI.
The algorithm is based on two phases of implementation: the detection phase and the recognition
phase. In the detection phase, Haar-like features are used to detect a limited set of postures of
the human hand. Once the hand is detected, the algorithm enters the recognition phase, in which the
instantaneous gesture of the hand is computed using normalized orientation histograms. There is a
rigorous training phase before real-time operation can start, which includes generating a multi-stage
cascade of Haar features. This involves the creation of an extensive database of positive images as
well as negative images. Since a single Haar-like feature is a weak classifier, it is combined with
an iterative learning algorithm called AdaBoost, which constructs a strong classifier using only a
training set and a weak learning algorithm, resulting in a cascade of Haar classifiers. This cascade
is the entity that defines which object is to be recognized, in our case the human hand. During
training, care should be taken to include as many different postures of the human hand as possible,
so that detection is not too stringent. In real-time operation, detection of hand gestures is also
affected by various factors such as complex backgrounds, intensity variations and different zoom
levels. All of these possible variations, along with noise, must be accounted for while the database
is being created. Haar training is essential for the detection phase of the algorithm. Apart from
this, a separately trained hand gesture set must also be incorporated for instantaneous recognition
of the hand gesture after the hand has been detected. This set includes a limited number of hand
gestures, well distinguished from one another, which are to be recognized in the real-time operation
of the system.
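The two-phase control flow described above can be sketched as follows. This is a minimal
illustration only: `detect_hand` and `recognize_gesture` are hypothetical stand-ins for the
Haar-cascade detector and orientation-histogram matcher, and the dictionary "frames" merely
simulate camera input.

```python
def detect_hand(frame):
    # Stand-in for the detection phase: a real system would run the Haar
    # cascade here and return the hand's bounding box, or None if absent.
    return frame.get("hand")

def recognize_gesture(frame, roi):
    # Stand-in for the recognition phase: a real system would compare the
    # region's orientation histogram against the trained gesture vocabulary.
    return frame.get("gesture", "unknown")

def process_stream(frames):
    """Stay in the detection phase until a hand appears, then run
    recognition on every frame in which the hand is found."""
    results = []
    for frame in frames:
        roi = detect_hand(frame)                 # detection phase
        if roi is None:
            continue                             # keep searching for a hand
        gesture = recognize_gesture(frame, roi)  # recognition phase
        results.append((roi, gesture))
    return results
```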
Haar-Like Features
Haar-like features are digital image features used in object detection algorithms. They owe
their name to their intuitive similarity with Haar wavelets. The main purpose of Haar-like
features is to meet real-time requirements while providing speed, accuracy and robustness
against different backgrounds and lighting conditions. The idea behind Haar-like features is
to recognize objects in an image based on the values of simple features, instead of using raw
pixel values directly. There are two motivations for employing Haar-like features rather than
raw pixel values. The first is that, compared with raw pixels, Haar-like features can
efficiently reduce the in-class variability and increase the out-of-class variability, thus
making classification easier. The second is that a Haar-like feature can be evaluated in
constant time at any position and scale by means of an integral image, which keeps detection fast.
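The constant-time evaluation can be illustrated with a small numpy sketch. This is not the
paper's implementation; it simply shows a two-rectangle Haar-like feature (left half minus
right half) computed with a summed-area table, so that any rectangle sum costs four lookups
regardless of its size.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, h, w):
    """Sum of the h x w box at (top, left), via four integral-image lookups."""
    p = np.pad(ii, ((1, 0), (1, 0)))  # pad so boxes at the border work
    return (p[top + h, left + w] - p[top, left + w]
            - p[top + h, left] + p[top, left])

def two_rect_feature(img, top, left, h, w):
    """Two-rectangle Haar-like feature: sum of the left h x w rectangle
    minus the sum of the adjacent right h x w rectangle."""
    ii = integral_image(img.astype(np.int64))
    return box_sum(ii, top, left, h, w) - box_sum(ii, top, left + w, h, w)
```

A feature like this responds strongly at vertical edges (where the two halves differ) and is
zero on uniform regions.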
AdaBoost
Although it is not difficult to find a Haar-like feature that is slightly more accurate than
random guessing, a single Haar-like feature is certainly not enough to identify the object
with high accuracy. For this reason, the AdaBoost learning algorithm is used to boost the
classification performance of a simple learning algorithm. Boosting is a supervised machine
learning technique that improves overall accuracy stage by stage using a series of weak
classifiers. The AdaBoost algorithm first chooses the feature that classifies the most data
correctly. In the next step, the data are re-weighted to increase the importance of
misclassified samples. This process continues, and at each stage the weight of each weak
learner relative to the other learners is determined. The AdaBoost learning algorithm starts
with a uniform distribution of weights over the trained hand gesture samples. A weak
classifier is obtained from the weak learning algorithm, and the weights are then increased
for those hand gesture sample images that were misclassified. This process continues, adding
new weak classifiers to the linear combination, until the accuracy requirements are met.
Finally, a strong classifier is obtained as a linear combination of the selected weak
classifiers.
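The loop above can be sketched in a few lines of numpy. This is a generic AdaBoost over
decision stumps on toy feature vectors, not the paper's Haar-feature training code; it only
illustrates the uniform start, the re-weighting of mistakes, and the final weighted vote.

```python
import numpy as np

def best_stump(X, y, w):
    """Weak learner: the (feature, threshold, polarity) decision stump
    with the lowest weighted error. y must be in {-1, +1}."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)  # (feature index, threshold, polarity, error)
    for j in range(d):
        for t in np.unique(X[:, j]):
            for p in (1, -1):
                pred = np.where(p * X[:, j] < p * t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, p, err)
    return best

def adaboost(X, y, rounds=10):
    """Boost stumps: start from uniform weights, raise the weights of
    misclassified samples each round, and collect the weighted stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # uniform initial distribution
    ensemble = []
    for _ in range(rounds):
        j, t, p, err = best_stump(X, y, w)
        err = max(err, 1e-10)                  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's vote weight
        pred = np.where(p * X[:, j] < p * t, 1, -1)
        w *= np.exp(-alpha * y * pred)         # emphasize the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, p))
    return ensemble

def predict(ensemble, X):
    """Strong classifier: sign of the weighted linear combination."""
    score = sum(a * np.where(p * X[:, j] < p * t, 1, -1)
                for a, j, t, p in ensemble)
    return np.sign(score)
```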
Orientation Histogram
For the purpose of static gesture recognition, we compute an orientation histogram for each
frame that is processed. Following William T. Freeman [1], in an orientation histogram each
bin counts pixels according to the intensity gradients between adjacent pixels in a digital
image. To eliminate the effects of small or spurious gradients, we set a threshold level
below which the contrast between the pixels concerned is not good enough for processing.
First, there is a training sequence in which a preferred set of gestures is taught to the
system, and the orientation histogram of each of these trained gestures is computed. To find
the local gradients, the intensity values of each pair of adjacent pixels are subtracted in
the horizontal as well as the vertical direction, and the inverse tangent of the ratio of the
latter to the former is then taken.
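A minimal numpy sketch of this computation follows. The bin count and threshold are
illustrative choices, not values from the paper; the code forms horizontal and vertical pixel
differences, takes the inverse tangent of their ratio, discards low-magnitude gradients, and
bins the surviving orientations.

```python
import numpy as np

def orientation_histogram(img, bins=36, threshold=0.0):
    """Normalized histogram of local gradient orientations.

    dx, dy are differences of adjacent pixels in the horizontal and
    vertical directions; the orientation is arctan2(dy, dx); gradients
    whose magnitude falls below `threshold` (low contrast) are ignored.
    """
    img = img.astype(np.float64)
    dx = img[:-1, 1:] - img[:-1, :-1]   # horizontal difference
    dy = img[1:, :-1] - img[:-1, :-1]   # vertical difference
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx)            # inverse tangent of dy / dx
    keep = mag > threshold              # drop weak, noisy gradients
    hist, _ = np.histogram(ang[keep], bins=bins, range=(-np.pi, np.pi))
    total = hist.sum()
    # Normalize so the histogram does not depend on how many pixels survived
    return hist / total if total else hist.astype(np.float64)
```

Because the histogram depends only on gradient orientations, translating the hand within the
frame leaves it essentially unchanged, which is the source of the translational invariance
claimed earlier.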