
Vision-based Hand Gesture Recognition for Human-Computer Interaction


Introduction

In recent years, research efforts seeking to provide more natural, human-centered means of interacting with computers have gained growing interest. A particularly important direction is that of perceptive user interfaces, where the computer is endowed with perceptive capabilities that allow it to acquire both implicit and explicit information about the user and the environment. Vision has the potential to carry a wealth of information in a non-intrusive manner and at low cost; it therefore constitutes a very attractive sensing modality for developing perceptive user interfaces. Proposed approaches for vision-driven interactive user interfaces resort to technologies such as head tracking, face and facial expression recognition, eye tracking and gesture recognition.

In this paper, we focus our attention on vision-based recognition of hand gestures. The first part of the paper provides an overview of the current state of the art regarding the recognition of hand gestures as these are observed and recorded by typical video cameras.

Computer Vision Techniques for Hand Gesture Recognition

Most complete hand-based interactive systems can be considered to comprise three layers: detection, tracking and recognition. The detection layer is responsible for defining and extracting visual features that can be attributed to the presence of hands in the field of view of the camera(s). The tracking layer is responsible for performing temporal data association between successive image frames, so that, at each moment in time, the system may be aware of “what is where”. Moreover, in model-based methods, tracking also provides a way to maintain estimates of model parameters, variables and features that are not directly observable at a certain moment in time. Finally, the recognition layer is responsible for grouping the spatiotemporal data extracted in the previous layers and assigning labels to the resulting groups, associated with particular classes of gestures. In this section, research on these three subproblems of vision-based gesture recognition is reviewed.
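
The following minimal sketch illustrates this three-layer organization in Python; the class and method names are hypothetical and serve only to make the layering concrete.

    # A minimal sketch of the three-layer organization; class and
    # method names are hypothetical, not from the paper.

    class GestureRecognitionPipeline:
        def __init__(self, detector, tracker, recognizer):
            self.detector = detector      # finds hand candidates in a frame
            self.tracker = tracker        # associates candidates across frames
            self.recognizer = recognizer  # maps tracks to gesture labels

        def process_frame(self, frame):
            # Detection layer: visual features attributable to hands.
            candidates = self.detector.detect(frame)
            # Tracking layer: temporal data association ("what is where").
            tracks = self.tracker.update(candidates)
            # Recognition layer: label the grouped spatiotemporal data.
            return self.recognizer.classify(tracks)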

Color

Skin color segmentation has been utilized by several approaches for hand detection. A major decision towards providing a model of skin color is the selection of the color space to be employed. Several color spaces have been proposed, including RGB, normalized RGB, HSV, YCrCb and YUV. Color spaces that efficiently separate the chromaticity from the luminance components of color are typically considered preferable, because by employing only the chromaticity-dependent components of color, some degree of robustness to illumination changes can be achieved. Terrillon et al. [TSFA00] review different skin chromaticity models and evaluate their performance.
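
As a minimal sketch of chromaticity-based skin segmentation, the following Python/OpenCV fragment thresholds only the Cr and Cb channels of the YCrCb space; the numeric bounds are illustrative assumptions, not values from the paper or from [TSFA00].

    import cv2
    import numpy as np

    def skin_mask(bgr_image):
        # Work in YCrCb and threshold only the chrominance channels
        # (Cr, Cb), leaving the luminance channel (Y) unconstrained.
        ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
        lower = np.array([0, 133, 77], dtype=np.uint8)     # Y: any value
        upper = np.array([255, 173, 127], dtype=np.uint8)  # Cr/Cb: skin band
        mask = cv2.inRange(ycrcb, lower, upper)
        # Morphological opening removes small, isolated false positives.
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))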

Shape

The characteristic shape of hands has been utilized to detect them in images in multiple ways. Much information can be obtained by simply extracting the contours of objects in the image. If correctly detected, the contour represents the shape of the hand and is therefore not directly dependent on viewpoint, skin color or illumination. On the other hand, the expressive power of 2D shape can be hindered by occlusions or degenerate viewpoints. In the general case, contour extraction based on edge detection yields a large number of edges that belong not only to the hands but also to irrelevant background objects. Therefore, sophisticated post-processing approaches are required to increase the reliability of such an approach. In this spirit, edges are often combined with (skin-)color and background subtraction/motion cues.
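
A minimal sketch of this combined-cue idea, assuming OpenCV and the skin_mask helper sketched above: contours are extracted from the skin mask so that edges of non-skin background objects are suppressed. The area threshold is an arbitrary illustrative value.

    import cv2

    def hand_contour_candidates(bgr_image, min_area=1000.0):
        # Extract contours from the skin mask (see the sketch above)
        # instead of from raw edges, so edges belonging to non-skin
        # background objects are suppressed.
        mask = skin_mask(bgr_image)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
        # Discard blobs too small to plausibly be a whole hand.
        return [c for c in contours if cv2.contourArea(c) >= min_area]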

Learning detectors from pixel values

Significant work has been carried out on finding hands in grey level images based on their appearance and texture. In [WH00], the suitability of a number of classification methods for the purpose of view-independent hand posture recognition was investigated. Several methods [CSW95, CW96b, QZ96, TM96, TVdM98] attempt to detect hands based on hand appearance, by training classifiers on a set of image samples. The basic assumption is that hand appearance differs more among hand gestures than it differs among different people performing the same gesture. Still, automatic feature selection constitutes a major difficulty. Several papers consider the problem of feature extraction [TM96, QZ96, NR98, TVdM98] and selection [CSW95, CW96b], with limited results regarding hand detection. The work in [CW96b] investigates the difference between the most discriminating features (MDFs) and the most expressive features (MEFs) in the classification of motion clips that contain gestures. It is argued that MEFs may not be the best for classification, because the features that describe some major variations in the class are typically irrelevant to how the sub-classes are divided. MDFs are selected by multi-class, multivariate discriminant analysis and have a significantly higher capability to capture major differences between classes. Their experiments also showed that MDFs are superior to MEFs in automatic feature selection for classification.
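
To make the MEF/MDF distinction concrete, the sketch below contrasts PCA-based features (MEFs, class-agnostic variance) with linear discriminant features (MDFs, class-separating) using scikit-learn. This illustrates the general idea only, not the exact procedure of [CW96b]; the data are synthetic stand-ins.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    # Synthetic stand-ins: rows are flattened image patches, y holds
    # gesture-class labels. Real systems would use hand image samples.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))
    y = rng.integers(0, 4, size=200)

    # MEFs: directions of largest overall variance, class-agnostic (PCA).
    mef_clf = make_pipeline(PCA(n_components=10), KNeighborsClassifier())
    # MDFs: directions chosen to separate the classes (discriminant analysis).
    mdf_clf = make_pipeline(LinearDiscriminantAnalysis(), KNeighborsClassifier())

    mef_clf.fit(X, y)
    mdf_clf.fit(X, y)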

3D model-based detection

A category of approaches utilizes 3D hand models for the detection of hands in images. One of the advantages of these methods is that they can achieve view-independent detection. The employed 3D models should have enough degrees of freedom to adapt to the dimensions of the hand(s) present in an image. Different models require different image features to construct feature-model correspondences. Point and line features are employed in kinematic hand models to recover the angles formed at the joints of the hand [RK95, SSKM98, WTH99, WLH01]. Hand postures are then estimated, provided that the correspondences between the 3D model and the observed image features are well established. Various 3D hand models have been proposed in the literature. In [RK94, SMC02], a full hand model is proposed which has 27 degrees of freedom (DOF): 6 DOF for 3D location/orientation and 21 DOF for articulation. In [LWH02], a “cardboard model” is utilized, where each finger is represented by a set of three connected planar patches. In [GdBUP95], a 3D model of the arm with 7 parameters is utilized. In [GD96], a 3D model with 22 degrees of freedom for the whole body, with 4 degrees of freedom for each arm, is proposed. In [MI00], the user’s hand is modeled much more simply, as an articulated rigid object with three joints formed by the index finger and thumb.
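
The 27-DOF parameterization of [RK94, SMC02] can be sketched as a simple state vector. The split into a 6-DOF global pose and 21 articulation angles follows the text; the ordering of the joint angles is an illustrative assumption.

    import numpy as np

    class HandModel27DOF:
        # 6 DOF for 3D location/orientation plus 21 DOF for articulation,
        # as in [RK94, SMC02]; the joint-angle ordering is illustrative.
        def __init__(self):
            self.global_pose = np.zeros(6)    # 3 translation + 3 rotation
            self.joint_angles = np.zeros(21)  # finger/thumb joint angles

        def parameters(self):
            # State vector estimated when fitting the model to image features.
            return np.concatenate([self.global_pose, self.joint_angles])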

Tracking based on the Mean Shift algorithm

The Mean Shift algorithm [Che95] is an iterative procedure that detects local
maxima of a density function by shifting a kernel towards the average of data
points in its neighborhood. The algorithm is significantly faster than exhaustive
search, but requires appropriate initialization.
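
A minimal flat-kernel implementation of this iteration, to make the procedure concrete; the bandwidth and convergence tolerance are illustrative parameters.

    import numpy as np

    def mean_shift(points, start, bandwidth=1.0, max_iter=50, tol=1e-4):
        # Repeatedly move the kernel center to the mean of the data
        # points inside its window (flat kernel), climbing the density.
        x = np.asarray(start, dtype=float)
        for _ in range(max_iter):
            near = points[np.linalg.norm(points - x, axis=1) < bandwidth]
            if len(near) == 0:
                break  # poor initialization: no support inside the window
            shift = near.mean(axis=0) - x
            x = x + shift
            if np.linalg.norm(shift) < tol:
                break  # converged to a local maximum
        return x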
The Mean Shift algorithm has been utilized in the tracking of moving objects in image sequences. The work in [CRM00, CRM03] is not restricted to hand tracking, but can be used to track any moving object. It characterizes the object of interest through its color distribution as this appears in the acquired image sequence, and follows the spatial gradient of the statistical similarity measure towards the most similar (in terms of color distribution) image region. An improvement of the above approach is described in [CL01], where the mean shift kernel is generalized with the notion of the “trust region”. Contrary to mean shift, which directly adopts the direction towards the mean, trust regions attempt to approximate the objective function and thus exhibit increased robustness against being trapped in spurious local optima. In [Bra98], a version of the Mean Shift algorithm is utilized to track the skin-colored blob of a human hand. For increased robustness, the method tracks the centroid of the blob and also continuously adapts the representation of the tracked color distribution. The method proposed in [KOKS01] is similar, except that it utilizes a Gaussian mixture model to approximate the color histogram and the EM algorithm to classify skin pixels based on Bayesian decision theory.
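
OpenCV ships a CamShift tracker in the spirit of the continuously adaptive mean-shift approach of [Bra98]. The sketch below back-projects a hue histogram of an initial hand window and lets CamShift follow the blob; the histogram size and termination criteria are illustrative choices.

    import cv2

    def track_hand(video, init_window):
        # Build a hue histogram of the initial hand region (assumes the
        # first frame reads successfully), then follow its back-projection.
        ok, frame = video.read()
        x, y, w, h = init_window
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv[y:y + h, x:x + w]], [0], None, [32], [0, 180])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
        window = init_window
        while True:
            ok, frame = video.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
            # CamShift adapts the window size/orientation as the blob moves.
            _, window = cv2.CamShift(prob, window, term)
            yield window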