31-08-2016, 12:26 PM
A Study of Gesture Recognition by Kinect Skeleton Tracking Based on Hidden
Markov Models, Continuous Hidden Markov Models and ANN
1451725200-iciece2014submission15.pdf (Size: 165.57 KB / Downloads: 18)
Abstract— This paper addresses the performance of hand
gesture recognition. The goal is to provide a comparison of
three state-of-the-art techniques for gesture recognition: the
Hidden Markov Model (HMM), the Continuous Hidden Markov
Model (CHMM) and Artificial Neural Networks (ANN). The
algorithms and features proposed for hand gesture recognition
are rarely evaluated on common data. We thus propose to use
publicly available databases for our comparison of gesture
recognition techniques.
Index Terms— Gesture Recognition, Kinect, Skeleton
Tracking, Hidden Markov Model, Artificial Neural Network
I. INTRODUCTION
Vision-based gesture recognition is one of the most
powerful techniques available for human-computer
interaction. The goal of Human-Computer Interaction (HCI) is
to make human-machine interaction as natural as
human-human interaction. Gestures play an important role in
our daily life; they help people convey information and
express their feelings. The hand is the most effective,
general-purpose interaction tool compared to the other parts
of the human body. Gesture tracking and recognition has
therefore become an active area of research. Gesture
recognition divides into two types: static and dynamic.
Tracking frameworks have been used to handle dynamic
gestures. Several methods for hand gesture recognition have
been proposed, differing in their underlying models, such as
Neural Networks, Fuzzy Systems and Hidden Markov Models
(HMMs). The same gesture can differ in velocity, shape,
duration and integrality, which makes dynamic hand gestures
more difficult to recognize than static ones. In this paper, a
comparative study is presented along with a review of recent
hand gesture recognition systems, covering hand gesture
modeling, analysis and recognition.
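The HMM-based approach compared in this paper scores an observation sequence against one trained model per gesture and picks the most likely one. The paper does not give its implementation; the following is a minimal sketch of that idea, assuming discrete (quantized) observations and hypothetical per-gesture parameters (pi, A, B), using the standard scaled forward algorithm:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm and per-step scaling for
    numerical stability.
    obs: sequence of symbol indices; pi: initial state probabilities;
    A: state transition matrix; B: emission matrix (state x symbol)."""
    alpha = pi * B[:, obs[0]]          # initialization
    s = alpha.sum()
    log_lik = np.log(s)
    alpha = alpha / s                  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # forward recursion
        s = alpha.sum()
        log_lik += np.log(s)
        alpha = alpha / s
    return log_lik

def classify(obs, models):
    """Return the gesture whose HMM assigns the highest likelihood.
    models: dict mapping gesture name -> (pi, A, B)."""
    return max(models, key=lambda g: forward_log_likelihood(obs, *models[g]))
```

A CHMM replaces the discrete emission matrix B with continuous (e.g. Gaussian) emission densities over the raw joint coordinates, but the forward recursion and the max-likelihood decision are the same.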
II. THE KINECT SENSOR
The Kinect sensor was introduced in November 2010 as a
new input method for the Xbox 360. It was introduced as a
competitor to the Nintendo Wii and later the PlayStation Move
from Sony. Apart from being a gaming console peripheral, the
Kinect has also served as a depth-sensing device for
developers. Before the Kinect, depth sensors were far less
affordable and therefore a much bigger investment.
The Microsoft Kinect is a depth sensor with a microphone
array for voice recognition and a standard RGB camera. Depth
is sensed by projecting an infrared grid of dots, called
structured light; an infrared sensor then measures the shift
between the dots and computes the depth. When using
multiple Kinects at the same time, the use of infrared light
causes problems.
Interference between the multiple projected infrared
grids will cause noise in the computed depth. This
interference can be reduced by placing the Kinects at
perpendicular viewing angles. One really interesting feature
of the Kinect platform is the built-in skeleton tracker.
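The depth computation described above follows the usual structured-light triangulation relation, Z = f · b / d, where f is the focal length in pixels, b the projector-camera baseline and d the measured dot shift (disparity) in pixels. A minimal sketch, with illustrative values that are assumptions rather than the Kinect's actual calibration constants:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Triangulated depth Z = f * b / d for a structured-light sensor.
    f_px: focal length in pixels; baseline_m: projector-camera baseline
    in metres; disparity_px: observed dot shift in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, small shifts at long range map to large depth steps, which is consistent with the limited practical range noted in Section IV.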
III. GESTURE RECOGNITION
Gesture is the standard non-verbal form of communication
for human beings and has existed since the earliest modern
humans. Gestures are used all the time while
communicating with others, in the form of either facial
expressions or hand movements. Sign language is
a complete language with grammar and vocabulary, and is
based entirely on gestures. It has also been shown that
gesturing is one of the first steps in learning to speak. The
skeleton tracked by the Kinect gives a good model of arm
movement and is a great foundation for gesture-based
communication with the system.
Gesture recognition helps computers to understand
human body language. This builds a better link
between humans and machines than text or graphical user
interfaces alone; those old-fashioned input methods still
limit all input to the mouse and keyboard.
Using gesture recognition, human-machine interactions
(HMIs) can be interpreted without any mechanical
device. For example, gesture recognition may be
used to move a cursor simply by pointing a finger at the
computer monitor. The benefits offered by gesture
recognition technology may make standard input devices
such as the keyboard, mouse and even the touch-screen obsolete.
Recognizing gestures as input can be very helpful for
physically impaired persons. Gesture recognition also
enables natural interaction with a 3-D virtual world or a
gaming environment.
With a controller containing gyroscopes and
accelerometers, sensing of body and hand gestures can be
extended to rotation, tilting and movement acceleration.
Unlike haptic interfaces, gesture recognition
technology does not require the user to wear any special
gear or device: body gestures are captured by a camera
rather than by sensors mounted on a device.
IV. SKELETAL TRACKING USING KINECT
Skeletal tracking allows the Kinect to follow the actions of
people. Using the infrared (IR) camera, the Kinect can recognize
up to six users in the sensor's field of view and track up to two
of them in detail. The joints of the tracked users can be
located in space and their movements followed. The
skeletal tracking system recognizes users who are standing or
sitting and facing the camera sensor. In the default mode,
the Kinect can detect people standing between 0.8 m and 4.0
m away. The practical range is 1.2 m to 3.5 m, which leaves
room for the user's hand movement. The field of view of the
sensor can be changed with the Depth Range Enumeration. To
track users, skeletal tracking must first be enabled on the
Kinect. Information about the tracked users is then provided
as an array of Skeleton objects for each frame. Each
skeleton in a frame has a tracking state of either tracked or
position-only. A skeleton with a position-only tracking state
gives the position of the user without the details of the
joints, while a tracked skeleton provides detailed information
about the positions of twenty joints in the user's body.
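Consuming the per-frame Skeleton array described above amounts to filtering for fully tracked skeletons and reading joint positions from them. The Kinect SDK exposes this in C#; the sketch below is a schematic Python analogue with hypothetical types (the class names and the "HandRight" joint key mirror the SDK's concepts but are illustrative, not its actual API):

```python
from dataclasses import dataclass, field
from enum import Enum

class TrackingState(Enum):
    NOT_TRACKED = 0
    POSITION_ONLY = 1   # user position known, but no joint details
    TRACKED = 2         # full twenty-joint skeleton available

@dataclass
class Skeleton:
    state: TrackingState
    # joint name -> (x, y, z) position in sensor space (metres)
    joints: dict = field(default_factory=dict)

def tracked_hand_positions(frame):
    """Return the right-hand position of every fully tracked skeleton
    in one frame's Skeleton array, skipping position-only skeletons."""
    return [s.joints["HandRight"] for s in frame
            if s.state is TrackingState.TRACKED and "HandRight" in s.joints]
```

A gesture recognizer would collect such joint positions over successive frames into the observation sequences that the HMM, CHMM and ANN models of this paper are trained on.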