27-11-2012, 11:53 AM
Virtual Mouse----Inputting Device by Hand Gesture Tracking and Recognition
Abstract
In this paper, we develop a system to track and recognize hand
motion in near real time. An important application of this system is to
simulate the mouse as a visual input device. The tracking approach is based on
the Condensation algorithm and Active Shape Models. Our contribution is
combining multi-modal templates to improve tracking performance. A
weighting value is assigned to the sampling ratio of Condensation by applying the
prior properties of the templates. The recognition approach is based on Hidden
Markov Models (HMMs). Experiments show that our system is very promising
as an auxiliary input device.
Introduction
Hand-gesture-based human-computer interfaces have been proposed in many
virtual reality systems and computer-aided design tools. In these systems, however,
the user must wear special physical devices such as gloves or magnetic sensors.
By comparison, a vision-based system is a more natural approach. It is very attractive
to use hand gestures as a kind of “mouse” driven only by visual information. In fact,
however, this is an inherently difficult task, although it is very easy for humans. One
obvious difficulty is that the hand is a complex and highly flexible structure. Tracking
and recognizing hand motion are the basic techniques needed for this task. Several
attempts to recognize hand gestures can be found in [1, 2, 3, 4].
Generally, gesture research divides the recognition process into two stages. First,
a low-dimensional feature vector is extracted from an image sequence. The most
classical method is to segment the object out of the image and then derive the
information describing it. Second, recognition is performed directly or
indirectly on these observation data. However, it is highly desirable to develop
systems where recognition feeds back into motion feature extraction, because the
motion style is tightly related to the activity. A great potential advantage of the
multi-modal approach is that recognition and feature extraction are performed jointly,
so the form of the expected gesture can be used to guide the feature search, potentially
making the system more efficient and robust.
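The second (recognition) stage described above can be illustrated with a toy example: scoring a discrete observation sequence against a gesture HMM using the forward algorithm. This is a minimal sketch; the transition matrix A, emission matrix B, and initial distribution pi below are made-up illustrative values, not parameters from the paper.

```python
import numpy as np

# Toy left-to-right gesture HMM (illustrative parameters only).
A = np.array([[0.7, 0.3],             # state transition probabilities
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],             # P(observation symbol | state)
              [0.2, 0.8]])
pi = np.array([1.0, 0.0])             # initial state distribution

def sequence_likelihood(obs):
    """Forward algorithm: P(obs | model), summed over all state paths."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# A sequence that drifts from symbol 0 to symbol 1 matches this
# left-to-right model far better than its reverse.
forward_score = sequence_likelihood([0, 0, 1, 1])
reverse_score = sequence_likelihood([1, 1, 0, 0])
```

In a full system, one such model would be trained per gesture class, and the class with the highest sequence likelihood would be reported.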
Related Works
Active Shape Models (ASMs), proposed by Cootes [5], are a successful method for
tracking deformable objects. They achieve high accuracy and can cope with clutter, but
their tracking performance depends heavily on a good starting approximation, so the
object's movement must not be too large, which limits their application. Random
sampling filters [6, 7] were introduced to address the need to represent multiple
hypotheses when tracking. The Condensation algorithm [7], based on factored
sampling, has been applied to the problem of visual tracking in clutter. It has a
striking property: despite its use of random sampling, which is often thought to be
computationally inefficient, the Condensation algorithm runs in near real time. This is
because tracking over time maintains relatively tight distributions for shape at
successive time-steps, particularly given the availability of accurate, learned models
of shape and motion. The Condensation algorithm has a natural mechanism to trade off
speed against robustness: increasing the sample set size N lowers the tracking speed
but yields higher accuracy.
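The predict-measure-resample loop at the heart of Condensation can be sketched in one dimension. This is only an illustration of factored sampling, not the paper's tracker: the random-walk dynamic model, the Gaussian observation likelihood, and all noise levels below are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 500                                  # sample set size: larger N is slower
particles = rng.normal(0.0, 1.0, N)      # but more robust
true_state = 0.0

for t in range(30):
    true_state += 0.1                            # object drifts slowly
    z = true_state + rng.normal(0.0, 0.2)        # noisy measurement

    # 1. Predict: propagate every sample through the dynamic model.
    particles = particles + 0.1 + rng.normal(0.0, 0.3, N)

    # 2. Measure: weight each sample by the observation likelihood.
    weights = np.exp(-0.5 * ((particles - z) / 0.2) ** 2)
    weights /= weights.sum()

    # 3. Resample: draw N new samples with probability proportional
    #    to weight (the factored-sampling step).
    particles = rng.choice(particles, size=N, p=weights)

estimate = particles.mean()
```

The speed/robustness trade-off mentioned above lives in N: each iteration costs O(N), while a larger sample set represents the posterior more faithfully.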
Our Approaches
Multi hand templates and PCA representation
Assuming one hand model is described by a vector x, a training set of these
vectors is assembled for a particular model class, in our case the hand in its various
poses. The training set is aligned (using translation, rotation, and scaling) and
the mean shape is calculated by averaging the vectors. To represent the deviation
in shape within the training set, Principal Component Analysis (PCA) is
performed on the deviations of the example vectors from the mean.
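The construction above can be sketched with NumPy. This is a minimal sketch under two assumptions: the training shapes are already aligned (the alignment step is omitted), and random vectors stand in for real landmark data; the 95% variance cutoff is likewise an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shapes, n_points = 40, 20
# Placeholder training set: each row is a flattened vector of 2-D landmarks,
# assumed already aligned by translation, rotation, and scaling.
shapes = rng.normal(size=(n_shapes, 2 * n_points))

mean_shape = shapes.mean(axis=0)
deviations = shapes - mean_shape              # deviation from the mean shape

# PCA on the deviations: eigenvectors of the covariance matrix are the
# principal modes of shape variation.
cov = deviations.T @ deviations / (n_shapes - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]             # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep enough modes to explain (say) 95% of the total variance.
k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.95)) + 1
P = eigvecs[:, :k]

# Any shape x is then approximated as mean_shape + P @ b, where
# b = P.T @ (x - mean_shape) are the low-dimensional shape parameters.
b = P.T @ (shapes[0] - mean_shape)
reconstruction = mean_shape + P @ b
```

The shape parameters b are exactly the kind of low-dimensional state that the tracker can propagate from frame to frame.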
Experiments
The system runs on a PII 450 PC; about 10 frames per second can be tracked by
our tracker, and after a delay of about 2 frames the mouse action is determined. The
system works as follows: the application shows a window displaying the video
images seen by a camera. When a hand moves on the desk within the camera's view,
it is located and tracked until it moves out of view. Another window shows a mouse
icon moving according to the hand's position; the color and pattern of the icon
change with the action of the mouse.
Conclusion
A system for hand tracking and gesture recognition has been constructed. A new
method, which extends the Condensation algorithm by introducing Active Shape
Models and multi-modal templates, is used to accomplish this task. The system works
in near real time, and the virtual mouse's position and actions are controlled by hand
in an intuitive way.