15-06-2012, 04:11 PM
Human Motion Tracking by Temporal-Spatial Local Gaussian Process Experts
INTRODUCTION
Vision-based human motion tracking has been a fundamental
open problem, with pervasive real-world applications
[1], such as surveillance, rehabilitation, diagnostics, and
human-computer interaction. Among the large number of studies
in this field, the discriminative approach [2] has been prevalent
because it allows fast inference in real-world scenarios and
adapts flexibly to different learning methods. The typical
Manuscript received November 12, 2009; revised April 07, 2010; accepted
August 19, 2010. Date of publication September 16, 2010; date of current version
March 18, 2011. This work was supported in part by the National Basic
Research Program (973 Program) of China (No. 2011CB302203), the Key Program
of National Natural Science Foundation of China (No. 60833009) and the
SUNY Buffalo Faculty Startup Funding. The associate editor coordinating the
review of this manuscript and approving it for publication was Dr. Patrick J.
Flynn.
LOCAL GAUSSIAN PROCESS EXPERTS MODEL
In this section, we present a sparse strategy for Gaussian process (GP)
regression in the unified input-output space, which leads to our proposed
local GP experts model. We review GP regression
in Section II-A and then present the detailed algorithm for our
model in Section II-B.
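To make the idea of a local GP expert concrete, here is a minimal Python sketch of standard GP regression with a squared-exponential kernel, fitted only on the training points nearest to a query. It is an illustration under stated assumptions, not the algorithm of Section II-B: the function names, the kernel choice, and the hyperparameter values are hypothetical, and the neighborhood is selected in input space only, whereas the paper defines neighborhoods in the unified input-output space.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gp_predict(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Standard GP regression: predictive mean and variance at x_test."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args)
    k_st = rbf_kernel(x_test, x_train, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)
    # Cholesky factor of the noisy training covariance.
    chol = np.linalg.cholesky(k_tt + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - np.sum(v ** 2, axis=0)
    return mean, var


def local_expert_predict(x_train, y_train, x_query, n_neighbors=20, **gp_args):
    """Fit one GP on the n_neighbors training inputs closest to x_query.

    Assumption: neighbors are chosen in input space only (a simplification).
    """
    dist = np.linalg.norm(x_train - x_query, axis=1)
    idx = np.argsort(dist)[:n_neighbors]
    mean, var = gp_predict(x_train[idx], y_train[idx], x_query[None, :], **gp_args)
    return mean[0], var[0]
```

Because each expert inverts only an n_neighbors-by-n_neighbors covariance matrix, the cost per query depends on the neighborhood size rather than on the size of the full training set.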
TEMPORAL-SPATIAL LOCAL GP EXPERTS
Based on the spatial experts, we introduce the temporal experts
as an extension to handle multimodality more effectively.
In the temporal-spatial combined GP experts model, the spatial
local experts learn the relationship between the input space and
output space, while the temporal local experts explore the underlying
context of the output space.
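The division of labor between the two kinds of experts can be sketched as follows. This is a hypothetical illustration, not the paper's formulation: it assumes the spatial expert is trained on training samples whose inputs are closest to the current input, that the temporal expert is trained on training samples whose outputs are closest to the previous pose estimate, and that the two predictions are fused by inverse-variance weighting. All function names, the fusion rule, and the parameter values are assumptions made for illustration.

```python
import numpy as np


def gp_mean_var(x_tr, y_tr, x_q, length_scale=1.0, noise_var=1e-2):
    """GP regression with an RBF kernel; mean and variance at one query point."""
    def kern(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)
        return np.exp(-0.5 * d2 / length_scale ** 2)

    k_tt = kern(x_tr, x_tr) + noise_var * np.eye(len(x_tr))
    k_qt = kern(x_q[None, :], x_tr)  # shape (1, n)
    mean = k_qt @ np.linalg.solve(k_tt, y_tr)
    var = 1.0 - k_qt @ np.linalg.solve(k_tt, k_qt.T)
    return mean[0], float(var[0, 0])


def nearest(points, query, k):
    """Indices of the k rows of points closest to query."""
    return np.argsort(np.linalg.norm(points - query, axis=1))[:k]


def temporal_spatial_predict(x_tr, y_tr, x_t, y_prev, k=15, **gp_args):
    """Fuse a spatial and a temporal local expert (illustrative only)."""
    # Spatial expert: neighborhood chosen around the current input x_t.
    s_idx = nearest(x_tr, x_t, k)
    m_s, v_s = gp_mean_var(x_tr[s_idx], y_tr[s_idx], x_t, **gp_args)
    # Temporal expert (assumed form): neighborhood chosen in output space
    # around the previous pose estimate y_prev.
    t_idx = nearest(y_tr, y_prev, k)
    m_t, v_t = gp_mean_var(x_tr[t_idx], y_tr[t_idx], x_t, **gp_args)
    # Inverse-variance weighting: an assumed fusion rule, not the paper's.
    w_s, w_t = 1.0 / max(v_s, 1e-9), 1.0 / max(v_t, 1e-9)
    return (w_s * m_s + w_t * m_t) / (w_s + w_t)
```

Here x_tr and y_tr are two-dimensional arrays with one training sample per row, and y_prev would be the pose estimated for the previous frame. When tracking a sequence, the output for frame t is fed back as y_prev for frame t+1.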
On the HumanEva Dataset
We evaluate our models on the HumanEva dataset [25].
The database provides synchronized video and motion capture
streams. The frame rate of the video stream is 60 Hz. It contains
multiple subjects performing a set of predefined actions
with repetitions. The database was originally partitioned into
training, validation, and testing subsets. We use sequences in
the original training subset for training and the original validation
subset for testing. Table II describes the portion of the HumanEva
dataset used in the experiments, specified by frame numbers.
CONCLUSION
We have presented a novel temporal-spatial combined local
GP experts model for efficient estimation of 3-D human pose
from monocular images. Our model is essentially a type of mixture
of GP experts in which we incorporate both spatial and
temporal information into a seamless system to handle multimodality.
Each local expert is trained within a local neighborhood.
Unlike previous work, the neighborhood relationship
is defined in the unified input-output space. Therefore, we
can flexibly handle two-way multimodality. Learning and inference
in this model are extremely efficient because both spatial
and temporal local experts are defined online within very
small neighborhoods.