25-08-2017, 09:32 PM
Abstract
Significant research has been devoted to detecting people
in images and videos. In this paper we describe a human detection
method that augments widely used edge-based features
with texture and color information, providing us with
a much richer descriptor set. This augmentation results
in an extremely high-dimensional feature space (more than
170,000 dimensions). In such high-dimensional spaces,
training classical machine learning algorithms such as
SVMs becomes nearly intractable. Furthermore,
the number of training samples is much smaller than the
dimensionality of the feature space, by at least an order
of magnitude. Finally, the extraction of features from a
densely sampled grid structure leads to a high degree of
multicollinearity. To circumvent these data characteristics,
we employ Partial Least Squares (PLS) analysis, an efficient
dimensionality reduction technique, one which preserves
significant discriminative information, to project the
data onto a much lower dimensional subspace (20 dimensions,
reduced from the original 170,000). Our human detection
system, employing PLS analysis over the enriched
descriptor set, is shown to outperform state-of-the-art techniques
on three varied datasets: the popular INRIA
pedestrian dataset, the low-resolution gray-scale DaimlerChrysler
pedestrian dataset, and the ETHZ pedestrian
dataset consisting of full-length videos of crowded scenes.
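The abstract describes PLS only at a high level. As a rough illustration of the idea, here is a minimal PLS1 sketch using the NIPALS iteration with deflation, written in plain Python so it runs without dependencies. It is an assumption that this matches the variant the authors used; the function names `pls1_fit` and `pls1_transform` and the ±1 class coding are hypothetical choices for illustration. Each weight vector points in the direction of maximal covariance between the (deflated) features and the labels, which is why the projection can keep discriminative information while discarding most dimensions.

```python
"""Minimal PLS1 sketch (NIPALS with deflation) in plain Python.

Assumption: a single response y, e.g. +1 for human and -1 for background.
"""
from math import sqrt


def _dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 on rows of X (n samples x d features) and labels y."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    # Centre the data; work on copies so X and y are left unchanged.
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    weights, loadings = [], []
    for _ in range(n_components):
        # w is proportional to (deflated X)^T y: direction of maximal covariance.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(d)]
        norm = sqrt(_dot(w, w))
        if norm == 0.0:
            break  # no covariance with y left to explain
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xc]  # latent scores, one per sample
        tt = _dot(t, t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        q = _dot(yc, t) / tt
        # Deflate X and y so the next component models what remains.
        Xc = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(Xc, t)]
        yc = [v - q * ti for v, ti in zip(yc, t)]
        weights.append(w)
        loadings.append(p)
    return {"x_mean": x_mean, "weights": weights, "loadings": loadings}


def pls1_transform(model, x):
    """Project one d-dimensional feature vector onto the latent components."""
    r = [v - m for v, m in zip(x, model["x_mean"])]
    scores = []
    for w, p in zip(model["weights"], model["loadings"]):
        t = _dot(r, w)
        scores.append(t)
        r = [v - t * pj for v, pj in zip(r, p)]  # same deflation as in fitting
    return scores


if __name__ == "__main__":
    # Toy data with more features (6) than samples (4), as in the text.
    X = [[1.0, 2.0, 0.5, 3.0, 1.0, 0.0],
         [2.0, 1.0, 0.0, 2.5, 1.5, 1.0],
         [0.5, 3.0, 1.0, 3.5, 0.5, 0.5],
         [3.0, 0.5, 0.5, 1.0, 2.0, 1.5]]
    y = [1.0, -1.0, 1.0, -1.0]
    model = pls1_fit(X, y, n_components=2)
    for row in X:
        print([round(s, 3) for s in pls1_transform(model, row)])
```

With descriptors of 170,000 dimensions one would use a vectorized linear-algebra library rather than nested lists; the loops here only make each step of the iteration explicit.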
1. Introduction
Effective techniques for human detection are of special
interest in computer vision since many applications involve
people’s locations and movements. Thus, significant research
has been devoted to detecting, locating and tracking
people in images and videos. Over the last few years the
problem of detecting humans in single images has received
considerable interest. Variations in illumination, shadows,
and pose, as well as frequent inter- and intra-person occlusion
render this a challenging task. Figure 1 shows an image
of a particularly challenging scene with a large number of
persons, overlaid with the results of our system.
Figure 1. Image demonstrating the performance of our system in
a complex scene. The image (689 × 480 pixels) is scanned at 10
scales to search for humans of multiple sizes. We achieve minimal
false alarms even though the number of detection windows is
44,996 (best visualized in color).
Two main approaches to human detection have been explored
over the last few years. The first class of methods consists of
a generative process where detected parts of the human body
are combined according to a prior human model. The second
class of methods relies on purely statistical analysis that
combines a set of low-level features within a detection window
to classify the window as containing a human or not. The
method presented in this paper belongs to the latter category.
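Figure 1's caption mentions tens of thousands of detection windows scanned at 10 scales, and in the second class of methods each such window is classified from its own features. The sketch below enumerates windows over an image pyramid and keeps those a classifier accepts. The 64 × 128 window, 8-pixel stride and 1.2 scale step are assumed values, not parameters from the paper, so the window count it produces for a 689 × 480 image need not match the caption; `score_window` is a hypothetical stand-in for the trained classifier.

```python
"""Sketch: enumerate multi-scale detection windows and classify each one."""


def detection_windows(img_w, img_h, win_w=64, win_h=128, stride=8,
                      scale_step=1.2, n_scales=10):
    """Yield (x0, y0, x1, y1) boxes in original-image pixel coordinates.

    Downscaling the image by `scale` and sliding a fixed-size window over it
    is equivalent to sliding a window `scale` times larger over the original.
    """
    scale = 1.0
    for _ in range(n_scales):
        sw, sh = int(img_w / scale), int(img_h / scale)  # this pyramid level
        if sw < win_w or sh < win_h:
            break  # window no longer fits; coarser levels are smaller still
        for y in range(0, sh - win_h + 1, stride):
            for x in range(0, sw - win_w + 1, stride):
                yield (int(x * scale), int(y * scale),
                       int((x + win_w) * scale), int((y + win_h) * scale))
        scale *= scale_step


def detect(img_w, img_h, score_window, threshold=0.0, **scan_options):
    """Keep the windows whose classifier score exceeds `threshold`."""
    return [box for box in detection_windows(img_w, img_h, **scan_options)
            if score_window(box) > threshold]


if __name__ == "__main__":
    boxes = list(detection_windows(689, 480))
    print(len(boxes), "windows under the assumed parameters")
    # A dummy classifier that rejects everything yields no detections.
    print(detect(689, 480, score_window=lambda box: -1.0))
```

In a real detector `score_window` would extract the window's descriptor and apply the trained model; passing it as a callable keeps the scanning logic independent of the features used.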
Dalal and Triggs [5] proposed using grids of Histograms
of Oriented Gradient (HOG) descriptors for human detection,
and obtained good results on multiple datasets. The
HOG feature looks at the spatial distribution of edge orientations.
However, this may ignore some other useful sources
of information, thus leading to a number of false positive
detections such as the ones shown in Figure 2. Our analysis
shows that information such as the homogeneity of human
clothing, color, particularly skin color, typical textures of
human clothing, and background textures complements the
HOG features very well. When combined, this richer set of
descriptors helps improve the detection results significantly.
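To make "spatial distribution of edge orientations" concrete, the following simplified sketch computes a gradient-magnitude-weighted histogram of unsigned orientations for each 8 × 8 cell, concatenates the cells, and then appends a color-channel histogram as a stand-in for the additional cues. This is not Dalal and Triggs' full HOG: it uses hard binning and a single global L2 normalization instead of interpolated votes and overlapping block normalization. The cell size, bin counts and the hypothetical `channel_histogram` cue are assumptions for illustration.

```python
"""Simplified HOG-style descriptor plus a color cue, in plain Python."""
import math


def cell_orientation_histogram(gray, x0, y0, cell=8, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # Centred [-1, 0, 1] differences, clamped at the image border.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned: 0-180
            hist[int(angle // (180.0 / bins)) % bins] += math.hypot(gx, gy)
    return hist


def hog_like_descriptor(gray, cell=8, bins=9):
    """Concatenate per-cell histograms over a grid, then L2-normalize once."""
    h, w = len(gray), len(gray[0])
    feats = []
    for y0 in range(0, h - cell + 1, cell):
        for x0 in range(0, w - cell + 1, cell):
            feats.extend(cell_orientation_histogram(gray, x0, y0, cell, bins))
    norm = math.sqrt(sum(v * v for v in feats)) or 1.0
    return [v / norm for v in feats]


def channel_histogram(channel, bins=8):
    """Normalized histogram of one 0-255 color channel (a stand-in color cue)."""
    hist = [0.0] * bins
    pixels = [v for row in channel for v in row]
    for v in pixels:
        hist[min(int(v) * bins // 256, bins - 1)] += 1.0
    return [c / len(pixels) for c in hist]


if __name__ == "__main__":
    # 16 x 16 window with a vertical edge: dark left half, bright right half.
    gray = [[0 if x < 8 else 255 for x in range(16)] for _ in range(16)]
    red = [[200] * 16 for _ in range(16)]
    combined = hog_like_descriptor(gray) + channel_histogram(red)
    print(len(combined))  # 44: 4 cells x 9 bins + 8 color bins
```

Concatenating further cues in the same way is what drives the dimensionality up; on a dense grid of many block sizes the combined descriptor can grow to the very large sizes the abstract describes.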