01-10-2012, 03:47 PM
HISTOGRAM OF CONFIDENCES FOR PERSON DETECTION
ABSTRACT
This paper focuses on the problem of person detection in
harsh industrial environments. Different image regions often
have different requirements for the person to be detected. Additionally,
as the environment can change on a frame-to-frame
basis, even previously detected people can fail to be found. In
our work we adapt a previously trained classifier to improve
its performance in the industrial environment. The classifier
output is initially used as an image descriptor. Structure
from the descriptor history is learned using semi-supervised
learning to boost overall performance. In comparison with
two state-of-the-art person detectors, we see gains of 10%.
Our approach is generally applicable to pretrained classifiers
which can then be specialised for a specific scene.
INTRODUCTION
Detection of people in images has a long history [1, 2] but, despite
this, has yet to yield good results, particularly in cluttered
or complex environments. Such environments are prevalent
in industry and serve as the motivation for this work. Typical
industrial environments are harsh for image processing. They
suffer from rapid lighting changes (machinery in operation),
occlusion (obscured by equipment), and camera shake (transport
of heavy machinery). The environments are also lit to
enable the employees to perform their tasks rather than to make
them easy to capture on camera. In our work we have recordings from within such
an environment and are examining the problem of person detection.
Our resulting method needs to be both robust and
adaptive. As a starting point for our analysis we examined
the most popular person detectors [3, 4].
APPROACH
We will start by outlining how a typical person detector
works. For a given image I, which is a single frame in a sequence
of images with infinite past and future, we can apply a
person detector, H. The output, P, from the person detector
is a number of bounding boxes with associated confidences.
These can be ordered by the region and the scale they correspond
to. P has 59000 results for [3] and 47000 for [4].
Typically, a global threshold is applied to P to find the best
candidates for people, Pt. Rather than using Pt we propose to
use the original results from the person detector P. The motivation
here is to exploit relationships from within the data
and history to improve the performance.
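The conventional thresholding step described above can be illustrated with a minimal sketch. The `Detection` record, the `global_threshold` function, the toy confidence values, and the threshold of 0.5 are all assumptions made for illustration; the paper does not specify the data layout or the threshold used by [3] or [4].

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """One detector output: a bounding box and its confidence (hypothetical layout)."""
    x: int
    y: int
    w: int
    h: int
    confidence: float


def global_threshold(detections: List[Detection], t: float) -> List[Detection]:
    """Keep only the detections whose confidence is at least t."""
    return [d for d in detections if d.confidence >= t]


# Toy detector output; a real detector returns tens of thousands of boxes per frame.
P = [
    Detection(10, 20, 64, 128, 0.91),
    Detection(12, 22, 64, 128, 0.40),
    Detection(200, 50, 32, 64, -0.30),
]

# The thresholded candidates discard most of the detector output.
thresholded = global_threshold(P, t=0.5)
print(len(P), len(thresholded))
```

In this toy case two of the three confidences are thrown away by the threshold. The proposal in the text is to keep all of P so that the discarded values remain available for later analysis.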
EXPERIMENTS
This section presents preliminary results from the system applied
to several different data sets. The data sets we chose
were two separate collections (6 months apart) from a large
manufacturing environment and the i-Lids data set. We chose
one subset of each for experimentation, taking approximately
1400 frames for each set. A window size (Nw) of 200 frames was chosen
for the time-series histograms, and a further 200
frames were used to initialise the Gaussian models. Figure 5
shows an example image with the various important steps in
our system. Notice that our adaptive technique reduces the
search space for people dramatically. Specifically, results are
concentrated around the region where historically a person
has been. The false positives in the image are mostly removed
by the application of morphology. Some more examples of
the results on the data sets examined are shown in Figure 6.
The three examples show the system's ability to find people in
occluded environments. Furthermore, in the case of i-Lids it
finds two people at very different scales.
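The time-series histograms over a window of 200 frames mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `ConfidenceHistory` class, the choice of 10 bins, the [0, 1] confidence range, and the synthetic confidence stream are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Deque, List


class ConfidenceHistory:
    """Sliding window of the most recent n_w confidences at one image location."""

    def __init__(self, n_w: int = 200) -> None:
        # deque with maxlen drops the oldest value automatically once full.
        self.window: Deque[float] = deque(maxlen=n_w)

    def push(self, confidence: float) -> None:
        self.window.append(confidence)

    def histogram(self, bins: int = 10, lo: float = 0.0, hi: float = 1.0) -> List[int]:
        """Count the windowed confidences per bin; values outside [lo, hi] are clamped."""
        counts = [0] * bins
        width = (hi - lo) / bins
        for c in self.window:
            clamped = min(max(c, lo), hi)
            idx = min(int((clamped - lo) / width), bins - 1)
            counts[idx] += 1
        return counts


# Synthetic stream: the location alternates between high and low confidence.
history = ConfidenceHistory(n_w=200)
for frame in range(300):
    history.push(0.95 if frame % 2 == 0 else 0.15)

hist = history.histogram(bins=10)
print(hist)
```

Only the last 200 of the 300 pushed values contribute to the histogram, so the descriptor follows recent scene behaviour rather than the full recording.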
CONCLUSION
In this paper we presented a novel approach for improving
person detection based on using the output of an existing classifier
as an image descriptor. Salient features from the history
of this descriptor are learned via semi-supervised learning to
improve the classification task. We presented several main
contributions. Firstly, we reinterpreted the bounding box data
as a confidence map which can be examined on a per-pixel
basis. Secondly, we proposed that useful information could
be learned from the history of these confidences. Finally, we
presented a mixture method to model this descriptor and learn
in an on-line fashion a correction which gives a better classification
of people. The work is ongoing but is showing
promising results. We are currently evaluating the efficacy
of the descriptor. Additionally, we would like to apply
our technique to correct other pretrained classifiers. Lastly,
we plan to speed up the approach by looking
at simpler methods to model the descriptor behaviour.
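As an illustration of the first contribution, the sketch below turns bounding boxes with confidences into a per-pixel confidence map. It rests on assumptions that are not stated in the paper: overlapping boxes are combined by taking the per-pixel maximum, uncovered pixels receive a background value of 0.0, and the `confidence_map` function and `Box` tuple layout are hypothetical.

```python
from typing import List, Tuple

# (x, y, width, height, confidence); this layout is an assumption.
Box = Tuple[int, int, int, int, float]


def confidence_map(
    boxes: List[Box], width: int, height: int, background: float = 0.0
) -> List[List[float]]:
    """Render boxes into a height x width map, keeping the maximum confidence per pixel."""
    cmap = [[background] * width for _ in range(height)]
    for x, y, w, h, conf in boxes:
        # Clip each box to the image bounds before writing.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                cmap[row][col] = max(cmap[row][col], conf)
    return cmap


# Two overlapping toy boxes on a 6 x 5 image.
boxes = [(0, 0, 3, 3, 0.4), (2, 2, 3, 3, 0.9)]
cmap = confidence_map(boxes, width=6, height=5)
for row in cmap:
    print(row)
```

Other aggregation rules, such as summing or averaging the overlapping confidences, would fit the same interface; the maximum is used here only because it is the simplest to reason about.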