24-09-2012, 01:42 PM
Background and Foreground Modeling Using Nonparametric Kernel Density Estimation for Visual Surveillance
ABSTRACT
Automatic understanding of events happening at a site is the
ultimate goal for many visual surveillance systems. Higher level
understanding of events requires that certain lower level computer
vision tasks be performed. These may include detection of unusual
motion, tracking targets, labeling body parts, and understanding
the interactions between people. To achieve many of these tasks,
it is necessary to build representations of the appearance of
objects in the scene. This paper focuses on two issues related to
this problem. First, we construct a statistical representation of
the scene background that supports sensitive detection of moving
objects in the scene, but is robust to clutter arising out of natural
scene variations. Second, we build statistical representations of
the foreground regions (moving objects) that support their tracking
and support occlusion reasoning. The probability density functions
(pdfs) associated with the background and foreground are likely
to vary from image to image and will not in general have a known
parametric form. We accordingly utilize general nonparametric
kernel density estimation techniques for building these statistical
representations of the background and the foreground. These
techniques estimate the pdf directly from the data without any
assumptions about the underlying distributions. Example results
from applications are presented.
INTRODUCTION
In automated surveillance systems, cameras and other sensors
are typically used to monitor activities at a site with the
goal of automatically understanding events happening at the
site. Automatic event understanding would enable functionalities
such as detection of suspicious activities and site security.
Current systems archive huge volumes of video for
eventual off-line human inspection. The automatic detection
of events in videos would facilitate efficient archiving and
automatic annotation. It could be used to direct the attention
of human operators to potential problems. The automatic detection
of events would also dramatically reduce the bandwidth
required for video transmission and storage as only interesting
pieces would need to be transmitted or stored.
MODELING THE BACKGROUND
Background Subtraction: A Review
1) The Concept: In video surveillance systems, stationary
cameras are typically used to monitor activities at
outdoor or indoor sites. Since the cameras are stationary, the
detection of moving objects can be achieved by comparing
each new frame with a representation of the scene background.
This process is called background subtraction and
the scene representation is called the background model.
Typically, background subtraction forms the first stage
in an automated visual surveillance system. Results from
background subtraction are used for further processing, such
as tracking targets and understanding events.
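As a rough illustration of the idea, the sketch below compares each pixel of a new frame against a kernel density estimate built from that pixel's recent intensities and flags pixels whose estimated probability is low. It is a minimal sketch, not the paper's implementation: the Gaussian kernel, the fixed bandwidth `sigma`, the `threshold` value, and the function names are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(u, sigma):
    """1-D Gaussian kernel with standard deviation sigma (assumed kernel choice)."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def background_density(value, samples, sigma):
    """Kernel density estimate of `value` from earlier samples of the same pixel."""
    return sum(gaussian_kernel(value - s, sigma) for s in samples) / len(samples)

def subtract_background(frame, history, sigma=5.0, threshold=1e-3):
    """Return a mask that is True where a pixel is unlikely under the background model.

    frame   : list of rows of pixel intensities
    history : list of earlier frames with the same shape (the background samples)
    sigma, threshold : illustrative values, not taken from the paper
    """
    mask = []
    for r, row in enumerate(frame):
        mask_row = []
        for c, value in enumerate(row):
            samples = [past[r][c] for past in history]
            mask_row.append(background_density(value, samples, sigma) < threshold)
        mask.append(mask_row)
    return mask

# Toy 1x2 "video": the second pixel suddenly changes from about 50 to 200.
history = [[[100, 50]], [[102, 52]], [[98, 49]], [[101, 51]]]
frame = [[100, 200]]
mask = subtract_background(frame, history)
```

Here the first pixel stays close to its history and is kept as background, while the second pixel is flagged as foreground. A practical system would vectorize this loop and choose the bandwidth from the data rather than fixing it.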
A central issue in building a representation for the scene
background is what features to use for this representation
or, in other words, what to model in the background. In
the literature, a variety of features have been used for
background modeling, including pixel-based features (pixel
intensity, edges, disparity) and region-based features (e.g.,
block correlation). The choice of the features affects how
the background model tolerates changes in the scene and the
granularity of the detected foreground objects.
MODELING THE FOREGROUND
Modeling Color Blobs
Modeling the color distribution of a homogeneous region
has a variety of applications for object tracking and recognition.
The color distribution of an object represents a feature
that is robust to partial occlusion, scaling, and object deformation.
It is also relatively stable under rotation in depth in
certain applications. Therefore, color distributions have been
used successfully to track nonrigid bodies [5], [26]–[28],
e.g., for tracking heads [27]–[30], hands [31],
and other body parts against cluttered backgrounds from stationary
or moving platforms. Color distributions have also
been used for object recognition.
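One way to picture a nonparametric color model for a homogeneous region is a kernel density estimate over color samples taken from that region. The sketch below assumes a product of one-dimensional Gaussian kernels, one per color channel, with hand-picked bandwidths; the sample values, the RGB color space, and the name `product_kernel_density` are hypothetical and only serve the example.

```python
import math

def product_kernel_density(color, samples, bandwidths):
    """KDE of a color vector using a product of 1-D Gaussian kernels."""
    total = 0.0
    for sample in samples:
        weight = 1.0
        for x, s, h in zip(color, sample, bandwidths):
            # Per-channel Gaussian kernel with bandwidth h.
            weight *= math.exp(-0.5 * ((x - s) / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
        total += weight
    return total / len(samples)

# Hypothetical (R, G, B) samples from one homogeneous region, e.g. a red shirt.
blob_samples = [(200, 30, 40), (210, 35, 38), (195, 28, 45), (205, 33, 42)]
bandwidths = (10.0, 10.0, 10.0)  # assumed values, one per channel

p_red = product_kernel_density((202, 31, 41), blob_samples, bandwidths)
p_blue = product_kernel_density((30, 40, 200), blob_samples, bandwidths)
```

A color close to the samples (`p_red`) receives a much higher density than a distant one (`p_blue`), which is the property a tracker would rely on when deciding whether a pixel belongs to the region.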
Segmentation of Multiple People
Visual surveillance systems are required to keep track of
targets as they move through the scene even when they are
occluded by or interacting with other people in the scene. It
is highly undesirable to lose track of the targets when they
are in a group. It is even more important to track the targets
when they are interacting than when they are isolated. This
problem is important not only for visual surveillance but also
for other video analysis applications such as video indexing
and video archival and retrieval.
In this section, we show how to segment foreground regions
corresponding to a group of people into individuals
given the representation for isolated people presented in Section
IV-B. One drawback of this representation is its inability
to model highly articulated parts such as hands. However,
since our main objective is to segment people under occlusion,
we are principally concerned with the mass of the body.
Correctly locating the major blobs of the body will provide
constraints on the location of the hands which could then be
used to locate and segment them. The assumption we make
about the scenario is that the targets are visually isolated before
occlusion so that we can initialize their models.
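The role of the initialization assumption can be sketched as follows: once each person has a model learned while visually isolated, pixels in a merged foreground region can be assigned to whichever person's model explains them best. This is a simplified illustration that uses color only and a maximum-likelihood rule; the representation of Section IV-B is not reproduced here, and the names `color_likelihood`, `label_pixels`, the bandwidth `h`, and the sample colors are assumptions.

```python
import math

def color_likelihood(color, samples, h):
    """Isotropic Gaussian KDE of a color given samples from one person's model."""
    dims = len(color)
    norm = (h * math.sqrt(2.0 * math.pi)) ** dims
    total = 0.0
    for sample in samples:
        sq_dist = sum((x - s) ** 2 for x, s in zip(color, sample))
        total += math.exp(-0.5 * sq_dist / (h * h)) / norm
    return total / len(samples)

def label_pixels(pixels, person_models, h=10.0):
    """Assign each foreground pixel to the person whose model gives it the highest likelihood."""
    labels = []
    for color in pixels:
        best = max(person_models,
                   key=lambda name: color_likelihood(color, person_models[name], h))
        labels.append(best)
    return labels

# Hypothetical models, initialized while the two people were visually isolated.
person_models = {
    "A": [(200, 30, 40), (205, 35, 42)],   # e.g. red clothing
    "B": [(30, 40, 200), (35, 45, 190)],   # e.g. blue clothing
}
# Colors of pixels from a merged foreground region.
pixels = [(198, 32, 41), (32, 42, 195), (203, 33, 40)]
labels = label_pixels(pixels, person_models)
```

In this toy case the labels come out as A, B, A. A fuller treatment would also use spatial information about where each body region is expected to be, since color alone cannot separate people dressed alike.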