Active Recognition through Next View Planning: A Survey
Abstract
3-D object recognition involves using image-computable features to identify 3-D objects. A
single view of a 3-D object may not contain sufficient features to recognize it unambiguously.
One needs to plan different views around the given object in order to recognize it.
Such a task involves an active sensor – one whose parameters (external and/or internal)
can be changed in a purposive manner. In this paper, we review two important applications
of an active sensor. We first survey important approaches to active 3-D object recognition.
Next, we review existing approaches to another important application of an active
sensor, namely scene analysis and interpretation.
Introduction
3-D object recognition is the process of identifying 3-D objects from their images
by comparing image-based features, or image-computable representations with a
stored representation of the object. (For detailed surveys of 3-D object recognition
and related issues, see [1], [2].) Various factors affect the strategy used for recognition,
such as the type of the sensor, the viewing transformations, the type of object,
and the object representation scheme. Sensor output could be 3-D range images, or 2-D intensity images. 3-D range images can be obtained from the output of a
light stripe range finder, for example. 2-D images may be obtained by various
means such as CCD cameras, infra-red devices, X-ray imaging, or other devices
operating on different ranges of the electromagnetic spectrum. 3-D objects
may be classified as rigid, articulated, or deformable. In this survey, we primarily
concentrate on 2-D intensity images taken with cameras. This paper is restricted to
the recognition of rigid 3-D objects and analysis of 3-D scenes.
The Need for Multiple Views
Most model-based 3-D object recognition systems consider the problem of recognizing
objects from the image of a single view of an object ([1], [2], [3], [4]). Due
to the inherent loss of information in the 3-D to 2-D imaging process, one needs an
effective representation of object properties (geometric, photometric, etc.) that is
invariant to the viewpoint and computable from image information. Invariants may
be colour-based (e.g., [5]), photometric (e.g., [6])
or geometric (e.g., [3]).
Burns, Weiss and Riseman [7] prove that geometric invariants cannot be computed
from a single image for a set of 3-D points in general position; invariants can only
be computed for a constrained set of 3-D points. One can impose
constraints on the nature of objects to compute invariants for recognition [8], but this
severely restricts the applicability of the recognition system to specific classes
of objects, e.g., canal surfaces [9], [10], rotational symmetry [8], [11], [3], repeated
structures (bilateral symmetry, translational repetition) [3], [12], [13]. While invariants
may be important for recognizing some views of an object, they cannot
characterize all its views – except in a few specific cases, as mentioned above. We
often need to recognize 3-D objects which, because of their inherent asymmetry,
cannot be completely characterized by an invariant computed from a single view.
For example, certain self-occluded features of an object can become visible if we
change the viewpoint. In order to use multiple views for an object recognition task,
one needs to maintain a relationship between different views of an object.
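To make the use of multiple views concrete, the following is a minimal sketch, not taken from any of the surveyed systems, of how evidence from successive views can be accumulated to disambiguate between competing object hypotheses. The object models, the discrete feature symbols and the chosen viewpoints are hypothetical placeholders; real systems rely on the representation schemes discussed in the sections that follow.

import numpy as np

# Hypothetical view-dependent feature models: for each object, the probability of
# observing each of three discrete feature symbols from each of eight viewpoints.
MODELS = {
    "obj_A": np.array([[0.7, 0.2, 0.1]] * 4 + [[0.1, 0.8, 0.1]] * 4),
    "obj_B": np.array([[0.7, 0.2, 0.1]] * 8),  # indistinguishable from obj_A in views 0-3
}

def update_beliefs(beliefs, viewpoint, feature):
    """Bayes update of the posterior over object hypotheses from one observation."""
    posterior = {obj: beliefs[obj] * model[viewpoint, feature]
                 for obj, model in MODELS.items()}
    total = sum(posterior.values())
    return {obj: p / total for obj, p in posterior.items()}

def recognise(observations, threshold=0.9):
    """Accumulate evidence over (viewpoint, feature) pairs until one hypothesis dominates."""
    beliefs = {obj: 1.0 / len(MODELS) for obj in MODELS}
    for viewpoint, feature in observations:
        beliefs = update_beliefs(beliefs, viewpoint, feature)
        best, p = max(beliefs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            return best, beliefs
    return None, beliefs  # still ambiguous after all planned views

if __name__ == "__main__":
    # The first view (viewpoint 0) is ambiguous; later views (5 and 6) resolve it.
    print(recognise([(0, 0), (5, 1), (6, 1)]))

The key point the sketch illustrates is that the relationship between views enters through the view-indexed models: without it, evidence from different viewpoints could not be combined.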
Object Feature Detection
Object feature detection seeks to automatically determine vision sensor parameter
values for which particular features satisfy particular constraints when imaged.
These features belong to a known object in a known pose [27]. In addition to a
general survey of sensor planning, the authors of [27] lay specific emphasis on
object feature detection systems. (A separate paper [28] presents the authors' own
MVP system in detail.) A related topic is planning for complete sensor coverage of
3-D objects. A recent work in the area is that of Roberts and Marshall [29], who
present a viewpoint selection scheme for complete surface coverage of 3-D objects.
Important earlier work in the area includes that of Cowan and Kovesi [30],
Tarbox and Gottschlich [31], and Mason and Grun [32].
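The coverage problem has a natural greedy flavour. The sketch below is a simplified illustration under assumed inputs, not the method of [29], [30], [31] or [32]: viewpoints are selected one at a time so that each new viewpoint sees the largest number of still-uncovered surface patches, and the boolean visibility matrix is assumed to be precomputed from an object or scene model.

import numpy as np

def greedy_view_selection(visibility):
    """Greedy, set-cover style viewpoint selection.

    visibility: boolean array of shape (num_viewpoints, num_patches);
    visibility[v, p] is True if surface patch p is visible from viewpoint v.
    Returns a list of viewpoint indices that together cover every coverable patch.
    """
    uncovered = np.any(visibility, axis=0)  # ignore patches no viewpoint can see
    selected = []
    while uncovered.any():
        gains = (visibility & uncovered).sum(axis=1)  # new patches each view would add
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break
        selected.append(best)
        uncovered &= ~visibility[best]
        # a real planner would also weigh sensor travel cost and measurement quality
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vis = rng.random((20, 100)) < 0.15  # 20 candidate views, 100 surface patches
    print(greedy_view_selection(vis))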
Active Object Recognition Systems
An active object recognition system uses multiple views of 3-D objects for recognition
in a purposive fashion. Based upon specialized representation schemes linking
multiple views of 3-D objects, different recognition schemes have been formulated
for active object recognition.
View Based Representation
Most active object recognition systems consider one of the following three representation
schemes, or their variants:
Appearance-based parametric eigenspaces
Multidimensional Receptive Field Histograms
Aspect graphs
These three are view-based: they encode information about different 2-D views
of a 3-D object. Breuel [35] describes simulations and experiments on real images
to suggest view-based recognition as a robust and simple alternative to 3-D shape-based
recognition methods. In what follows, we describe the above three view-based
representation schemes. We briefly point out their use in active recognition
systems; Section 2.3 describes them in detail.
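As an illustration of the first of these schemes, the following sketch captures the basic flavour of appearance-based eigenspace matching. It is a simplified illustration under assumed inputs, not the implementation of any particular system: training views are flattened, projected onto a low-dimensional principal-component basis, and a test view is labelled by its nearest neighbour in that subspace.

import numpy as np

def build_eigenspace(training_views, k=10):
    """Build a k-dimensional eigenspace from flattened training images.

    training_views: array of shape (num_views, num_pixels), one row per view.
    Returns the mean view, the k principal directions, and the projections
    (coefficients) of the training views, used later for matching.
    """
    mean = training_views.mean(axis=0)
    centred = training_views - mean
    # Rows of vt are the principal directions of the centred training set.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:k]
    coeffs = centred @ basis.T
    return mean, basis, coeffs

def recognise_view(test_view, mean, basis, coeffs, labels):
    """Label a test view by its nearest training view in the eigenspace."""
    q = (test_view - mean) @ basis.T
    distances = np.linalg.norm(coeffs - q, axis=1)
    return labels[int(np.argmin(distances))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = rng.random((40, 32 * 32))                    # 40 synthetic training views
    labels = ["obj_%d" % (i // 20) for i in range(40)]   # two objects, 20 views each
    mean, basis, coeffs = build_eigenspace(views, k=8)
    print(recognise_view(views[3] + 0.01 * rng.random(32 * 32), mean, basis, coeffs, labels))

In an active setting, the distances themselves can drive view planning: if training views of several different objects lie close to the projected test view, the system can request a further view instead of committing to a label.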
Aspect Graphs
Aspect graphs are a popular representation tool for 3-D object recognition systems.
Koenderink and van Doorn [39] define aspects as topologically equivalent classes
of object appearances. Chakravarty and Freeman [40] adopt a similar approach in
their definition of the ‘Characteristic Views’, and their uses in object recognition.
Since sensors may be of different types (geometric, photometric, etc.), Ikeuchi and
co-workers generalize this definition: object appearances may be grouped into
equivalence classes with respect to a feature set, and these equivalence classes are
the aspects [41]. Thus, an aspect is a collection of contiguous sites in viewpoint space
which correspond to the same set of features.
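As a simplified illustration of this feature-set definition, and not a full aspect-graph construction algorithm, a discretised one-parameter viewpoint space, here a ring of viewpoints around the object, can be partitioned into aspects by grouping adjacent viewpoints that expose the same feature set. The feature sets below are hypothetical inputs assumed to come from a model or a feature detector.

def group_viewpoints_into_aspects(visible_features):
    """Group a ring of discretised viewpoints into aspects.

    visible_features: list where entry i is the set of features visible from
    viewpoint i; viewpoints i and i+1 are adjacent, and the ring wraps around.
    Returns a list of aspects, each a (feature_set, [viewpoint indices]) pair.
    """
    aspects = []
    for i, feats in enumerate(visible_features):
        if aspects and aspects[-1][0] == feats:
            aspects[-1][1].append(i)      # same feature set: extend the current aspect
        else:
            aspects.append((feats, [i]))  # feature set changed: start a new aspect
    # merge the last aspect into the first if the ring wraps onto the same feature set
    if len(aspects) > 1 and aspects[0][0] == aspects[-1][0]:
        aspects[0][1].extend(aspects.pop()[1])
    return aspects

if __name__ == "__main__":
    views = [{"a", "b"}, {"a", "b"}, {"b", "c"}, {"c"}, {"c"}, {"a", "b"}]
    for feats, viewpoints in group_viewpoints_into_aspects(views):
        print(sorted(feats), viewpoints)

Adjacent groups in the resulting partition correspond to nodes connected by edges in the aspect graph.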
Conclusions
Sections 2 and 3 survey and analyze different active 3-D object recognition systems.
We repeat the process for different scene analysis systems (Sections 4 and 5) due
to the commonality of many issues in the two problems. Based on this survey and
analysis, we draw the following conclusions:
Geometric features are useful in a recognition task. We may supplement them
with other features such as colour and photometric information. Some recognition
systems are tightly coupled with the properties of the particular features they
use. However, in some cases, we may have a system that is not explicitly based
on any particular set of features.
The 1-DOF (rotational) case between the object and an orthographic camera is
an important and fairly complex problem. The complexity of the recognition
task increases with the number of degrees of freedom between the object and the
camera, and the increasing generality of the camera model – from orthographic
to projective.
The knowledge representation scheme plays an important role in both generating
hypotheses corresponding to a given view, as well as in planning the next view.
Noise may corrupt the output of feature detectors used for analyzing a given
view. An important issue is accounting for noise at both the model-building
stage and the recognition phase.