21-07-2011, 03:30 PM
Abstract
We present a novel multi-view stereo method designed
for image-based rendering that generates piecewise planar
depth maps from an unordered collection of photographs.
First, a discrete set of 3D plane candidates is computed
based on a sparse point cloud of the scene (recovered by
structure from motion) and sparse 3D line segments reconstructed
from multiple views. Next, evidence is accumulated
for each plane using 3D point and line incidence and
photo-consistency cues. Finally, a piecewise planar depth
map is recovered for each image by solving a multi-label
Markov Random Field (MRF) optimization problem using
graph-cuts. Our novel energy minimization formulation exploits
high-level scene information. It incorporates geometric
constraints derived from vanishing directions, enforces
free space violation constraints based on ray visibility of 3D
points and 3D lines, and imposes smoothness priors specific
to planes that intersect.
We demonstrate the effectiveness of our approach on a
wide variety of outdoor and indoor datasets. The view interpolation
results are perceptually pleasing, as straight lines
are preserved and holes are minimized even for challenging
scenes with non-Lambertian and textureless surfaces.
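The MRF step described in the abstract minimizes an energy of the standard multi-label form. The sketch below uses generic symbols of our own for illustration; the paper's actual data and smoothness terms additionally encode the vanishing-direction, free-space, and plane-intersection constraints mentioned above:

```latex
E(\ell) = \sum_{p \in \mathcal{P}} D_p(\ell_p) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(\ell_p, \ell_q)
```

Here \(\ell_p\) assigns pixel \(p\) one of the candidate planes, \(D_p\) scores how well plane \(\ell_p\) explains pixel \(p\) (e.g. via photo-consistency and point/line incidence), and \(V_{p,q}\) penalizes label changes between neighboring pixels, with discontinuities made cheaper along plausible plane-intersection boundaries.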
1. Introduction
Significant progress has recently been made in solving
the problem of automatic feature matching and structure
from motion robustly, which allows us to recover camera
calibration and a sparse 3D structure of a scene from an
unordered collection of photographs [24]. However, the
problem of recovering a dense, photorealistic 3D model—the multi-view stereo problem—arguably still remains unresolved.
While fully automatic stereo reconstruction systems
such as [15, 13] have shown great promise, the quality of the generated models often suffers from various drawbacks.
Textureless and non-Lambertian surfaces in the scene give rise to holes in the depth maps, which must be interpolated in some manner. This causes flat surfaces with straight lines to appear bumpy, and jagged artifacts may also be present due to unreliable matching in the presence of non-Lambertian surfaces, occlusions, etc. These problems frequently occur in architectural and urban scenes, or in scenes containing man-made objects, where planar surfaces are quite common.
In this paper, we propose a new stereo method aimed
at recovering a dense, piecewise planar reconstruction of
the scene. For predominantly planar scenes, our piecewise
planar depth maps are accurate, compact, and plausible
enough for view interpolation between cameras with
wide baselines. During view interpolation, humans are sensitive
to the motion of high-contrast edges and straight lines
in the scene. Our approach aims to preserve such features and to minimize parallax error, which otherwise produces perceptible ghosting. The lack of surface detail is rarely noticeable during viewpoint transitions between cameras.
1.1. Overview
Our approach starts by automatically matching features
and performing structure from motion on the input photographs
using an approach similar to [24], which recovers
the camera calibration and produces a sparse 3D point
cloud of the scene (Figure 1). This is followed by joint
multi-image vanishing point detection and reconstruction of
sparse 3D line segments. Next, a set of plane candidates is
estimated by robustly fitting planes to the sparse set of 3D
points and lines while using vanishing point cues for inferring
salient plane orientations (Section 3). Piecewise planar
depth maps are then recovered for each image by solving
a multi-label Markov Random Field (MRF) optimization
problem that involves assigning each pixel to one of the
candidate planes detected earlier (Section 4). Our graph-cut
based energy minimization takes into account various geometric
constraints that have not previously been explored
within MRF-based multi-view stereo, such as plane intersection
boundaries and free-space constraints. Our piecewise planar depth maps contain planar polygonal segments that are largely free of the discretization errors usually present in regular grid-based MRFs. The resulting
lightweight and compact geometric proxies are effective for
view interpolation between fairly wide baselines.
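The plane-candidate step in the pipeline above can be sketched with plain RANSAC on the sparse SfM point cloud. This is only an illustrative sketch with hypothetical parameter names (`n_iters`, `inlier_thresh`); the paper's actual method additionally fits planes to reconstructed 3D line segments and uses vanishing-point cues to infer salient plane orientations, both omitted here.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=500, inlier_thresh=0.01, rng=None):
    """Fit one plane (n, d with n.x + d = 0) to Nx3 points via RANSAC.

    Illustrative only: in the paper, multiple candidate planes are
    estimated robustly from SfM points and 3D lines, guided by
    vanishing-point cues; this sketch shows plain RANSAC on points.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Hypothesize a plane from a minimal sample of 3 points.
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                     # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        # Score by counting points within the distance threshold.
        inliers = np.abs(points @ normal + d) < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refit on the winning inlier set: the plane passes
    # through the centroid, normal = smallest singular vector.
    pts = points[best_inliers]
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    return n, -n @ centroid, best_inliers
```

Running this repeatedly, each time removing the inliers of the plane just found, yields a discrete set of candidate planes of the kind the MRF labeling step then assigns to pixels.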