08-02-2013, 10:44 AM
Metric Rectification of Curved Document Images
Abstract
In this paper, we propose a metric rectification method to restore an image from a single camera-captured document
image. The core idea is to construct an isometric image mesh by exploiting the geometry of page surface and camera. Our method
uses a general cylindrical surface (GCS) to model the curved page shape. Under a few proper assumptions, the printed horizontal text
lines are shown to be line convergent symmetric. This property is then used to constrain the estimation of various model parameters
under perspective projection. We also introduce a paraperspective projection to approximate the nonlinear perspective projection. A
set of closed-form formulas is thus derived for estimating the GCS directrix and the document aspect ratio. Our method provides a
straightforward framework for image metric rectification. It is insensitive to camera positions, viewing angles, and the shapes of
document pages. To evaluate the proposed method, we conducted comprehensive experiments on both synthetic and real-captured
images. The results demonstrate the efficiency of our method. We also carried out a comparative experiment on the public
CBDAR2007 data set. The experimental results show that our method outperforms the state-of-the-art methods in terms of OCR
accuracy and rectification errors.
INTRODUCTION
Nowadays, digital cameras are widely used in the
optical character recognition (OCR) community for
capturing images of documents. They offer several
advantages over flatbed scanners. For instance, they are
portable, responsive, and able to capture document
images from any viewpoint. Furthermore, they offer a
contactless way for capturing historical documents that are
fragile and cannot be pressed onto a flatbed scanner.
The convenience of using digital cameras, however, is
also accompanied by some serious problems. For example,
when one uses a digital camera to capture an opened book
page, the resulting image is distorted due to the curved
page surface and the camera perspective. These distortions
not only impair the visual quality of the image but may
also cause significant problems in subsequent processing
steps, such as page layout segmentation and character
recognition. Consequently, geometric rectification is often
an indispensable step in camera-based document image
analysis and recognition.
Previous Works
Nonlinear Image Transformation
A straightforward idea for document image rectification is
to segment the curved words or text lines and straighten
them one by one [1], [3]. These methods can produce
satisfactory outputs that are suitable for OCR applications.
Their drawback is equally clear: because the transformation
is local, these methods may fail to rectify distortions
within nontextual regions.
One extension is to employ a global image transformation.
Lu and Tan [20] propose to estimate an image transformation
through image grid modeling. A grid regularization process
is performed to remove the distortions. Schneider et al. [21]
derive a warping mesh by interpolating a vector field using
image local orientation features. The image is finally rectified
by approximating the nonlinear distortions with multiple
linear projections. Brown and Tsoi [2], [4] describe a
boundary interpolation framework to approximate the image
warping function. Their method is applicable to a variety of
geometric distortions.
Shape Estimation from Range Data
A direct way to acquire the 3D shape information of a
document page is to take advantage of its 3D range data. Pilu
[5] proposes to use a developable surface to fit the range data
and then flatten it to produce an undistorted image. Brown
et al. [6], [8], [22] first acquire a 3D scan of the document’s
surface together with a 2D high-resolution image. Then, a
conformal mapping is used to flatten the surface.
The methods using 3D range data generally do not
assume a parametric shape model. Thus, they are quite
suitable for rectifying images with arbitrary geometric
distortions [6], [7], [8], [22]. However, the requirements for
expensive setups, as well as the limitations in speed and
stability, often make them less attractive in applications.
Shape-from-X
Shape-from-shading techniques are often used to extract 3D
shape information from a shading image. These techniques
have been successfully applied to the geometric rectification
of document images, e.g., [9], [11], [12], [13], [23].
However, because of their strict assumptions about environment
lighting, these techniques provide only a rough,
qualitative shape estimate that is often insufficient for
metric image rectification.
Our Contributions
In this paper, we present a novel metric rectification method
for restoring an image from a single document image
captured by an uncalibrated camera. The method employs a
General Cylindrical Surface (GCS) to model the page shape.
Then, the properties of horizontal text lines under perspective
projection are exploited to build an isometric mesh.
Finally, mesh-based image warping is implemented to
remove the distortions.
In comparison with the previous approaches [2], [4], [14],
our method removes the restrictions on camera pose. It is
insensitive to the viewing angle and position of the camera.
It is therefore flexible in use and can be
applied to document images captured at very close range.
These images generally have strong perspective and cannot
be handled well by the previous methods.
APPROACH
Our goal is to fully rectify the nonlinear geometric
distortions in a camera-captured document image, including
the distortions caused by perspective, page curl, and
their coupling. To this end, a crucial step is to construct an
isometric image mesh, which accounts for the underlying
geometric distortions. Once such a mesh is available, one
can conveniently flatten the curved document image
through mesh-based image warping [28]. The overview of
the proposed method is shown in Fig. 1.
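As a rough illustration of what "isometric" means for a general cylindrical surface, the sketch below assumes the directrix is already known as a height profile z = g(x) and that the rulings run parallel to the y-axis. Flattening then maps each surface point to its arc length along the directrix, which preserves distances on the page. The function names and the sample directrix are hypothetical; the paper estimates the directrix from the image rather than assuming it.

```python
import math

def arc_lengths(xs, g):
    """Cumulative arc length of the directrix z = g(x) sampled at xs (polyline approximation)."""
    s = [0.0]
    for x0, x1 in zip(xs, xs[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, g(x1) - g(x0)))
    return s

def flatten_gcs(xs, ys, g):
    """Map mesh vertices (x, y, g(x)) on the surface to flat page coordinates (s, y)."""
    s = arc_lengths(xs, g)
    return [[(si, y) for si in s] for y in ys]

def sample_directrix(x):
    """Hypothetical directrix: a shallow page curl (assumption, not from the paper)."""
    return 0.2 * math.sin(x)

xs = [i * 0.01 for i in range(101)]   # samples across the page width, 0.0 to 1.0
ys = [0.0, 0.5, 1.0]                  # positions along the rulings
flat = flatten_gcs(xs, ys, sample_directrix)
flat_width = flat[0][-1][0]           # unrolled page width, longer than the 1.0 footprint
```

Because the rulings are straight, only the x-direction needs to be re-parameterized; the y-coordinate carries over unchanged.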
Text Lines Extraction and B-Spline Fitting
To construct an isometric image mesh, we have to first
extract at least two curved horizontal text lines. Several
methods for this already exist in the literature, e.g., [1], [3], [29].
Recently, Bukhari et al. [30] use an active contour model
(named coupled-snakes) to extract curled text-line information.
Their method is less sensitive to different
directions of curl and variable line spacing. Lu and Tan
[20] use a point tracing technique [31] to estimate the
baseline of a curved text line. The method can produce a
highly accurate estimation result.
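Once a text line has been extracted, it can be represented as a smooth curve. The sketch below shows only how a uniform cubic B-spline is evaluated from a set of control points; it does not perform the fitting itself, which would normally be a least-squares problem over the extracted baseline points. The control points and function names are hypothetical, and the paper's own spline parameterization may differ.

```python
def cubic_bspline_point(ctrl, seg, t):
    """Evaluate a uniform cubic B-spline on segment `seg` at local parameter t in [0, 1]."""
    # Uniform cubic B-spline basis functions; they sum to 1 for every t.
    b = (
        (1 - t) ** 3 / 6.0,
        (3 * t**3 - 6 * t**2 + 4) / 6.0,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0,
        t**3 / 6.0,
    )
    pts = ctrl[seg:seg + 4]
    x = sum(w * p[0] for w, p in zip(b, pts))
    y = sum(w * p[1] for w, p in zip(b, pts))
    return (x, y)

def sample_text_line(ctrl, samples_per_segment=10):
    """Sample the whole curve; needs at least four control points."""
    if len(ctrl) < 4:
        raise ValueError("a cubic B-spline needs at least 4 control points")
    n_segments = len(ctrl) - 3
    curve = []
    for seg in range(n_segments):
        for k in range(samples_per_segment):
            curve.append(cubic_bspline_point(ctrl, seg, k / samples_per_segment))
    curve.append(cubic_bspline_point(ctrl, n_segments - 1, 1.0))  # close the last segment
    return curve

# Hypothetical control points for one curved text line, in pixel coordinates.
ctrl_pts = [(0, 10), (50, 14), (100, 20), (150, 22), (200, 18), (250, 12)]
line_pts = sample_text_line(ctrl_pts)
```

A B-spline is a convenient choice here because each control point influences only a local part of the curve, so a noisy stretch of the baseline does not disturb the rest of the line.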
Mesh Warping and Image Flattening
Once an isometric mesh has been constructed, the curved image can be
fully rectified through mesh-based image warping, which is
implemented via a bivariate warping function [4]. Fig. 10
illustrates several examples of image rectification results.
In the top row of the figure, the leftmost image is
synthetic and the others are real-captured images.
We can observe that the geometric distortions in the images
caused by page curl and perspective are removed satisfactorily.
For a detailed comparison, we enlarge several image
patches in the input images and show their corresponding
rectifications side by side in Fig. 11.
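The warping step can be sketched as follows. This is a simplified stand-in that uses bilinear interpolation inside each mesh cell and nearest-neighbour sampling, not the bivariate warping function of [4]. It assumes the mesh is given as a grid of source-image positions, one per vertex of a regular grid in the rectified image; all names are hypothetical.

```python
def warp_point(mesh, u, v):
    """Map normalized rectified coordinates (u, v) in [0, 1]^2 to source image coordinates.

    mesh[r][c] is the (x, y) position in the source image of the mesh vertex
    at row r, column c.
    """
    rows, cols = len(mesh) - 1, len(mesh[0]) - 1
    fc, fr = u * cols, v * rows
    c = min(int(fc), cols - 1)          # cell column, clamped at the right edge
    r = min(int(fr), rows - 1)          # cell row, clamped at the bottom edge
    a, b = fc - c, fr - r               # local coordinates inside the cell
    p00, p01 = mesh[r][c], mesh[r][c + 1]
    p10, p11 = mesh[r + 1][c], mesh[r + 1][c + 1]
    w00, w01 = (1 - a) * (1 - b), a * (1 - b)
    w10, w11 = (1 - a) * b, a * b
    x = w00 * p00[0] + w01 * p01[0] + w10 * p10[0] + w11 * p11[0]
    y = w00 * p00[1] + w01 * p01[1] + w10 * p10[1] + w11 * p11[1]
    return (x, y)

def rectify(image, mesh, out_w, out_h):
    """Resample `image` (a list of pixel rows) through the mesh, nearest neighbour."""
    src_h, src_w = len(image), len(image[0])
    out = []
    for j in range(out_h):
        v = j / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for i in range(out_w):
            u = i / (out_w - 1) if out_w > 1 else 0.0
            x, y = warp_point(mesh, u, v)
            xi = min(max(int(round(x)), 0), src_w - 1)
            yi = min(max(int(round(y)), 0), src_h - 1)
            row.append(image[yi][xi])
        out.append(row)
    return out
```

Iterating over output pixels and looking up their source positions (backward mapping) avoids holes in the rectified image, which forward mapping of source pixels would produce.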
EXPERIMENTAL RESULTS AND ANALYSIS
We evaluate the proposed method through three experiments.
The first experiment is conducted on synthetic
images. In contrast to real-captured images, the ground
truth of a synthetic image is known exactly. Synthetic images are
therefore well suited to the quantitative evaluation of image
rectification and model parameter estimation. The second
experiment is carried out on document images captured
from real book pages. In addition to nonlinear geometric
distortions, these images often suffer from defocus.
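For the synthetic case, where ground truth is available, a rectification error can be computed directly. The sketch below uses a root-mean-square distance between corresponding points; this particular metric is an assumption made for illustration, since the error measure used in the paper is not given in this excerpt.

```python
import math

def rms_error(rectified_pts, ground_truth_pts):
    """Root-mean-square distance between corresponding 2D points (assumed metric)."""
    if not rectified_pts or len(rectified_pts) != len(ground_truth_pts):
        raise ValueError("point lists must be non-empty and of equal length")
    sq = sum(
        (xr - xg) ** 2 + (yr - yg) ** 2
        for (xr, yr), (xg, yg) in zip(rectified_pts, ground_truth_pts)
    )
    return math.sqrt(sq / len(rectified_pts))
```

The points could be, for example, mesh vertices or text-line samples in the rectified image paired with their known positions on the flat synthetic page.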