15-01-2013, 03:45 PM
Image Super-Resolution Via Sparse Representation
Abstract—
This paper presents a new approach to single-image super-resolution, based upon sparse signal representation. Research
on image statistics suggests that image patches can be
well-represented as a sparse linear combination of elements from
an appropriately chosen over-complete dictionary. Inspired by
this observation, we seek a sparse representation for each patch
of the low-resolution input, and then use the coefficients of this
representation to generate the high-resolution output. Theoretical
results from compressed sensing suggest that under mild conditions,
the sparse representation can be correctly recovered from
the downsampled signals. By jointly training two dictionaries for
the low- and high-resolution image patches, we can enforce the
similarity of sparse representations between the low-resolution
and high-resolution image patch pair with respect to their own dictionaries.
Therefore, the sparse representation of a low-resolution
image patch can be applied with the high-resolution image patch
dictionary to generate a high-resolution image patch. The learned
dictionary pair is a more compact representation of the patch
pairs, compared to previous approaches, which simply sample a large number of image patch pairs [1], reducing the computational
cost substantially. The effectiveness of such a sparsity prior is
demonstrated for both general image super-resolution (SR) and
the special case of face hallucination. In both cases, our algorithm
generates high-resolution images that are competitive or even superior
in quality to images produced by other similar SR methods.
In addition, the local sparse modeling of our approach is naturally
robust to noise, and therefore the proposed algorithm can handle
SR with noisy inputs in a more unified framework.
Index Terms—Face hallucination, image super-resolution (SR),
nonnegative matrix factorization, sparse coding, sparse representation.
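The coupled-dictionary pipeline the abstract describes — sparse-code each low-resolution patch over a low-resolution dictionary, then apply the same coefficients to the paired high-resolution dictionary — can be sketched as follows. This is a toy illustration, not the paper's implementation: the dictionaries are random stand-ins for learned ones, and a greedy orthogonal-matching-pursuit loop substitutes for the paper's l1-regularized sparse coding.

```python
import numpy as np

def sparse_code_omp(D, y, n_nonzero=3):
    """Greedy orthogonal matching pursuit: a simple stand-in for the
    l1-regularized sparse coding used in the paper. Returns alpha with
    y ~= D @ alpha and at most n_nonzero active atoms."""
    residual = y.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # pick the dictionary atom most correlated with the residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # re-fit all selected atoms to y by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        alpha = np.zeros(D.shape[1])
        alpha[support] = coef
        residual = y - D @ alpha
    return alpha

rng = np.random.default_rng(0)
# toy coupled dictionaries: 64-dim LR patches, 256-dim HR patches, 100 atoms
D_low = rng.standard_normal((64, 100))
D_low /= np.linalg.norm(D_low, axis=0)       # unit-norm atoms
D_high = rng.standard_normal((256, 100))

y_low = 2.0 * D_low[:, 3]                    # LR patch built from atom 3
alpha = sparse_code_omp(D_low, y_low)        # sparse code over the LR dictionary
x_high = D_high @ alpha                      # same coefficients, HR dictionary
```

Because the low-resolution patch was built from a single atom, the recovered coefficients transfer it exactly to the high-resolution side; with real patches and learned dictionaries the transfer is only approximate.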
Manuscript received September 17, 2009; revised January 24, 2010.

I. INTRODUCTION

SUPER-RESOLUTION (SR) image reconstruction is currently a very active area of research, as it offers the promise of overcoming some of the inherent resolution limitations of low-cost imaging sensors (e.g., cell phone or surveillance cameras), allowing better utilization of the growing capability of high-resolution displays (e.g., high-definition LCDs). Such resolution-enhancing technology may also prove to be essential in medical imaging and satellite imaging, where diagnosis or analysis from low-quality images can be extremely difficult. Conventional
approaches to generating an SR image normally require
as input multiple low-resolution images of the same scene,
which are aligned with subpixel accuracy. The SR task is cast
as the inverse problem of recovering the original high-resolution
image by fusing the low-resolution images, based upon reasonable
assumptions or prior knowledge about the observation
model that maps the high-resolution image to the low-resolution
ones. The fundamental reconstruction constraint for SR is that
the recovered image, after applying the same generation model,
should reproduce the observed low-resolution images. However,
SR image reconstruction is generally a severely ill-posed problem because of the insufficient number of low-resolution images, ill-conditioned registration, and unknown blurring operators; consequently, the solution of the reconstruction constraint is not unique. Various regularization methods, such as [2]–[4], have been proposed to further stabilize the inversion of this ill-posed problem.
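The observation model and the reconstruction constraint described above can be illustrated with a toy example. The block-averaging operator here is a hypothetical stand-in for the true blurring and downsampling operators; the final lines show why the constraint alone is ill-posed, since a second, different high-resolution image satisfies it equally well.

```python
import numpy as np

def degrade(x_high, factor=2):
    """Toy observation model: box-blur and downsample in one step
    (each LR pixel is the mean of a factor-by-factor HR block)."""
    h, w = x_high.shape
    return x_high.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def constraint_residual(x_candidate, y_observed, factor=2):
    """Fundamental reconstruction constraint: the recovered HR image,
    pushed through the same generation model, must reproduce the
    observed LR image."""
    return np.linalg.norm(degrade(x_candidate, factor) - y_observed)

rng = np.random.default_rng(1)
x_true = rng.random((8, 8))
y_obs = degrade(x_true)

# ill-posedness: perturb the HR image so that the change cancels
# within a single 2x2 block -- the LR observation is unchanged
x_alt = x_true.copy()
x_alt[0, 0] += 0.1
x_alt[0, 1] -= 0.1
```

Both `x_true` and `x_alt` drive the residual to (numerically) zero, which is exactly why regularization or a prior is needed to pick one solution.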
However, the performance of these reconstruction-based SR
algorithms degrades rapidly when the desired magnification
factor is large or the number of available input images is small.
In these cases, the result may be overly smooth, lacking important
high-frequency details [5]. Another class of SR approach
is based upon interpolation [6]–[8]. While simple interpolation
methods such as bilinear or bicubic interpolation tend to generate overly smooth images with ringing and jagged artifacts,
interpolation that exploits natural image priors will generally produce more favorable results. Dai et al. [7] represented
the local image patches using the background/foreground
descriptors and reconstructed the sharp discontinuity between
the two. Sun et al. [8] explored the gradient profile prior for
local image structures and applied it to SR. Such approaches
are effective in preserving the edges in the zoomed image.
However, they are limited in modeling the visual complexity of real images. For natural images with fine textures or smooth
shading, these approaches tend to produce watercolor-like
artifacts.
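For reference, a minimal bilinear upscaler — the kind of simple interpolation baseline the paragraph above contrasts with prior-based methods — might look like this (pure NumPy sketch, edge pixels clamped; bicubic differs only in using a cubic kernel over a 4x4 neighborhood):

```python
import numpy as np

def bilinear_upscale(img, factor=2):
    """Simple bilinear interpolation: each output pixel is a weighted
    average of its four nearest input pixels. Tends to produce overly
    smooth results compared with prior-based SR."""
    h, w = img.shape
    out_h, out_w = h * factor, w * factor
    # output pixel centers mapped back into input-grid coordinates
    ys = np.clip((np.arange(out_h) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]        # vertical blend weights
    wx = (xs - x0)[None, :]        # horizontal blend weights
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

img = np.array([[0.0, 1.0],
                [1.0, 0.0]])
up = bilinear_upscale(img, factor=2)
```

Note how the interpolated values never exceed the local input range: the method can only smooth, never reintroduce lost high-frequency detail, which is the limitation the learning-based methods below try to address.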
A third category of SR approach is based upon machine
learning techniques, which attempt to capture the cooccurrence
prior between low-resolution and high-resolution image
patches. [9] proposed an example-based learning strategy
that applies to generic images where the low-resolution to
high-resolution prediction is learned via a Markov random field
(MRF) solved by belief propagation. [10] extended this approach by using primal sketch priors to enhance blurred edges,
ridges and corners. Nevertheless, the previously mentioned
methods typically require enormous databases of millions of
high-resolution and low-resolution patch pairs, and are therefore
computationally intensive. [11] adopts the philosophy of
locally linear embedding (LLE) [12] from manifold learning,
assuming similarity between the two manifolds in the high-resolution
and the low-resolution patch spaces. Their algorithm
maps the local geometry of the low-resolution patch space to
the high-resolution one, generating high-resolution patch as a
linear combination of neighbors. Using this strategy, more patch
patterns can be represented using a smaller training database.
However, using a fixed number K of neighbors for reconstruction often results in blurring effects due to over- or under-fitting.
In our previous work [1], we proposed a method for adaptively
choosing the most relevant reconstruction neighbors based
upon sparse coding, avoiding the over- or under-fitting of [11] and producing superior results. However, sparse coding directly over a large sampled image patch database is too time-consuming.
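The neighbor-embedding step of [11] can be sketched as follows, under toy assumptions (random coupled patch sets standing in for a real training database): find the K nearest low-resolution training patches, solve for sum-to-one reconstruction weights via the local Gram matrix, and apply the same weights to the paired high-resolution patches.

```python
import numpy as np

def neighbor_embed_sr(y_low, lr_patches, hr_patches, K=5):
    """LLE-style SR step in the spirit of [11]: reconstruct the input LR
    patch from its K nearest LR training patches with weights that sum
    to one, then apply the same weights to the paired HR patches."""
    dists = np.linalg.norm(lr_patches - y_low, axis=1)
    idx = np.argsort(dists)[:K]                # K nearest LR neighbors
    diffs = lr_patches[idx] - y_low            # (K, d_low)
    G = diffs @ diffs.T                        # local Gram matrix
    G += 1e-8 * np.trace(G) * np.eye(K)        # regularize for stability
    w = np.linalg.solve(G, np.ones(K))
    w /= w.sum()                               # enforce sum-to-one
    return w @ hr_patches[idx]                 # same weights, HR side

rng = np.random.default_rng(2)
lr_patches = rng.standard_normal((200, 16))    # toy coupled training set
hr_patches = rng.standard_normal((200, 64))

# sanity check: a query identical to a training patch should map
# (almost exactly) to its paired high-resolution patch
y_low = lr_patches[7].copy()
x_high = neighbor_embed_sr(y_low, lr_patches, hr_patches, K=5)
```

The fixed K here is exactly the weakness noted above: too many neighbors over-smooth the result, too few under-fit it, which is what the adaptive sparse-coding selection of [1] was designed to avoid.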
While the previously mentioned approaches were proposed
for generic image SR, specific image priors can be incorporated when SR is tailored to a particular domain, such as human faces. This face hallucination problem was addressed in
the pioneering work of Baker and Kanade [13]. However, the
gradient pyramid-based prediction introduced in [13] does not
directly model the face prior, and the pixels are predicted individually,
causing discontinuities and artifacts. Liu et al. [14]
proposed a two-step statistical approach integrating the global
PCA model and a local patch model. Although the algorithm
yields good results, the holistic PCA model tends to produce results close to the mean face, and the probabilistic local patch model is complicated and computationally demanding. Wei Liu et al.
[15] proposed a new approach based upon TensorPatches and
residue compensation. While this algorithm adds more details
to the face, it also introduces more artifacts.