Blobworld: Image Segmentation Using
Expectation-Maximization and
Its Application to Image Querying
Abstract: Retrieving images from large and varied collections using image content as a key is a challenging and important problem.
We present a new image representation that provides a transformation from the raw pixel data to a small set of image regions that are
coherent in color and texture. This "Blobworld" representation is created by clustering pixels in a joint color-texture-position feature
space. The segmentation algorithm is fully automatic and has been run on a collection of 10,000 natural images. We describe a system
that uses the Blobworld representation to retrieve images from this collection. An important aspect of the system is that the user is
allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view
into the workings of the system; consequently, query results from these systems can be inexplicable, despite the availability of knobs
for adjusting the similarity metrics. By finding image regions that roughly correspond to objects, we allow querying at the level of objects
rather than global image properties. We present results indicating that querying for images using Blobworld produces higher precision
than does querying using color and texture histograms of the entire image in cases where the image contains distinctive objects.
1 INTRODUCTION
Very large collections of images are growing ever more
common. From stock photo collections and proprietary
databases to the World Wide Web, these collections are
diverse and often poorly indexed; unfortunately, image
retrieval systems have not kept pace with the collections
they are searching. The limitations of these systems include
both the image representations they use and their methods
of accessing those representations to find images:
- While users generally want to find images containing
particular objects ("things") [9], [13], most existing
image retrieval systems represent images based only
on their low-level features ("stuff"), with little regard
for the spatial organization of those features.
- Systems based on user querying are often unintuitive
and offer little help in understanding why certain
images were returned and how to refine the query.
Often the user knows only that he has submitted a
query for, say, a bear but in return has retrieved many
irrelevant images and very few pictures of bears.

In this paper, we present "Blobworld," a new framework
for image retrieval based on segmentation into regions and
querying using properties of these regions. The regions
generally correspond to objects or parts of objects. While
Blobworld does not exist completely in the "thing" domain, it
recognizes the nature of images as combinations of objects,
and querying in Blobworld is more meaningful than it is with
simple "stuff" representations.
Image segmentation is a difficult problem. Segmentation
algorithms inevitably make mistakes, causing some degradation
in performance of any system that uses the segmentation
results. As a result, designers of image retrieval systems have
generally chosen to use global image properties, which do not
depend on accurate segmentation. However, segmenting an
image allows us to access the image at the level of objects. We
believe this ability is critical to image retrieval and to progress
in object recognition, in general. We have developed a
segmentation algorithm that, while imperfect, provides
segmentations that are good enough to yield improved query
performance compared to systems that use global properties.
In order to segment each image automatically, we model
the joint distribution of color, texture, and position features
with a mixture of Gaussians. We use the Expectation-Maximization (EM) algorithm [8] to estimate the parameters
of this model; the resulting pixel-cluster memberships
provide a segmentation of the image. After the image is
segmented into regions, a description of each region's color
and texture characteristics is produced. In a querying task, the
user can access the regions directly, in order to see the
segmentation of the query image and specify which aspects of
the image are important to the query. When query results are
returned, the user also sees the Blobworld representation of
each retrieved image; this information assists greatly in
refining the query.
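The clustering step described above can be sketched as a minimal EM fit of a Gaussian mixture in NumPy. This is an illustrative simplification, not the authors' implementation: it assumes diagonal covariances, a fixed number of components, and a deterministic initialization, and omits the paper's scale selection and model-selection machinery. The name `em_gmm` is hypothetical.

```python
import numpy as np

def em_gmm(X, k, n_iter=50):
    """Minimal EM for a mixture of k Gaussians with diagonal covariances.

    X: (n, d) array of per-pixel feature vectors (e.g. color, texture, position).
    Returns one hard cluster label per row -- the segmentation.
    """
    n, d = X.shape
    # Deterministic init: spread the initial means along the sorted feature sums.
    order = np.argsort(X.sum(axis=1))
    mu = X[order[np.linspace(0, n - 1, k).astype(int)]].copy()
    var = np.ones((k, d))
    pk = np.full(k, 1.0 / k)                        # mixture weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = p(component j | pixel i).
        log_p = (-0.5 * (((X[:, None, :] - mu) ** 2 / var)
                         + np.log(2 * np.pi * var)).sum(axis=-1)
                 + np.log(pk))
        log_p -= log_p.max(axis=1, keepdims=True)   # guard against underflow
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from responsibilities.
        nk = r.sum(axis=0) + 1e-12
        pk = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return r.argmax(axis=1)
```

Each pixel's final label is the component with the highest responsibility; pixels sharing a label form one candidate region.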
We begin this paper by briefly discussing the current state
of image retrieval. In Sections 2 and 3, we describe the feature
extraction and segmentation algorithm. In Section 4, we
discuss the descriptors assigned to each region. In Section 5,
we present a query system based on Blobworld, as well as
results from queries in a collection of 10,000 highly varied
natural images. We conclude with a brief discussion of our
approach and some proposed directions for future work.
Portions of this work have been published in [4], [6], [7].
1.1 Related Work
The best-known image database system is IBM's Query by
Image Content (QBIC) [10], which allows an operator to
specify various properties of a desired image. The system
then displays a selection of potential matches to those
criteria, sorted by a score of the appropriateness of the
match. Region segmentation is largely manual, but recent
versions of QBIC [2] contain simple automated segmentation
facilities. Photobook [32] incorporates more sophisticated
representations of texture and a degree of automatic
segmentation. Other examples of systems that identify
materials using low-level image properties include Virage
[17], VisualSEEk [39], Candid [24], and Chabot [30].
Color histograms [42], [43] are commonly used in image
retrieval systems and have proven useful; however, this
global characterization lacks information about how the
color is distributed spatially. Several researchers have
attempted to overcome this limitation by incorporating
spatial information in the descriptor. Stricker and Dimai
[41] store the average color and the color covariance matrix
within each of five fuzzy image regions. Huang et al. [20]
store a "color correlogram" that encodes the spatial correlation
of color-bin pairs. Smith and Chang [40] store the
location of each color that is present in a sufficient amount in
regions computed using histogram backprojection.
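The global color-histogram baseline discussed above can be sketched briefly. The bin layout and the intersection measure here are a generic simplification in the spirit of [42], not any particular cited system; the function names are hypothetical.

```python
import numpy as np

def color_histogram(img, bins_per_channel=4):
    """Global color histogram: quantize each RGB channel into a few bins
    and count pixels per (r, g, b) bin. img is (h, w, 3), values in [0, 256)."""
    q = (img.astype(int) * bins_per_channel) // 256          # per-channel bin index
    flat = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()                                  # normalize to sum to 1

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return float(np.minimum(h1, h2).sum())
```

Note that the histogram discards all spatial layout: an image of a tiger and a shuffled version of its pixels produce identical histograms, which is exactly the limitation the spatial descriptors above try to address.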
Lipson et al. [26] retrieve images based on spatial and
photometric relationships within and across simple image
regions. Little or no segmentation is done; the regions are
derived from low-resolution images. Jacobs et al. [21] use
multiresolution wavelet decompositions to perform queries
based on iconic matching.
Some of these systems encode information about the
spatial distribution of color features, and some perform
simple automatic or manually-assisted segmentation. However,
none provides the level of automatic segmentation
and user control necessary to support object queries in a
very large image collection. These approaches generally
work well in a query-by-example task when the entire scene
is distinctive and relevant; they are not suited to the task of
querying for general objects such as animals, where large
parts of the scene are irrelevant.
Our approach is most similar to Ma and Manjunath [27]
who perform retrieval based on segmented image regions.
Their segmentation algorithm includes an optional manual
region-pruning step, and the user must specify the expected
number of regions. Whether this restriction helps or hurts retrieval is unclear; the effect of the number of
regions on system performance remains an open question.
Classical object recognition techniques usually rely on
clean segmentation of the object from the rest of the image or
are designed for fixed geometric objects such as machine
parts. Neither constraint holds in the case of natural images:
the shape, size, and color of objects like tigers and airplanes
are quite variable, and segmentation is imperfect. Clearly,
classical object recognition does not apply. More recent
techniques [33] can identify specific objects drawn from a
finite (on the order of 100) collection, but no present technique
is effective at the general image analysis task, which requires
both image segmentation and image classification.
Our approach to segmentation uses the EM algorithm to
estimate the parameters of a mixture of Gaussians model of
the joint distribution of pixel color and texture features. This
approach is related to earlier work using EM and/or the
Minimum Description Length (MDL) principle to perform
segmentation based on motion [3], [44] or scaled intensities
[45]. Related approaches such as deterministic annealing
[34] and classical clustering [22] have been applied to
texture segmentation without color. Panjwani and Healey
[31] have performed segmentation using a Markov random
field color texture model.
2 FEATURE EXTRACTION
Creating the Blobworld representation of an image involves
three steps (see Fig. 1):
1. Select an appropriate scale for each pixel and extract
color, texture, and position features for that pixel at
the selected scale.
2. Group pixels into regions by modeling the distribution
of pixel features with a mixture of Gaussians
using Expectation-Maximization.
3. Describe the color distribution and texture of each
region for use in a query.
Fig. 2 illustrates these steps for a sample image.
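Step 1 of the pipeline can be sketched as building one feature vector per pixel. This simplified version stacks only color and normalized position; the scale selection and texture features of the full method are omitted, and `pixel_features` is an illustrative name, not the authors' code.

```python
import numpy as np

def pixel_features(img):
    """Stack each pixel's color with its normalized (x, y) position.

    img: (h, w, 3) array with values in [0, 255].
    Returns an (h*w, 5) feature matrix suitable for clustering.
    """
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalize positions to [0, 1] so color and position have comparable scale.
    pos = np.stack([xs / max(w - 1, 1), ys / max(h - 1, 1)], axis=-1)
    feats = np.concatenate([img / 255.0, pos], axis=-1)
    return feats.reshape(-1, 5)
```

Including position in the feature vector is what makes the clusters spatially coherent: two pixels of identical color on opposite sides of the image can still end up in different regions.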
2.1 Extracting Color Features
Each image pixel has a three-dimensional color descriptor
in the L*a*b* color space. This color space is approximately
perceptually uniform; thus, distances in this space are
meaningful [47]. We smooth the color features as discussed
in Section 2.3 in order to avoid oversegmenting regions
such as tiger stripes based on local color variation;
otherwise, each stripe would become its own region.
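A minimal sketch of this smoothing, assuming a simple box filter per color channel: the paper smooths at a per-pixel selected scale (Section 2.3), which a Gaussian at that scale would model more closely, but a fixed-radius box average already shows why stripe-scale variation is suppressed before clustering.

```python
import numpy as np

def smooth_channel(ch, radius=2):
    """Box-filter smoothing of one color channel so that rapid local variation
    (e.g. tiger stripes) is averaged out before clustering. Windows are clipped
    at the image border."""
    h, w = ch.shape
    out = np.empty_like(ch, dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            out[y, x] = ch[y0:y1, x0:x1].mean()
    return out
```

On an alternating black/white stripe pattern narrower than the filter window, the smoothed channel settles near the mid-gray average, so the clustering step sees one coherent region rather than one region per stripe.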