03-08-2012, 10:24 AM
Fast Query Point Movement Techniques for Large CBIR Systems
Fast Query Point Movement Techniques for Large.pdf (Size: 539.1 KB / Downloads: 32)
Abstract
Target search in content-based image retrieval
(CBIR) systems refers to finding a specific (target) image such as
a particular registered logo or a specific historical photograph.
Existing techniques, designed around query refinement based on
relevance feedback, suffer from slow convergence, and do not
guarantee to find intended targets. To address these limitations,
we propose several efficient query point movement methods. We
prove that our approach is able to reach any given target image
with fewer iterations in the worst and average cases. We propose
a new index structure and query processing technique to improve
retrieval effectiveness and efficiency. We also consider strategies
to minimize the effects of users’ inaccurate relevance feedback.
Extensive experiments in simulated and realistic environments
show that our approach significantly reduces the number of
required iterations and improves overall retrieval performance.
The experimental results also confirm that our approach can
always retrieve intended targets even with poor selection of initial
query points.
Index Terms—Content-based image retrieval, relevance feedback,
target search, index structures.
I. INTRODUCTION
CONTENT-based image retrieval (CBIR) has received
much attention in the last decade, which is motivated by
the need to efficiently handle the immensely growing amount
of multimedia data. Many CBIR systems have been developed,
including QBIC [12], Photobook [27], MARS [26], [30],
NeTra [22], PicHunter [10], Blobworld [7], VisualSEEK [34],
SIMPLIcity [39] and others [2], [6], [9], [14], [17], [24], [32],
[38]. In a typical CBIR system, low-level visual image features
(e.g., color, texture and shape) are automatically extracted
for image descriptions and indexing purposes. To search for
desirable images, a user presents an image as an example of
similarity, and the system returns a set of similar images based
on the extracted features. In CBIR systems with relevance
feedback (RF), a user can mark returned images as positive or
negative, which are then fed back into the systems as a new,
refined query for the next round of retrieval. The process is
repeated until the user is satisfied with the query result. Such
systems are effective for many practical CBIR applications
[13].
There are two general types of image search: target search
and category search [10], [13]. The goal of target search is
to find a specific (target) image, such as a registered logo,
a historical photograph, or a particular painting. The goal of
Manuscript received April 30, 2007.
Danzhou Liu, Kien A. Hua, Khanh Vu, and Ning Yu are with the School of
EECS, University of Central Florida, Orlando, Florida 32816, USA. E-mail:
{dzliu, kienhua, khanh, nyu}[at]cs.ucf.edu.
2
s
1
t
r
Cluster 1 Cluster 2
Fig. 1. Local maximum trap in existing approaches
t
s
t
Fig. 2. Slow convergence in existing approaches
category search is to retrieve a given semantic class or genre
of images, such as scenery images or skyscrapers. In other
words, a user uses target search to find a known image. In
contrast, category search is used to find relevant images the
user might not be aware ahead of time. We focus on target
search in this paper.
Two orthogonal issues in CBIR research are efficiency and
accuracy. For instance, indexing techniques, such as R*-tree
[3], may improve the efficiency of the search process. Their
retrieval accuracy, however, depends on the effectiveness of
the visual features used to characterize the database images.
An effective CBIR system, therefore, needs to have both
an efficient search mechanism and an accurate set of visual
features. Addressing the effectiveness of the visual features is
beyond the scope of this paper. We assume that the Euclidean
distances between the images reflect their semantic similarity,
and focus on investigating new search techniques to improve
the efficiency of target search.
Existing target search techniques re-retrieve previously examined
images (i.e., those retrieved in the previous iterations)
when they again fall within the search range of the current
iteration. This strategy leads to the following disadvantages:
• No guarantee that the target can be found. The search
operation generally takes several iterations of relevance
feedback to examine a number of regions in the feature
space, before it reaches the target image. During this iter-
Digital Object Indentifier 10.1109/TKDE.2008.188 1041-4347/$25.00 © 2008 IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
2
ative process, the search advancement might get trapped
in a region as illustrated in Figure 1. It shows s and t as
the starting point ps and the target point pt, respectively.
Initially, the 3-NN search with ps as the query point yields
three points ps, p1, and p2 as the query result. Let us
say, the user marks points p1 and p2 as relevant. This
results in point pr, their centroid, as the new query point.
With pr as the refined query, the next 3-NN computation
again retrieves points p1, p2, and ps as the result. In this
scenario, the search process is trapped in this local region,
and can never reach the target point pt. Although, the
system can escape the local maximum trap with a larger
k, it is difficult to guess a proper threshold (k = 14 in this
example). Consequently, the user might not even know a
local maximum trap is occurring.
• Slow convergence. Including previously examined images
in the computation of the current centroid results in repeat
retrieval of some of the images. This prevents a more
aggressive movement of the search in the feature space.
This drawback is illustrated in Figure 2, where k = 3. It
shows that it takes six iterations for the search operation
to reach the target point pt. This slow convergence incurs
longer search time, and significant computation and disk
access overhead.
To address the aforementioned limitations, we propose four
target search methods: na¨ıve random scan (NRS), local neighboring
movement (LNM), neighboring divide-and-conquer
(NDC), and global divide-and-conquer (GDC) methods. All
these schemes are built around a common strategy: they do
not reexamine previously checked images. Furthermore, NDC
and GDC exploit Voronoi diagrams to aggressively prune
the search space in order to move faster towards the target
image. We formally prove that GDC and NDC converge
much faster than NRS and other methods can. Our extensive
experimental results confirm our complexity analysis,
and show the advantage of the proposed techniques in both
simulated and realistic environments. A preliminary version
of this study was presented in [21]. In the current paper,
we extend the original technique to consider inaccurate user
relevance feedback. Furthermore, we introduce a new index
structure and the corresponding query processing technique to
further improve performance. More experimental results are
also provided to make the study more complete.
The remainder of this paper is organized as follows. In
Section II, we survey related work on target search. The proposed
methods are presented in Section III in detail. Handling
inaccurate user relevance feedback is discussed in Section IV.
We introduce a new index structure and query processing
technique for target search in Section V. Our experimental
results are given in Section VI. Finally, we conclude the paper
in Section VII.
II. RELATED WORK
In this section, we survey existing techniques for target
search and category search. Category search techniques can
be used for target search if the desired category has only one
target image.
1. Original 2. Dimension Weighting 3. Generalized Weighting
Fig. 3. Single-point movement query shapes
1. Convex Shape 2. Concave Shape 3. Query Decomposition
Fig. 4. Multiple-point movement query shapes
Two well-known techniques for target search were proposed
in QBIC [12] and PicHunter [10]. IBM’s QBIC system allows
users to compose queries based on visual image features such
as color percentage, color layout, and texture present in the
target image, and ranks retrieved images according to those
criteria. QBIC, however, is not an RF technique, so that it
is difficult for users to define the ideal queries in the first
try (because this system does not allow them to refine their
queries as in recent RF systems). To lessen the burden on
users, PicHunter proposes to predict query’s intents by using
a Bayesian-based RF technique to guide query refinement
and target search. PicHunter’s performance, however, depends
on the consistency of users’ behavior and the accuracy of
the prediction algorithm. More importantly, both QBIC and
PicHunter do not guarantee to find target images and suffer
local maximum traps.
Techniques for category search can be divided into two
groups: single-point and multiple-point movement techniques.
A technique is classified as a single-point movement technique
if the refined query Qr at each iteration consists of only
one query point. Otherwise, it is a multiple-point movement
technique. Typical query shapes of single-point movement and
multiple-point movement techniques are shown in Figures 3
and 4, where the contours represent equi-similarity surfaces.
Single-point movement techniques, such as MARS [26], [30]
and MindReader [18], construct a single query point close to
relevant images and away from irrelevant ones. MARS uses
a weighted distance (producing shapes shown in Figure 3.2),
where each dimension weight is inversely proportional to the
standard deviation of the relevant images’ feature values in
that dimension. The rationale is that a small variation among
the values is more likely to express restrictions on the feature,
and thereby should carry a higher weight. On the other hand, a
large variation indicates this dimension is not significant in the
query, thus should assume a low weight. MindReader achieves
better results by using a generalized weighted distance, see
Figure 3.3 for its shape. Ostensive relevance feedback [5] can
be used to adjust the weights based on the checked images,
while the length of time since an image was checked is used
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
3
in a decay function to modulate the impact of those already
checked images.
In multiple-point movement techniques such as Query Expansion
[8], Qcluster [20], and Query Decomposition [17],
multiple query points are used to define the ideal space that
is most likely to contain relevant results. Query Expansion
groups query points into clusters and chooses their centroids
as Qr’s representatives (see Figure 4.1). The distance of a
point to Qr is defined as a weighted sum of individual distances
to those representatives. The weights are proportional
to the number of relevant objects in the clusters. Thus, Query
Expansion treats local clusters differently, as opposed to the
equal treatment in single-point movement techniques.
In some queries, clusters are too far apart for a unified,
all-encompassing contour to be effective; separate contours
can yield more selective retrieval. This observation motivated
Qcluster to employ an adaptive classification and clustermerging
method to determine optimal contour shapes for
complex queries. Qcluster supports disjunctive queries, where
similarity to any of the query points is considered as good,
see Figure 4.2. To handle disjunctive queries both in vector
space and in arbitrary metric space, a technique was proposed
in FALCON [40]. It uses an aggregate distance function to
estimate the (dis)similarity of an object to a set of desirable
images. To bridge the semantic gap more effectively, we
recently proposed Query Decomposition [17]. Based on the
user’s relevance feedback, this scheme automatically decomposes
a given query into localized subqueries, which more
accurately capture images with similar semantics but in very
different appearance (e.g., the front view and side view of
a car), see Figure 4.3. Other techniques [13], [36], [39] are
also available to address the semantic gap. This issue is
beyond the scope of this paper. In general, the above category
search techniques do not guarantee to find target images and
still suffer slow convergence, local maximum traps and high
computation overhead.
To avoid local maximum traps and their associated problems,
our methods will ignore all checked images. They will
be discussed in the order of their sophistication in the next
section. The most complex GDC is based on the singlepoint
movement method, which proves to converge faster than
multiple-point movement methods. GDC employs Voronoi
diagrams to prune irrelevant images, assisting users in query
refinement and speeding up convergence.
Unlike queries in traditional database systems, users in most
cases cannot specify an ideal query to retrieve the desired
result in multimedia database systems, and have to rely on
iterative feedback to refine their queries. Target search may
involve four typical types of queries: sampling queries [8],
[10], [12], [17], [18], [20], [21], [26], constrained sampling
queries [21], k-NN queries [18], [26] and constrained k-
NN queries [21] as discussed in Section III. Among all the
aforementioned techniques, only Chakrabarti et al. discussed
how to efficiently evaluated k-NN queries in the Query Expansion
model [8] for category search. They observed that
the refined queries in the Query Expansion model are not
modified dramatically from one iteration to another. Instead
of evaluating subsequent queries from scratch, they proposed
several techniques to save most of the disk I/O cost and CPU
cost by appropriately exploiting the information generated
during the previous iterations.
In related work, there have been many research efforts
in sampling for selectivity estimation [41], [42], real-time
CPU scheduling for mobile multimedia systems [43], and
efficiently evaluating k-NN queries [16], [19], [29], [37] and
constrained k-NN queries [11] without relevance feedback, but
much less has been reported on efficiently answering sampling
queries, constrained sampling queries and constrained k-NN
queries involved in target search, where we need to consider
iterative feedback and users’ inaccurate relevance feedback.
Most existing hierarchical index structures (e.g., R-tree [15],
R*-tree [3], and A-tree [31]) were not designed specifically
for target search, which typically cannot be answered in one
iteration and may require auxiliary information (e.g., sampling
points) to answer sampling queries and constrained sampling
queries. Collecting auxiliary information on the fly during each
feedback iteration causes overheads on CPU and disk I/O. In
summary, a new index structure and efficient query processing
technique for target search are highly demanded.