04-09-2012, 01:21 PM
Ranking Spatial Data by Quality Preferences
Ranking Spatial Data.pdf (Size: 694.92 KB / Downloads: 46)
Abstract
A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example,
using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the appropriateness of
their location, defined after aggregating the qualities of other features (e.g., restaurants, cafes, hospital, market, etc.) within their spatial
neighborhood. Such a neighborhood concept can be specified by the user via different functions. It can be an explicit circular region
within a given distance from the flat. Another intuitive definition is to consider the whole spatial domain and assign higher weights to
the features based on their proximity to the flat. In this paper, we formally define spatial preference queries and propose appropriate
indexing techniques and search algorithms for them. Extensively evaluation of our methods on both real and synthetic data reveal that
an optimized branch-and-bound solution is efficient and robust with respect to different parameters.
INTRODUCTION
Spatial database systems manage large collections of
geographic entities, which apart from spatial attributes
contain non-spatial information (e.g., name, size, type,
price, etc.). In this paper, we study an interesting type of
preference queries, which select the best spatial location
with respect to the quality of facilities in its spatial
neighborhood.
Given a set D of interesting objects (e.g., candidate
locations), a top-k spatial preference query retrieves the
k objects in D with the highest scores. The score of an
object is defined by the quality of features (e.g., facilities
or services) in its spatial neighborhood. As a motivating
example, consider a real estate agency office that holds
a database with available flats for lease. Here “feature”
refers to a class of objects in a spatial map such as
specific facilities or services. A customer may want to
rank the contents of this database with respect to the
quality of their locations, quantized by aggregating nonspatial
characteristics of other features (e.g., restaurants,
cafes, hospital, market, etc.) in the spatial neighborhood
of the flat (defined by a spatial range around it). Quality
may be subjective and query-parametric. For example,
a user may define quality with respect to non-spatial
attributes of restaurants around it (e.g., whether they
serve seafood, price range, etc.).
BACKGROUND AND RELATED WORK
Object ranking is a popular retrieval task in various applications.
In relational databases, we rank tuples using
an aggregate score function on their attribute values [2].
For example, a real estate agency maintains a database
that contains information of flats available for rent. A
potential customer wishes to view the top-10 flats with
the largest sizes and lowest prices. In this case, the score
of each flat is expressed by the sum of two qualities:
size and price, after normalization to the domain [0,1]
(e.g., 1 means the largest size and the lowest price). In
spatial databases, ranking is often associated to nearest
neighbor (NN) retrieval. Given a query location, we are
interested in retrieving the set of nearest objects to it that
qualify a condition (e.g., restaurants). Assuming that the
set of interesting objects is indexed by an R-tree [3], we
can apply distance bounds and traverse the index in a
branch-and-bound fashion to obtain the answer [4].
Spatial Query Evaluation on R-trees
The most popular spatial access method is the R-tree [3],
which indexes minimum bounding rectangles (MBRs) of
objects. Figure 2 shows a set D = fp1; : : : ; p8g of spatial
objects (e.g., points) and an R-tree that indexes them.
R-trees can efficiently process main spatial query types,
including spatial range queries, nearest neighbor queries,
and spatial joins. Given a spatial region W, a spatial range
query retrieves from D the objects that intersect W. For
instance, consider a range query that asks for all objects
within the shaded area in Figure 2. Starting from the
root of the tree, the query is processed by recursively
following entries, having MBRs that intersect the query
region.
Branch and Bound Algorithm
GP is still expensive as it examines all objects in D and
computes their component scores. We now propose an
algorithm that can significantly reduce the number of
objects to be examined. The key idea is to compute, for
non-leaf entries e in the object tree D, an upper bound
T (e) of the score (p) for any point p in the subtree of
e. If T (e)
, then we need not access the subtree of e,
thus we can save numerous score computations.
CONCLUSION
In this paper, we studied top-k spatial preference queries,
which provides a novel type of ranking for spatial objects
based on qualities of features in their neighborhood. The
neighborhood of an object p is captured by the scoring
function: (i) the range score restricts the neighborhood
to a crisp region centered at p, whereas (ii) the influence
score relaxes the neighborhood to the whole space and
assigns higher weights to locations closer to p.
We presented five algorithms for processing top-k
spatial preference queries. The baseline algorithm SP
computes the scores of every object by querying on
feature datasets. The algorithm GP is a variant of SP that
reduces I/O cost by computing scores of objects in the
same leaf node concurrently.