19-07-2012, 01:31 PM
One Size Does Not Fit All: Towards User- and Query-Dependent Ranking For Web Databases
One Size Does Not Fit All Towards User- and Query-Dependent Ranking For Web Databases.pdf (Size: 855.41 KB / Downloads: 44)
INTRODUCTION
The emergence of the deep Web [7] [9] has led to the
proliferation of a large number of Web databases for a variety
of applications (e.g., airline reservations, vehicle search, real
estate scouting). These databases are typically searched by
formulating query conditions on their schema attributes. When
the number of results returned is large, it is time-consuming
to browse and choose the most useful answer(s) for further
investigation. Currently, Web databases simplify this task by
displaying query results sorted on the values of a single
attribute (e.g., Price, Mileage, etc.). However, most Web users
would prefer an ordering derived using multiple attribute
values, which would be closer to their expectation.
RELATED WORK
Although there was no notion of ranking in traditional
databases, it has existed in the context of information retrieval
for quite some time. With the advent of the Web, ranking
gained prominence due to the volume of information being
searched/browsed. Currently, ranking has become ubiquitous
and is used in document retrieval systems, recommender
systems, Web search/browsing, and traditional databases as
well. Below, we relate our effort to earlier work in these areas.
Ranking in Recommendation Systems: Given the notion
of user- and query-similarity, it appears that our proposal is
similar to the techniques of collaborative [17] [5] [8] and
content filtering [4] [6] [14] used in recommendation systems.
However, there are some important differences (between ranking
tuples for database queries versus recommending items in
a specific order) that distinguish our work.
Ranking Architecture
The Similarity model (shown in Figure 1) forms the core
component of our ranking framework. When the user Ui
poses the query Qj, the query-similarity model determines
the set of queries ({Qj,Q1,Q2, ...,Qp}) most similar to Qj.
Likewise, the user-similarity model determines the set of
users ({Ui, U1, U2, ...Ur}) most similar to Ui. Using these
two ordered sets of similar queries and users, it searches
the workload to identify the function FUxQy such that the
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
combination of Ux and Qy is most similar to Ui and Qj.
FUxQy is then used to rank Qj’s results for Ui.
SIMILARITY MODEL FOR RANKING
The concept of similarity-based ranking is aimed at situations
when the ranking functions are known for a small (and
hopefully representative) set of user-query pairs. At the time
of answering a query asked by a user, if no ranking function
is available for this user-query pair, the proposed queryand
user-similarity models can effectively identify a suitable
function to rank the corresponding results.
Quality Evaluation
Query Similarity: Based on the two proposed models of
query similarity (Section 4.1.1 and 4.1.2), in the absence of a
function Fij for a user-query pair (Ui, Qj), the most similar
query (Qc and Qr using the query-condition and the queryresult
model respectively) asked by Ui, for which a function
(Fic and Fir respectively) exists in the workload, is selected
and the corresponding function is used to rank Qj ’s results.
We test the quality of both query similarity models as
follows: We rank Qj ’s results (Nj) using Fic and Fir respectively,
and obtain two sets of ranked results (R and R).
We then use the original (masked) function Fij to rank Nj
and obtain the set ®.