06-07-2012, 10:20 AM
BinRank: Scaling Dynamic Authority-Based Search Using Materialized Subgraphs
INTRODUCTION
THE PageRank algorithm [1] utilizes the Web graph link
structure to assign global importance to Web pages. It
works by modeling the behavior of a “random Web surfer”
who starts at a random Web page and follows outgoing
links with uniform probability. The PageRank score is
independent of a keyword query. Recently, dynamic
versions of the PageRank algorithm have become popular.
They are characterized by a query-specific choice of the
random walk starting points. In particular, two algorithms
have received significant attention: Personalized PageRank (PPR) for
Web graph data sets [2], [3], [4], [5] and ObjectRank for
graph-modeled databases [6], [7], [8], [9], [10].
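The random-surfer model above can be sketched as a simple power iteration. This is a minimal illustration, not the paper's implementation; the three-node toy graph and the damping factor of 0.85 are assumptions made for the example.

```python
def pagerank(graph, damping=0.85, iters=50):
    """graph: dict mapping each node to a list of its out-neighbors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}          # surfer starts at a random page
    for _ in range(iters):
        # with probability (1 - damping) the surfer jumps to a random page
        new = {v: (1.0 - damping) / n for v in nodes}
        for u, outs in graph.items():
            if outs:
                # follow an outgoing link with uniform probability
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # dangling node: redistribute its mass uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Toy graph (invented for illustration): a -> b, a -> c, b -> c, c -> a
pr = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Note that the scores depend only on the link structure, not on any query, which is exactly the query independence of the global PageRank score described above.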
RELATED WORK
The scalability of PPR [3] has attracted considerable
attention. To generate personalized search results, PPR
performs a very expensive fixed-point iterative computation
over the entire graph. To avoid this expensive iterative
calculation at runtime, one can naively precompute and
materialize all the possible personalized PageRank vectors
(PPVs). Although this method guarantees fast user response
time, such precomputation is impractical as it requires a
huge amount of time and storage especially when done on
large graphs. In this section, we examine hub-based and
Monte Carlo style methods that address the scalability
problem of PPR, and give an overview of HubRank [8]
that integrates the two approaches to improve the scalability
of ObjectRank. Although these approaches
enable PPR to be executed on large graphs, they either
limit the degree of personalization or significantly degrade
the quality of the top-k result lists.
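To make the per-query cost concrete, the fixed-point computation of a single personalized PageRank vector can be sketched as below. The toy graph and the preference vector concentrated on one node are invented for illustration; naive precomputation would have to run this for every possible preference vector and store one full-length PPV per starting node.

```python
def personalized_pagerank(graph, prefs, damping=0.85, iters=50):
    """prefs: dict node -> teleport probability (entries sum to 1)."""
    nodes = list(graph)
    rank = {v: prefs.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        # the surfer teleports back to the preferred nodes, not uniformly
        new = {v: (1.0 - damping) * prefs.get(v, 0.0) for v in nodes}
        for u, outs in graph.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # dangling mass also returns to the preference vector
                for v in nodes:
                    new[v] += damping * rank[u] * prefs.get(v, 0.0)
        rank = new
    return rank

# Invented toy graph; the whole-graph iteration repeats for every query.
ppv = personalized_pagerank({"a": ["b"], "b": ["c"], "c": ["a"]}, {"a": 1.0})
```

With n nodes, materializing all single-node PPVs in this way costs n iterative computations and O(n^2) storage, which is the impracticality the text points out.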
Data Model
ObjectRank performs top-k relevance search over a database
modeled as a labeled directed graph. The data graph G(V, E)
models objects in a database as nodes, and the semantic
relationships between them as edges. A node v ∈ V contains
a set of keywords and its object type. For example, a paper in
a bibliographic database can be represented as a node
containing its title and labeled with its type, “paper.” A
directed edge e ∈ E from u to v is labeled with its relationship
type λ(e). For example, when a paper u cites another
paper v, ObjectRank includes in E an edge e = (u → v) that
has the label “cites.” It can also create a “cited by”-type edge
from v to u. In ObjectRank, the role of edges between objects
is the same as that of hyperlinks between Web pages in
PageRank. However, note that edges of different edge types
may transfer different amounts of authority. By assigning
different edge weights to different edge types, ObjectRank
can capture important domain knowledge such as “a paper
cited by important papers is important, but citing important
papers should not boost the importance of a paper.” Let w(t)
denote the weight of edge type t. ObjectRank assumes that
weights of edge types are provided by domain experts.
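The data model above can be sketched as a small typed, weighted graph. This is a hypothetical bibliographic example: the node names, edge types, and weight values are invented, and in ObjectRank the edge-type weights would be supplied by domain experts.

```python
# Assumed authority-transfer weights per edge type (invented values;
# ObjectRank takes these from domain experts).
EDGE_WEIGHTS = {"cites": 0.7, "cited_by": 0.3}

class DataGraph:
    """Labeled directed graph: typed nodes with keywords, typed edges."""

    def __init__(self):
        self.nodes = {}   # node id -> {"type": ..., "keywords": set(...)}
        self.edges = []   # list of (u, v, edge_type) triples

    def add_node(self, nid, obj_type, keywords):
        self.nodes[nid] = {"type": obj_type, "keywords": set(keywords)}

    def add_edge(self, u, v, etype):
        self.edges.append((u, v, etype))
        if etype == "cites":
            # also model the reverse "cited by"-type edge explicitly
            self.edges.append((v, u, "cited_by"))

g = DataGraph()
g.add_node("p1", "paper", ["binrank", "search"])
g.add_node("p2", "paper", ["pagerank"])
g.add_edge("p1", "p2", "cites")   # p1 cites p2
```

Giving "cites" a larger weight than "cited_by" encodes the domain rule quoted above: a citation transfers more authority to the cited paper than citing transfers back.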
Query Processing
For a given query, ObjectRank returns top-k objects relevant
to the query. We first describe the intuition behind
ObjectRank, introduce the ObjectRank equation, and then,
elaborate on important calibration factors.
ObjectRank query processing can be illustrated using the
random surfer model. A random surfer starts from a random
node v_i among the nodes that contain the given keyword. These
random surfer starting points are called a base set. For a given
keyword t, the keyword base set of t, BS(t), consists of nodes
in which t occurs. Note that any node in G can be part of the
base set, which makes ObjectRank support the full degree of
personalization.
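Constructing the base set can be sketched as a keyword lookup over the nodes' keyword sets; the tiny node table below is invented for illustration, and the uniform start distribution over BS(t) mirrors the random-surfer starting points just described.

```python
# Hypothetical node table: node id -> keyword set (invented example data).
nodes = {
    "p1": {"keywords": {"binrank", "pagerank"}},
    "p2": {"keywords": {"database"}},
    "p3": {"keywords": {"pagerank"}},
}

def base_set(nodes, term):
    """BS(t): every node in which the keyword t occurs."""
    return {nid for nid, attrs in nodes.items() if term in attrs["keywords"]}

bs = base_set(nodes, "pagerank")
# The random surfer starts uniformly at random from the base set:
start = {v: 1.0 / len(bs) for v in bs}
```

Because any node can carry the query keyword and thus enter BS(t), the start distribution can place mass anywhere in G, which is the "full degree of personalization" the text refers to.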