13-11-2012, 01:49 PM
Efficient Extended Boolean Retrieval
ABSTRACT
Extended Boolean retrieval (EBR) models were
proposed nearly three decades ago, but have had little
practical impact, despite their significant advantages
compared to either ranked keyword or pure Boolean
retrieval. In particular, EBR models produce meaningful
rankings; their query model allows the representation of
complex concepts in an and-or format; and they are
scrutable, in that the score assigned to a document
depends solely on the content of that document,
unaffected by any collection statistics or other external
factors. These characteristics make EBR models
attractive in domains typified by medical and legal
searching, where the emphasis is on iterative
development of reproducible complex queries of dozens
or even hundreds of terms. However, EBR is much more
computationally expensive than the alternatives. We
consider the implementation of the p-norm approach to
EBR, and demonstrate that ideas used in the max-score
and wand exact optimization techniques for ranked
keyword retrieval can be adapted to allow selective
bypass of documents via a low-cost screening process
for this and similar retrieval models. We also propose
term independent bounds that are able to further reduce
the number of score calculations for short, simple
queries under the extended Boolean retrieval model.
Together, these methods yield an overall saving from 50
to 80 percent of the evaluation cost on test queries
drawn from biomedical search.