01-08-2012, 11:37 AM
Summarization-based Query Expansion in Information Retrieval
Abstract
We discuss a semi-interactive approach to information
retrieval which consists of two tasks performed
in sequence. First, the system assists
the searcher in building a comprehensive statement
of information need, using automatically generated
topical summaries of sample documents. Second,
the detailed statement of information need is automatically
processed by a series of natural language
processing routines in order to derive an optimal
search query for a statistical information retrieval
system. In this paper, we investigate the role of automated
document summarization in building effective
search statements. We also discuss the results
of the latest evaluation of our system at the annual Text
Retrieval Conference (TREC).
Information Retrieval
Information retrieval (IR) is the task of selecting documents
from a database in response to a user's query
and ranking them according to relevance. This has
usually been accomplished using statistical methods
(often coupled with manual encoding) that (a) select
terms (words, phrases, and other units) from documents
that are deemed to best represent their content,
and (b) create an inverted index file (or files)
that provide easy access to documents containing
these terms. A subsequent search process attempts
to match preprocessed user queries against term-based
representations of documents, in each case determining
a degree of relevance between the two
that depends upon the number and types of matching
terms.
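The index-and-match process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system discussed in this paper: the tokenizer, the function names `build_index` and `search`, and the tf-idf weighting are hypothetical choices standing in for "the number and types of matching terms".

```python
from collections import Counter, defaultdict
import math


def tokenize(text):
    # Hypothetical tokenizer: lowercase, treat every non-alphanumeric char as a separator.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def build_index(docs):
    """Toy inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query):
    """Rank documents by summing tf * idf over the query terms they contain.

    The tf-idf weighting is an assumed stand-in for a real relevance model.
    Documents that match no query term are not returned.
    """
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms (fewer postings) get a higher weight.
        idf = math.log(num_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = {
        "d1": "Oil spill cleanup in Alaska",
        "d2": "Oil prices rise",
        "d3": "Election results",
    }
    index = build_index(docs)
    for doc_id, score in search(index, len(docs), "oil spill"):
        print(doc_id, round(score, 3))
```

With these toy documents, the query "oil spill" ranks d1 (which matches both terms) above d2 (which matches only "oil"), and d3 is not returned at all.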
Building effective search queries
We have been experimenting with manual and automatic
natural language query (or topic, in TREC
parlance) building techniques. This differs from
most query modification techniques used in IR in
that our method is to reformulate the user's statement
of information need rather than the search system's
internal representation of it, as relevance feedback
does. Our goal is to devise a method of full-text
expansion that would allow for creating exhaustive
search topics such that:
Deriving automatic summaries
Each component of a summary DMS needs to be instantiated
by one or more passages extracted from
the original text. Initially, all eligible passages (i.e.,
explicitly delineated paragraphs) within a document
are potential candidates for the summary. As we
move through the text, paragraphs are scored for their
summary-worthiness. The final score for each passage,
normalized for its length, is a weighted sum
of a number of minor scores, using the following
formula:
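To illustrate the general shape of such a scheme (a length-normalized weighted sum of minor scores), here is a minimal Python sketch. The three minor scores (title-word overlap, overlap with the document's most frequent words, and a position bonus), the default weights, and the name `score_paragraphs` are hypothetical placeholders; they are not the components or coefficients of the authors' formula.

```python
from collections import Counter
import re

WORD = re.compile(r"[a-z0-9]+")


def words(text):
    # Lowercase word tokens; punctuation is dropped.
    return WORD.findall(text.lower())


def score_paragraphs(title, paragraphs, weights=(1.0, 0.5, 0.25), top_k=10):
    """Return (paragraph_index, score) pairs, highest score first.

    Hypothetical minor scores (NOT the authors' components):
      s1: number of tokens that are title words
      s2: number of tokens among the document's top_k most frequent words
      s3: position bonus, 1.0 for the first paragraph, decreasing linearly
    Final score = (w1*s1 + w2*s2 + w3*s3) / paragraph length in words.
    Empty paragraphs are skipped.
    """
    title_words = set(words(title))
    doc_counts = Counter(w for p in paragraphs for w in words(p))
    frequent = {w for w, _ in doc_counts.most_common(top_k)}
    n = len(paragraphs)
    results = []
    for i, p in enumerate(paragraphs):
        toks = words(p)
        if not toks:
            continue
        s1 = sum(1 for w in toks if w in title_words)
        s2 = sum(1 for w in toks if w in frequent)
        s3 = (n - i) / n
        raw = weights[0] * s1 + weights[1] * s2 + weights[2] * s3
        # Normalize for length so long paragraphs are not favoured by default.
        results.append((i, raw / len(toks)))
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

Dividing by the paragraph's word count is what "normalized for its length" amounts to here; without it, longer paragraphs would accumulate higher raw scores simply by containing more words.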
Implementation and evaluation
The summarizer has been implemented as a
demonstration system, primarily for news summarization.
In general, we are quite pleased with the system's
performance. The summarizer is domain independent
and can effectively process a range of document
types. The summaries are quite informative,
with excellent readability. They are also quite short,
generally only 5 to 10% of the original text, and can
be read and understood very quickly.
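The 5 to 10% figure above implies a compression target. One simple way to enforce such a target, assumed here purely for illustration, is to take paragraphs in descending score order under a word budget and then restore document order. The function name `select_for_summary`, the greedy rule, and the keep-the-best-paragraph fallback are hypothetical and do not describe the authors' summarizer.

```python
def select_for_summary(paragraphs, ranked, ratio=0.10):
    """Greedy extract under a word budget (hypothetical selection rule).

    paragraphs: list of paragraph strings in document order.
    ranked: list of (paragraph_index, score) pairs, best first.
    ratio: target summary length as a fraction of the original word count.

    Paragraphs are taken in score order; any paragraph that would push the
    summary past the budget is skipped. The single best paragraph is always
    kept so the summary is never empty. The result is in document order.
    """
    lengths = [len(p.split()) for p in paragraphs]
    budget = ratio * sum(lengths)
    chosen, used = [], 0
    for idx, _score in ranked:
        if chosen and used + lengths[idx] > budget:
            continue
        chosen.append(idx)
        used += lengths[idx]
    return [paragraphs[i] for i in sorted(chosen)]
```

Restoring document order at the end keeps the extracted passages reading in the same sequence as the source, which matters for the readability of a summary made of extracted passages.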