09-07-2012, 03:04 PM
Modern information retrieval
902333_Chapter04-Query languages.ppt
Keyword-based querying
Queries are combinations of words.
The document collection is searched for documents that contain these words.
Word queries are intuitive and easy to express, and they allow fast ranking.
The concept of a word must be defined.
A word is a sequence of letters terminated by a separator (period, comma, blank, etc.).
The definitions of letter and separator are flexible; e.g., a hyphen could be defined as a letter or as a separator.
Usually, “trivial words” (such as “a”, “the”, or “of”) are ignored.
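The word definition above can be sketched in a few lines of Python. This is only an illustration, not a method taken from the slides: the function name `tokenize`, the `hyphen_is_letter` flag, and the three-word `STOPWORDS` set are assumptions made for the example, and letters are restricted to ASCII for brevity.

```python
import re

# Assumed stopword list for illustration; the text names only these three.
STOPWORDS = {"a", "the", "of"}

def tokenize(text, hyphen_is_letter=False, stopwords=STOPWORDS):
    """Split text into lowercase words and drop the trivial ones."""
    # A word is a maximal run of letters; any other character is a separator.
    # The flag decides whether the hyphen counts as a letter.
    pattern = r"[A-Za-z-]+" if hyphen_is_letter else r"[A-Za-z]+"
    words = (w.strip("-").lower() for w in re.findall(pattern, text))
    # Discard empty strings (a lone hyphen) and stopwords.
    return [w for w in words if w and w not in stopwords]

sentence = "The state-of-the-art index, of a kind."
as_separator = tokenize(sentence)
as_letter = tokenize(sentence, hyphen_is_letter=True)
```

With the hyphen treated as a separator, `as_separator` is `['state', 'art', 'index', 'kind']`; with the hyphen treated as a letter, `as_letter` is `['state-of-the-art', 'index', 'kind']`.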
Basic queries
Single-word queries:
A query is a single word.
Simplest form of query.
All documents that include this word are retrieved.
Documents may be ranked by the frequency of this word in the document.
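A minimal sketch of a single-word query with frequency ranking follows. It assumes a tiny in-memory collection and splits words on whitespace only; the name `single_word_query` and the sample documents are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

def single_word_query(word, docs):
    """Return ids of documents containing `word`, most frequent first."""
    word = word.lower()
    scores = {}
    for doc_id, text in docs.items():
        # Count occurrences of the query word in this document.
        count = Counter(text.lower().split())[word]
        if count > 0:
            scores[doc_id] = count
    # Sort by descending frequency; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample collection.
docs = {
    "d1": "query languages and query models",
    "d2": "retrieval models",
    "d3": "a query",
}
ranked = single_word_query("query", docs)
```

Here `ranked` is `['d1', 'd3']`: `d1` contains the word twice, `d3` once, and `d2` is not retrieved at all.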
Boolean queries
Boolean queries describe the information needed by relating multiple words with Boolean operators.
Operators: and, or, except
except corresponds to “and not”.
Semantics: For each query word w, a corresponding set Dw is constructed that includes the documents that contain w.
The Boolean expression is then interpreted as an expression over these document sets, with each Boolean operator replaced by the matching set operator:
and: intersection
or: union
except: difference
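The set semantics above map directly onto Python's built-in set operators. The following is a rough sketch under stated assumptions: the helper names `build_sets` and `D`, the whitespace tokenization, and the sample documents are all made up for illustration.

```python
def build_sets(docs):
    """Map each word w to the set Dw of ids of documents that contain w."""
    index = {}
    for doc_id, text in docs.items():
        for w in set(text.lower().split()):
            index.setdefault(w, set()).add(doc_id)
    return index

def D(index, w):
    """Return Dw, or the empty set if no document contains w."""
    return index.get(w.lower(), set())

# Hypothetical sample collection.
docs = {
    "d1": "information retrieval query",
    "d2": "query languages",
    "d3": "information theory",
}
index = build_sets(docs)

# information and query    -> intersection
both = D(index, "information") & D(index, "query")
# information or query     -> union
either = D(index, "information") | D(index, "query")
# information except query -> difference
only_info = D(index, "information") - D(index, "query")
```

For this collection, `both` is `{'d1'}`, `either` is `{'d1', 'd2', 'd3'}`, and `only_info` is `{'d3'}`.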