20-12-2012, 06:53 PM
COST-EFFECTIVE CREATION OF SPECIALIZED SEARCH ENGINES
Introduction and Motivation
Background
Solving problems on the World Wide Web with the help of web search engines has become
increasingly commonplace. However, the problem of poor precision when searching for specialized
information (e.g., buying guides) using today’s major web search engines (e.g., Yahoo!, Google) is
well known in the literature [59], [33], [53].
A widely accepted de facto standard for web search is a simple query form that accepts keywords
and returns document locators (URLs) as results. In general, keyword-based query articulation
is difficult and requires substantial learning effort. Therefore, typical queries are short,
averaging two to three terms per query [83], [72]. When searching for specialized information,
query formulation becomes harder: the user has to translate a complex information need into a
limited set of keywords.
Observing Users Searching for Specialized Information
We performed a small experiment in which we asked a few users to search for buying guides on
various topics (e.g., digital cameras) and observed how they approached the task. Our
anecdotal investigation indicates that many expert web users follow a query formulation
approach that can be broken down into four types of behavior:
Doc-type expansion: Adding document type or genre related terms to the query. By document type
we refer not to the document’s file or MIME type, but rather to its intent. For the “buying
guide” document type, for example, such terms could include “buying guide” or “feature overview”.
Doc-type expansion is effective for mining search engines for documents of a specific document
type.
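As a rough illustration of doc-type expansion, the following minimal Python sketch appends genre-related terms to a topic query. The names DOC_TYPE_TERMS and expand_query are hypothetical and do not come from the thesis; quoting each term as a phrase is likewise an assumption about how a user might submit it to a search engine. Only the two example terms for the “buying guide” document type are taken from the text above.

```python
from typing import Dict, List

# Hypothetical lookup table: document type -> genre-related terms.
# The two "buying guide" terms are the examples named in the text;
# the table itself is an illustrative assumption.
DOC_TYPE_TERMS: Dict[str, List[str]] = {
    "buying guide": ["buying guide", "feature overview"],
}


def expand_query(topic: str, doc_type: str) -> List[str]:
    """Return one expanded query per doc-type term.

    Each term is wrapped in double quotes so that it reads as a phrase
    (an assumption, not something the source prescribes). An unknown
    document type yields an empty list.
    """
    terms = DOC_TYPE_TERMS.get(doc_type, [])
    return [f'{topic} "{term}"' for term in terms]


if __name__ == "__main__":
    for query in expand_query("digital cameras", "buying guide"):
        print(query)
```

Generating one query per term, rather than a single query containing all terms, keeps each query short; whether that matches how the observed users actually combined terms is not stated in the source.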
Thesis Statement
We have described above the major specialization approaches for web search. The major
problem we identified with specialized web search solutions is that they are very expensive to build, and
require a high level of expertise combined with a significant development effort.
This thesis focuses on one approach: specialization by document type. The primary objective
of this dissertation is to build a framework and tools with which an advanced web developer or consultant
can build a search engine specialized by document type within a few man-weeks of effort. Once this is
accomplished, our secondary goal is to show how the proposed framework could be applied more broadly.
Therefore, we offer an in-depth study of contextual search and its application as another dimension of
specialized search within the proposed framework.