11-08-2012, 01:05 PM
Personalized Web Search For Improving Retrieval Effectiveness
ABSTRACT
Current web search engines are built to serve all users, independent of the special needs of any individual
user. Personalized web search instead carries out retrieval for each user in a way that incorporates his/her interests.
We propose a novel technique to learn user profiles from users’ search histories. The user profiles are
then used to improve retrieval effectiveness in web search. A user profile and a general profile are learned
from the user's search history and a category hierarchy respectively. These two profiles are combined to
map a user query into a set of categories, which represent the user's search intention and serve as a
context to disambiguate the words in the user's query. Web search is conducted based on both the user
query and the set of categories. Several profile learning and category mapping algorithms and a fusion
algorithm are provided and evaluated. Experimental results indicate that our technique to personalize web
search is both effective and efficient.
INTRODUCTION
As the amount of information on the Web increases rapidly, it creates many new challenges for
Web search. When the same query is submitted by different users, a typical search engine returns
the same results, regardless of who submitted the query. This may not be suitable for users with
different information needs. For example, for the query "apple", some users may be interested in
documents dealing with "apple" as a fruit, while other users may want documents related
to Apple computers. One way to disambiguate the words in a query is to associate a small set of
categories with the query. For example, if the category "cooking" or the category "fruit" is
associated with the query "apple", then the user's intention becomes clear. Current search engines
such as Google or Yahoo! have hierarchies of categories to help users to specify their intentions.
The use of hierarchical categories such as the Library of Congress Classification is also common
among librarians.
ALGORITHMS TO LEARN PROFILES
Learning a user profile (matrix M) from the user’s search history (matrices DT and DC) and
mapping user queries to categories can be viewed as a specific multi-class text categorization
task. In sections 3.1-3.3, we describe four algorithms to learn a user profile: bRocchio, LLSF,
pLLSF and kNN. The last three algorithms have been shown to be among the top-performing
text categorization methods in [Yang99].
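To make the profile-learning step concrete, here is a minimal sketch in the spirit of a batch Rocchio learner. It is not the exact bRocchio formulation, which this excerpt does not give. It assumes DT is a document-by-term matrix, DC is a binary document-by-category matrix, and M is a category-by-term matrix in which each row is the average term vector of the documents in that category. The function names, the toy data, and the use of cosine similarity for mapping a query to categories are all illustrative assumptions.

```python
import math

def learn_profile(dt, dc):
    """Assumed Rocchio-style profile: M[c][t] is the average weight of
    term t over the documents labeled with category c."""
    n_cats, n_terms = len(dc[0]), len(dt[0])
    m = [[0.0] * n_terms for _ in range(n_cats)]
    for c in range(n_cats):
        docs = [d for d in range(len(dt)) if dc[d][c]]
        for d in docs:
            for t in range(n_terms):
                m[c][t] += dt[d][t] / len(docs)
    return m

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_categories(query_vec, m, names, k=3):
    """Rank categories by similarity between the query and each profile row."""
    scored = sorted(zip(names, m),
                    key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy search history (hypothetical). Terms: apple, recipe, mac.
CATEGORIES = ["cooking", "computers"]
DT = [[1, 1, 0],   # doc 0 contains "apple", "recipe"
      [1, 0, 1],   # doc 1 contains "apple", "mac"
      [0, 1, 0]]   # doc 2 contains "recipe"
DC = [[1, 0],      # doc 0 -> cooking
      [0, 1],      # doc 1 -> computers
      [1, 0]]      # doc 2 -> cooking
M = learn_profile(DT, DC)
```

With this toy history, calling top_categories([1, 1, 0], M, CATEGORIES) ranks "cooking" ahead of "computers" for a query containing "apple" and "recipe", which is the kind of disambiguation the profile is meant to provide.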
IMPROVING RETRIEVAL EFFECTIVENESS
Our system maps each user query to a set of categories, and returns the top three categories. In
this section, we provide methods to improve retrieval effectiveness using categories as a context
of the user query. Three modes of retrieval were briefly introduced in Section 2.6. Across these
three modes, the user query is submitted to the search engine (in this case, the Google
Web Directory) multiple times. In the first mode, it is submitted to the search engine without
specifying any category. Let the list of documents retrieved be DOC-WO-C (documents retrieved
without specifying categories). Let its cardinality be MO. In the second and third modes, the
query is submitted by specifying a set of categories which is obtained either semi-automatically
or completely automatically. Let the list of documents retrieved by specifying the top i categories
be DOC-W-Ci, and let its cardinality be MWi. MO is usually larger than MWi, so the lists
cannot be compared fairly as retrieved: retrieval with the specified categories and retrieval without any
category return different numbers of documents. Our solution is as follows. We merge the retrieved lists of
documents DOC-WO-C and DOC-W-Ci in such a way that the resulting set has exactly the same
cardinality as DOC-WO-C.
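The merging step can be illustrated with a short sketch. This excerpt does not give the scoring formula of the merging algorithm, so the rank-based voting below is only an assumed stand-in: each list casts a vote for every document it contains, higher-ranked documents receive larger votes, and an optional weight per list scales its votes. The function name merge_results and the example document identifiers are hypothetical. The one property taken directly from the text is that the merged result is cut to the cardinality of DOC-WO-C.

```python
def merge_results(doc_wo_c, doc_w_c_lists, weights=None):
    """Merge ranked lists by weighted rank voting (assumed scheme) and
    return exactly len(doc_wo_c) documents."""
    lists = [doc_wo_c] + list(doc_w_c_lists)
    if weights is None:
        weights = [1.0] * len(lists)
    scores = {}
    for weight, ranked in zip(weights, lists):
        n = len(ranked)
        for rank, doc in enumerate(ranked):
            # The top document in a list gets a full vote; the vote shrinks
            # linearly with rank.
            scores[doc] = scores.get(doc, 0.0) + weight * (n - rank) / n
    merged = sorted(scores, key=lambda doc: -scores[doc])
    # The union always contains every document of doc_wo_c, so truncating
    # yields a list with the same cardinality as doc_wo_c.
    return merged[:len(doc_wo_c)]

# Hypothetical document identifiers.
DOC_WO_C = ["a", "b", "c", "d"]   # retrieved without categories (MO = 4)
DOC_W_C1 = ["c", "e"]             # retrieved with the top category (MW1 = 2)
MERGED = merge_results(DOC_WO_C, [DOC_W_C1])
```

In this example, document "c" appears in both lists and moves to the top, while "e" displaces the lowest-scoring document from the category-free list; the merged list still has four entries.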
CONCLUSION
We described a strategy for personalization of web search: (1) a user's search history can be
collected without direct user involvement; (2) the user's profile can be constructed automatically
from the user's search history and is augmented by a general profile which is extracted
automatically from a common category hierarchy; (3) the categories that are likely to be of
interest to the user are deduced based on his/her query and the two profiles; and (4) these
categories are used as a context of the query to improve retrieval effectiveness of web search.
For the construction of the profiles, four batch learning algorithms (pLLSF, LLSF, kNN and
bRocchio) and an adaptive algorithm (aRocchio) are evaluated. Experimental results indicate that
the accuracy of using both profiles is consistently better than that of using the user profile alone
or the general profile alone. The simple adaptive algorithm aRocchio is also shown to be
effective and efficient. For the web search, the weighted voting-based merging algorithm is used
to merge retrieval results. The semi-automatic and automatic modes of utilizing categories
determined by our system are shown to improve retrieval effectiveness by 25.6% and around
12%, respectively. We also show that our technique is efficient (at most 0.082 seconds per query).