20-06-2014, 02:58 PM
PMSE: A Personalized Mobile Search Engine
Abstract
We propose a personalized mobile search engine, PMSE, that captures
the users’ preferences in the form of concepts by mining their
clickthrough data. Due to the importance of location information in
mobile search, PMSE classifies these concepts into content concepts
and location concepts. In addition, users’ locations (positioned by GPS)
are used to supplement the location concepts in PMSE. The user
preferences are organized in an ontology-based, multi-facet user
profile, which is used to adapt a personalized ranking function for
rank adaptation of future search results. To characterize the diversity of
the concepts associated with a query and their relevance to the user’s
need, four entropies are introduced to balance the weights between
the content and location facets. Based on the client-server model, we
also present a detailed architecture and design for implementation of
PMSE. In our design, the client collects and stores locally the
clickthrough data to protect privacy, whereas heavy tasks such as
INTRODUCTION
A major problem in mobile search is that the interactions between the
users and search engines are limited by the small form factors of the
mobile devices. As a result, mobile users tend to submit shorter, hence,
more ambiguous queries compared to their web search counterparts.
In order to return highly relevant results to the users, mobile search
engines must be able to profile the users’ interests and personalize the
search results according to the users’ profiles. A practical approach to
capturing a user’s interests for personalization is to analyze the user’s
clickthrough data [5], [10], [15], [18]. Leung et al. developed a search
engine personalization method based on users’ concept preferences
and showed that it is more effective than methods that are based on
RELATED-WORK
Clickthrough data has been used in determining the users’ preferences
on their search results. Table 1 shows example clickthrough data
for the query “hotel”, consisting of the search results and the ones that
the user clicked on (bolded search results in Table 1). As shown, ci’s are
the content concepts and li’s are the location concepts extracted from
SYSTEM-DESIGN
Figure 1 shows PMSE’s client-server architecture, which meets three
important requirements. First, computation-intensive tasks, such as
RSVM training, should be handled by the PMSE server due to the
limited computational power on mobile devices. Second, data
transmission between client and server should be minimized to ensure
fast and efficient processing of the search. Third, clickthrough data,
representing precise user preferences on the search results, should be
stored on the PMSE clients in order to preserve user privacy. In
PMSE’s client-server architecture, PMSE clients are responsible for
storing the user clickthroughs and the ontologies derived from the
PMSE server. Simple tasks, such as
updating clickthroughs and ontologies, creating feature vectors, and
USER-INTEREST-PROFILING
PMSE uses “concepts” to model the interests and preferences of a user.
Since location information is important in mobile search, the concepts
are further classified into two different types, namely, content concepts
and location concepts. The concepts are modeled as ontologies, in
order to capture the relationships between the concepts. We observe
that the characteristics of the content concepts and location concepts
are different. Thus, we propose two different techniques for building
the content ontology (in Section 4.1) and location ontology (in Section
4.2). The ontologies indicate a possible
concept space arising from a user’s queries and are maintained along
with the clickthrough data for future preference adaptation. In PMSE,
we adopt ontologies to model the concept space because they not only
can represent concepts but also capture the relationships between
DIVERSITY-AND-CONCEPT-ENTROPY
PMSE consists of a content facet and a location facet. In order to
seamlessly integrate the preferences in these two facets into one
coherent personalization framework, an important issue we have to
address is how to weigh the content preference and location
preference in the integration step. To address this issue, we propose to
adjust the weights of content preference and location preference based
on their effectiveness in the personalization process. For a given query
issued by a particular user, if the personalization based on preferences
from the content facet is more effective than that based on the
preferences from the location facet, more weight should be put on the
content-based preferences, and vice versa. The notion of
personalization effectiveness is derived based on the diversity of the
content and location information in the search results as discussed in
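A minimal sketch of the weighting idea, in Python. It measures the diversity of each facet with Shannon entropy over the concepts seen in the search results and in the clicked results, then turns the two facets' scores into a content weight. The function names, the `effectiveness` formula (result entropy divided by one plus click entropy), and the sample concepts are illustrative assumptions, not the definitions used by PMSE.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(concepts: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the concept distribution."""
    counts = Counter(concepts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def effectiveness(result_concepts: Iterable[str],
                  clicked_concepts: Iterable[str]) -> float:
    # ASSUMPTION: an illustrative score, not the formula from the paper.
    # Diverse results with focused clicks suggest personalization helps.
    return shannon_entropy(result_concepts) / (1.0 + shannon_entropy(clicked_concepts))


def content_weight(e_content: float, e_location: float) -> float:
    # ASSUMPTION: normalize the two scores so the weights sum to 1.
    total = e_content + e_location
    return 0.5 if total == 0 else e_content / total


# Hypothetical concepts for the query "hotel".
content_in_results = ["room rate", "booking", "room rate", "facilities"]
content_in_clicks = ["room rate", "room rate"]
location_in_results = ["Hong Kong", "Hong Kong", "Hong Kong", "Hong Kong"]
location_in_clicks = ["Hong Kong"]

e_c = effectiveness(content_in_results, content_in_clicks)
e_l = effectiveness(location_in_results, location_in_clicks)
w_content = content_weight(e_c, e_l)
w_location = 1.0 - w_content
```

In this made-up example every result mentions the same location, so the location facet cannot discriminate between results and all of the weight goes to the content facet.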
USER-PREFERENCES-EXTRACTION-AND-PRIVACY-PRESERVATION
Given that the concepts and clickthrough data are collected from past
search activities, the user’s preferences can be learned. These search
preferences, in the form of a set of feature vectors, are
to be submitted along with future queries to the PMSE server for
search result re-ranking. Instead of transmitting all the detailed
personal preference information to the server, PMSE allows the users
to control the amount of personal information exposed. In this section,
we first review a preference mining
algorithm, namely the SpyNB method, that we adopt in PMSE, and then
discuss how PMSE preserves user privacy.
PERSONALIZED-RANKING-FUNCTIONS
Upon reception of the user’s preferences, Ranking SVM (RSVM) [10] is
employed to learn a personalized ranking function for rank adaptation
of the search results according to the user’s content and location
preferences. For a given query, a set of content concepts and a set of
location concepts are extracted
from the search results as the document features. Since each document
can be represented by a feature vector, it can be treated as a point in
the feature space. Using the preference pairs as the input, RSVM aims
at finding a linear ranking function, which holds for as many document
preference pairs as possible. An adaptive implementation, SVMlight,
available at [3], is used in our experiments. In the following, we discuss
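A minimal sketch of learning a linear ranking function from preference pairs, in Python. It is not the SVMlight solver: it swaps in a simple margin perceptron that updates the weight vector whenever a preferred document fails to outscore a less preferred one by a margin of 1. The two-dimensional feature layout (one content concept, one location concept), the documents, and the preference pairs are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_pairwise_ranker(pairs: List[Tuple[Vector, Vector]], dim: int,
                          epochs: int = 20, lr: float = 0.5) -> List[float]:
    """Learn w so that w.better exceeds w.worse for each (better, worse) pair.

    A margin perceptron used as a stand-in for Ranking SVM training.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # Update only when the pair is violated or inside the margin.
            if dot(w, diff) < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(w: Vector, docs: List[Vector]) -> List[int]:
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(docs)), key=lambda i: dot(w, docs[i]), reverse=True)


# Hypothetical features: [matches content concept, matches location concept].
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical preference pairs (preferred, less preferred).
pairs = [(docs[2], docs[0]), (docs[2], docs[1]), (docs[1], docs[0])]

w = train_pairwise_ranker(pairs, dim=2)
order = rank(w, docs)
```

Each preference pair contributes one linear constraint on the weight vector, which is why a document's position in the final ranking depends only on its dot product with the learned weights.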
CONCLUSION
To adapt to user mobility, we incorporated the user’s GPS locations
in the personalization process. We observed that GPS locations help to
improve retrieval effectiveness, especially for location queries. We also
proposed two privacy parameters, minDistance and expRatio, to
address privacy issues in PMSE by allowing users to control the amount
of personal information exposed to the PMSE server. The privacy
parameters facilitate smooth control of privacy exposure while
maintaining good ranking quality. For future work, we will investigate
methods to exploit regular travel patterns and query patterns from the
GPS and clickthrough data to further enhance the personalization
effectiveness of PMSE.