Personalizing Search Based on User Search Histories pdf

**study tips** · 29-06-2013, 03:02 PM

Abstract

User profiles, descriptions of user interests, can be used by search engines to
provide personalized search results. Many approaches to creating user profiles
collect user information through proxy servers (to capture browsing histories)
or desktop bots (to capture activities on a personal computer). Both these techniques
require participation of the user to install the proxy server or the bot. In
this study, we explore the use of a less-invasive means of gathering user information
for personalized search. In particular, we build user profiles based on
activity at the search site itself and study the use of these profiles to provide
personalized search results. By implementing a wrapper around the Google
search engine, we were able to collect information about individual user search
activities. In particular, we collected the queries for which at least one search result
was examined, and the snippets (titles and summaries) for each examined
result.

Introduction

Motivation

Companies that provide marketing data report that search engines are used
more and more as referrals to web sites, rather than direct navigation via hyperlinks
[30]. As search engines perform a larger role in commercial applications,
the desire to increase their effectiveness grows. However, search engines
order their results based on the small amount of information available in the
user's queries and by web site popularity, rather than individual user interests.
Thus, all users see the same results for the same query, even if they have wildly
different interests and backgrounds. To address this issue, interest in personalized
search has grown in the last several years, and user profile construction is
an important component of any personalization system.

Overview

Our approach builds user profiles based on the user's interactions with a particular
search engine. Among all search engines available, we decided to adopt
Google [13] for the following reasons:
- it maintains one of the biggest collections of web pages;
- it provides a special API (Google APIs [12]) that allows users to write
  programs that submit queries to Google using a web service based on
  the SOAP protocol [35]. The results retrieved are returned in a structured
  XML file that can be easily processed;
- it is very popular, so users feel comfortable using it via a new interface
  rather than relying on a completely different search engine altogether.
For our system, we implemented GoogleWrapper: a wrapper around the Google
search engine [13] that logs the queries, search results, and clicks on a per user
basis. This information was then used to create user profiles, and these profiles
were used in a controlled study to determine their effectiveness for providing
personalized search results. In order to capture unbiased data, Google's results
were randomized before presentation to the user.
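
The logging and randomization steps described above can be sketched roughly as follows. This is only an illustration, not the GoogleWrapper code: the names SearchLog, present_results, and the sample IDs and results are hypothetical, and no real search API is called.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SearchLog:
    """Hypothetical per-user log of queries, shown results, and clicks."""
    entries: dict = field(default_factory=dict)

    def record_query(self, user_id, query, results):
        # Store the results in the order they were shown to the user.
        self.entries.setdefault(user_id, []).append(
            {"query": query, "results": list(results), "clicks": []}
        )

    def record_click(self, user_id, result):
        # Attach the click to the user's most recent query.
        self.entries[user_id][-1]["clicks"].append(result)

    def examined_queries(self, user_id):
        # Keep only queries for which at least one result was examined.
        return [e for e in self.entries.get(user_id, []) if e["clicks"]]


def present_results(results, rng):
    """Return a shuffled copy so that rank position does not bias clicks."""
    shuffled = list(results)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
log = SearchLog()
shown = present_results(["r1", "r2", "r3"], rng)  # placeholder result IDs
log.record_query("u42", "jaguar", shown)
log.record_click("u42", shown[0])
log.record_query("u42", "python", present_results(["r4", "r5"], rng))
```

Here only the "jaguar" query would count as examined, since the second query received no clicks.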

Privacy

In general, in order to provide personalized search, the system needs some
information from which to build a profile, whether it be collected by the server
or by a client-side bot. A commercial server-side approach could store just the
profile rather than the raw data. However, since we needed to run and evaluate a
variety of algorithms, we stored data for the duration of the experiment. This
raises several privacy issues. First, how securely was the data protected from
hacking? Second, do users want to share their data at all?
To address the first issue, users were identified using an alphanumeric ID
stored in a cookie. No data on personal identity was exchanged except during
the initial registration process. This information was stored separately in order
to reset a cookie in case it was lost. The log files were stored in a directory that
was not world accessible. The log with queries and snippets was separated
from the file maintaining the identities of users. The mapping between the two
files was created by means of IDs.
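
A minimal sketch of this separation is given below, assuming in-memory stand-ins for the two files. The function names, the 16-character ID length, and the use of an email address as the registration detail are assumptions made for illustration, not details from the study.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

# Two separate stores, linked only by the ID (stand-ins for the two files).
identities = {}    # user ID -> registration details, used to reset a lost cookie
activity_log = []  # (user ID, query, snippet) records; no personal identity


def new_user_id(length=16):
    """Generate a random alphanumeric ID suitable for storing in a cookie."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def register(email):
    # Personal data is only handled here, at registration time.
    user_id = new_user_id()
    identities[user_id] = {"email": email}
    return user_id


def log_activity(user_id, query, snippet):
    # The activity log carries the ID but nothing that identifies the person.
    activity_log.append((user_id, query, snippet))


def recover_id(email):
    """Look up the ID again so that a lost cookie can be reset."""
    for user_id, details in identities.items():
        if details["email"] == email:
            return user_id
    return None
```

Keeping the identity store apart from the activity log means that someone who reads only the log sees IDs, queries, and snippets, but no registration details.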

Personalization

Personalization is the process of presenting the right information to the right
user at the right moment. In order to learn about a user, systems must collect
personal information, analyze it, and store the results of the analysis in a user
profile. Information can be collected from users in two ways: explicitly, for
example by asking for feedback such as preferences or ratings; or implicitly, for
example by observing user behaviors such as the time spent reading an on-line
document.
Commercial systems tend to focus on personalized search using an explicitly
defined profile. In Google's beta version [14], for example, users are asked
to select the categories of topics which they are interested in and the search
engine applies this information during the retrieval process.

Ontologies and Semantic Web

For our study, based on previous research work from Trajkova and Gauch [18],
we decided to represent user profiles as a hierarchy of weighted concepts that
are defined in a reference ontology. According to Gruber [15], an ontology is
a specification of a conceptualization. Ontologies can be defined in different
ways, but they all represent a taxonomy of concepts along with the relations
between them. In the context of the World Wide Web, ontologies are important
because they formally define terms shared between any type of agents without
ambiguity, allowing information to be processed automatically and accurately.
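
One way to picture a profile as a hierarchy of weighted concepts is the sketch below. It is an illustration under stated assumptions only: the ConceptProfile class, the path-based keys, the concept names, and the weights are all hypothetical and are not taken from the study or from any particular reference ontology.

```python
from collections import defaultdict


class ConceptProfile:
    """User profile: weights attached to concepts in a reference hierarchy."""

    def __init__(self):
        # Key: path from the ontology root, e.g. ("Science", "Biology").
        self.weights = defaultdict(float)

    def add_evidence(self, concept_path, weight):
        # Accumulate weight on the concept itself.
        self.weights[tuple(concept_path)] += weight

    def subtree_weight(self, concept_path):
        # Sum the weights of the concept and all of its descendants.
        prefix = tuple(concept_path)
        return sum(
            w for path, w in self.weights.items()
            if path[:len(prefix)] == prefix
        )

    def top_concepts(self, n):
        # Concepts with the highest accumulated weight, best first.
        ranked = sorted(self.weights.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


profile = ConceptProfile()
profile.add_evidence(["Science", "Biology"], 0.5)
profile.add_evidence(["Science", "Biology", "Genetics"], 0.25)
profile.add_evidence(["Sports", "Soccer"], 0.125)
```

Storing each concept by its path from the root keeps the hierarchy implicit, so the interest in a broad topic can be read off by summing over everything beneath it.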
OntoSeek [16] is an example of an information retrieval system based on
ontologies. The main assumption is that precision and recall would improve if
we used sense matching instead of word matching. The domains in which
the system operates are catalogues of either heterogeneous or homogeneous
products. The description of each product in the catalog is translated into a
lexical conceptual graph; i.e., a tree structure where nodes are nouns from the
description and arcs are concepts inferred by the corresponding nouns. All
graphs, one for each product, are stored in a repository. A special user interface
is provided to submit queries. When a query is issued, the user is required to
disambiguate its meaning.
