25-08-2017, 09:32 PM
Organizing User Search Histories
Organizing User Search.doc (Size: 1.88 MB / Downloads: 25)
Abstract
Users are increasingly pursuing complex task-oriented goals on the Web, such as making travel arrangements, managing finances or planning purchases. To this end, they usually break down the tasks into a few co-dependent steps and issue multiple queries around these steps repeatedly over long periods of time. To better support users in their long-term information quests on the Web, search engines keep track of their queries and clicks while searching online. In this paper, we study the problem of organizing a user’s historical queries into groups in a dynamic and automated fashion. Automatically identifying query groups is helpful for a number of different search engine components and applications, such as query suggestions, result ranking, query al- terations, sessionization, and collaborative search. In our approach, we go beyond approaches that rely on textual similarity or time thresholds, and we propose a more robust approach that leverages search query logs. We experimentally study the performance of different techniques, and showcase their potential, especially when combined together.
INTRODUCTION
S the size and richness of information on the Web grows, so does the variety and the complexity of
tasks that users try to accomplish online. Users are no longer content with issuing simple navigational queries. Various studies on query logs (e.g., Yahoo’s [1] and AltaVista’s [2]) reveal that only about 20% of queries are navigational. The rest are informational or transac- tional in nature. This is because users now pursue much broader informational and task-oriented goals such as arranging for future travel, managing their finances, or planning their purchase decisions. However, the primary means of accessing information online is still through keyword queries to a search engine. A complex task such as travel arrangement has to be broken down into a number of co-dependent steps over a period of time. For instance, a user may first search on possible destinations, timeline, events, etc.
Query (or Query Group) Relevance.
To ensure that each query group contains closely related and relevant queries and clicks, it is important to have a suitable rele- vance measure sim between the current query singleton
group sc and an existing query group si ∈ S. There
are a number of possible approaches to determine the
relevance between sc and si . Below, we outline a number
of different relevance metrics that we will later use as
baselines in experiments (see Section 5). We will also
discuss the pros and cons of such metrics as well as our
proposed approach of using search logs (see Section 3).
Time. One may assume that sc and si are somehow relevant if the queries appear close to each other in time in the user ’s history. In other words, we assume that users generally issue very similar queries and clicks within a short period of time. In this case, we define the following time-based relevance metric simtime that can be used in place of sim in Figure 3.
QUERY RELEVANCE USING SEARCH LOGS
We now develop the machinery to define the query relevance based on Web search logs. Our measure of relevance is aimed at capturing two important properties of relevant queries, namely: (1) queries that frequently appear together as reformulations and (2) queries that have induced the users to click on similar sets of pages. We start our discussion by introducing three search behavior graphs that capture the aforementioned prop- erties. Following that, we show how we can use these graphs to compute query relevance and how we can incorporate the clicks following a user ’s query in order to enhance our relevance metric.
Search Behavior Graphs
We derive three types of graphs from the search logs of a commercial search engine. The query reformulation
graph, QRG, represents the relationship between a pair
of queries that are likely reformulations of each other. The query click graph, QCG, represents the relationship
between two queries that frequently lead to clicks on similar URLs. The query fusion graph, QF G, merges the
information in the previous two graphs. All three graphs are defined over the same set of vertices VQ, consisting
of queries which appear in at least one of the graphs, but their edges are defined differently.
Query Click Graph
A different way to capture relevant queries from the search logs is to consider queries that are likely to induce users to click frequently on the same set of URLs. For example, although the queries “ipod” and “apple store” do not share any text or appear temporally close in a user ’s search history, they are relevant because they are likely to have resulted in clicks about the ipod product. In order to capture such property of relevant queries, we
construct a graph called the query click graph, QCG.
One way to identify relevant queries is to consider query reformulations that are typically found within the query logs of a search engine. If two queries that are issued consecutively by many users occur frequently enough, they are likely to be reformulations of each other. To measure the relevance between two queries issued by a user, the time-based metric, simtime, makes use of the interval between the timestamps of the queries within the user ’s search history. In contrast, our approach is defined by the statistical frequency with which two queries appear next to each other in the entire query log, over all of the users of the system.
QUERY GROUPING USING THE QF G
In this section, we outline our proposed similarity func- tion simrel to be used in the online query grouping process outlined in Section 2. For each query, we main- tain a query image, which represents the relevance of other queries to this query. For each query group, we maintain a context vector, which aggregates the images of its member queries to form an overall representation. We then propose a similarity function simrel for two query groups based on these concepts of context vectors and query images. Note that our proposed definitions of query reformulation graph, query images, and context vectors are crucial ingredients, which lend significant novelty to the Markov chain process for determining relevance between queries and query groups.
Using Search Logs
As discussed in Section 3, our query grouping algorithm relies heavily on the use of search logs in two ways: first, to construct the query fusion graph used in com- puting query relevance, and, second, to expand the set of queries considered when computing query relevance. We start our experimental evaluation, by investigating how we can make the most out of the search logs.