21-05-2013, 11:19 AM
RANKING CONCEPT-BASED USER PROFILE FROM SEARCH ENGINE LOGS
ABSTRACT:
Commercial search engines return roughly the same results for the same query, regardless of the user’s real interest. Since queries submitted to search engines tend to be short and ambiguous, they are not likely to be able to express the user’s precise needs. In existing systems, most user profiling strategies are based on objects that users are interested in (i.e., positive preferences), but not on the objects that users dislike (i.e., negative preferences). Experimental results show that profiles which capture and utilize both the user’s positive and negative preferences perform the best, and that negative preferences can increase the separation between similar and dissimilar queries. This separation helps an agglomerative clustering algorithm decide when to terminate, which improves the overall quality of the resulting query clusters. In the proposed system, concept-based user profiles help capture the precise needs behind the queries submitted to search engines, and they can be integrated into the ranking algorithms of a search engine so that search results are ranked according to individual users’ interests. This technique improves a search engine’s performance by identifying the information needs of individual users.
INTRODUCTION
Most commercial search engines return roughly the same results for the same query, regardless of the user’s real interest. Since queries submitted to search engines tend to be short and ambiguous, they are not likely to be able to express the user’s precise needs. For example, a farmer may use the query “apple” to find information about growing delicious apples, while graphic designers may use the same query to find information about Apple Computer. Personalized search is an important research area that aims to resolve the ambiguity of query terms. To increase the relevance of search results, personalized search engines create user profiles to capture the users’ personal preferences and as such identify the actual goal of the input query. Since users are usually reluctant to explicitly provide their preferences due to the extra manual effort involved, recent research has focused on the automatic learning of user preferences from users’ search histories or browsed documents and the development of personalized systems based on the learned user preferences.
PERSONALIZED CONCEPT-BASED QUERY CLUSTERING
Our personalized concept-based clustering method consists of three steps. First, a concept extraction algorithm extracts concepts and their relations from the Web-snippets returned by the search engine. Second, seven different concept-based user profiling strategies are employed to create concept-based user profiles. Finally, the concept-based user profiles are compared with each other and against a baseline, our previously proposed personalized concept-based clustering algorithm.
Concept Extraction
After a query is submitted to a search engine, a list of Web-snippets is returned to the user. We assume that if a keyword/phrase appears frequently in the Web-snippets of a particular query, it represents an important concept related to the query because it co-occurs in close proximity with the query in the top documents. Thus, we employ a support formula, inspired by the well-known problem of finding frequent itemsets in data mining [7], to measure the interestingness of a particular keyword/phrase extracted from the Web-snippets.
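The support formula itself is not reproduced in this post. As a rough illustration only, the sketch below assumes one frequent-itemset-style form: the fraction of snippets that contain a phrase, scaled by the number of terms in the phrase. The function name `concept_support`, the candidate list, and the weighting are all assumptions made for this example, not the authors' definition.

```python
import re


def concept_support(snippets, candidates):
    """Score candidate keywords/phrases with an ASSUMED support measure.

    support(c) = (snippet frequency of c / number of snippets) * terms in c
    This form is an assumption for illustration; the post omits the formula.
    """
    n = len(snippets)
    if n == 0:
        return {}
    lowered = [s.lower() for s in snippets]
    scores = {}
    for phrase in candidates:
        p = phrase.lower()
        # Match the phrase on word boundaries so "apple" does not hit "pineapple".
        pattern = re.compile(r"\b" + re.escape(p) + r"\b")
        # Snippet frequency: snippets containing the phrase at least once.
        sf = sum(1 for s in lowered if pattern.search(s))
        scores[phrase] = (sf / n) * len(p.split())
    return scores


snippets = [
    "Apple iPhone review",
    "apple pie recipe",
    "Mac OS X by Apple",
    "banana bread",
]
scores = concept_support(snippets, ["apple", "apple pie", "mac os"])
```

A phrase would then be kept as a concept for the query only if its score exceeds some threshold, which would have to be tuned; no threshold value is given in this post.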
Query Clustering Algorithm
We use a concept-based clustering algorithm with which ambiguous queries can be classified into different query clusters. Concept-based user profiles are employed in the clustering process to achieve a personalization effect. First, the clustering algorithm constructs a query-concept bipartite graph G in which one set of nodes corresponds to the set of users’ queries and the other corresponds to the set of extracted concepts. Each query submitted by each user is treated as an individual node in the bipartite graph by labeling the query with a user identifier. Concepts with interestingness weights greater than zero in the user profile are linked to the query in G with the corresponding interestingness weight. Second, a two-step personalized clustering algorithm is applied to the bipartite graph G to obtain clusters of similar queries and similar concepts.
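The construction of the bipartite graph G described above can be sketched as follows. The input layout (a dictionary keyed by `(user_id, query)`) and the function names are hypothetical. The cosine similarity between query nodes is likewise an assumption, since this post does not state which similarity function the two-step clustering uses.

```python
import math
from collections import defaultdict


def build_query_concept_graph(profiles):
    """Build both sides of the bipartite graph G.

    profiles: {(user_id, query): {concept: weight}} (hypothetical layout).
    Only concepts with weight > 0 are linked, as described in the text.
    """
    query_edges = defaultdict(dict)    # query node -> {concept: weight}
    concept_edges = defaultdict(dict)  # concept    -> {query node: weight}
    for node, concept_weights in profiles.items():
        for concept, weight in concept_weights.items():
            if weight > 0:
                query_edges[node][concept] = weight
                concept_edges[concept][node] = weight
    return dict(query_edges), dict(concept_edges)


def query_similarity(query_edges, a, b):
    """Cosine similarity of two query nodes over their linked concepts (assumed)."""
    wa, wb = query_edges.get(a, {}), query_edges.get(b, {})
    dot = sum(w * wb[c] for c, w in wa.items() if c in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


profiles = {
    ("u1", "apple"): {"fruit": 0.8, "iphone": -0.2},
    ("u2", "apple"): {"iphone": 0.9, "mac": 0.5},
    ("u1", "orchard"): {"fruit": 0.6},
}
query_edges, concept_edges = build_query_concept_graph(profiles)
```

Because each node carries a user identifier, the same query string from two users ("apple" from `u1` and `u2` here) stays as two separate nodes and can end up in different clusters.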
EXPERIMENTAL RESULTS
We evaluate and analyze the seven concept-based user profiling strategies (i.e., PClick, PJoachims_C, PmJoachims_C, PSpyNB_C, PClick+Joachims_C, PClick+mJoachims_C, and PClick+SpyNB_C). The seven strategies are compared using our personalized concept-based clustering algorithm [11]. The collected clickthrough data are used by the proposed user profiling strategies to create user profiles. We also evaluate the performance of a heuristic for determining the termination points of initial clustering and community merging based on the change of intracluster similarity. We show that user profiling methods that incorporate negative concept weights return termination points that are very close to the optimal points obtained by exhaustive search.
Experimental Setup
The query and clickthrough data for evaluation are adopted from our previous work [11]. To evaluate the performance of our user profiling strategies, we developed a middleware for Google to collect clickthrough data. We used 500 test queries, which were intentionally designed to be ambiguous. To avoid any bias, the test queries were randomly selected from 10 different categories. One hundred users were invited to use our middleware to search for the answers to the 500 test queries (accessible at [3]). The clusters obtained from the algorithms are compared against the standard clusters to check for their correctness.
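This post does not say how the obtained clusters are scored against the standard clusters. One common choice, assumed here purely for illustration, is set-based precision and recall for the cluster that contains a given query. The function name and the sample queries are hypothetical.

```python
def cluster_precision_recall(predicted, standard):
    """Compare one obtained cluster with its standard cluster.

    precision = |predicted ∩ standard| / |predicted|
    recall    = |predicted ∩ standard| / |standard|
    The choice of these metrics is an assumption, not taken from the post.
    """
    predicted, standard = set(predicted), set(standard)
    overlap = len(predicted & standard)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(standard) if standard else 0.0
    return precision, recall


predicted = {"apple ipod", "apple mac", "apple pie"}
standard = {"apple ipod", "apple mac", "apple iphone", "apple store"}
precision, recall = cluster_precision_recall(predicted, standard)
```

Averaging these two values over all test queries would give a single score per profiling strategy, which makes the seven strategies directly comparable.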
CONCLUSION
An accurate user profile can greatly improve a search engine’s performance by identifying the information needs of individual users. In this paper, we proposed and evaluated several user profiling strategies. The techniques make use of clickthrough data to extract concepts from Web-snippets and build concept-based user profiles automatically. We applied preference mining rules to infer not only users’ positive preferences but also their negative preferences, and utilized both kinds of preferences in deriving users’ profiles. The user profiling strategies were evaluated and compared with the personalized query clustering method that we proposed previously. Our experimental results show that profiles capturing both the user’s positive and negative preferences perform the best among the user profiling strategies studied. Apart from improving the quality of the resulting clusters, the negative preferences in the proposed user profiles also help to separate similar and dissimilar queries into distant clusters, which helps to determine near-optimal terminating points for our clustering algorithm.