RANKING CONCEPT-BASED USER PROFILE FROM SEARCH ENGINE LOGS

**study tips** · 21-05-2013, 11:19 AM

ABSTRACT:

Commercial search engines return roughly the same results for the same query, regardless of the user’s real interest. Because queries submitted to search engines tend to be short and ambiguous, they are unlikely to express the user’s precise needs. Most existing user profiling strategies are based on objects that users are interested in (i.e., positive preferences), but not on objects that users dislike (i.e., negative preferences). Experimental results show that profiles which capture and utilize both the user’s positive and negative preferences perform the best, and that negative preferences increase the separation between similar and dissimilar queries. This separation helps an agglomerative clustering algorithm decide when to terminate, which improves the overall quality of the resulting query clusters. In the proposed system, concept-based user profiles can be integrated into the ranking algorithms of a search engine so that search results are ranked according to each user’s interests, letting short queries better reflect the user’s precise needs. This technique improves a search engine’s performance by identifying the information needs of individual users.

INTRODUCTION

Most commercial search engines return roughly the same results for the same query, regardless of the user’s real interest. Because queries submitted to search engines tend to be short and ambiguous, they are unlikely to express the user’s precise needs. For example, a farmer may use the query “apple” to find information about growing delicious apples, while a graphic designer may use the same query to find information about Apple Computer. Personalized search is an important research area that aims to resolve the ambiguity of query terms. To increase the relevance of search results, personalized search engines create user profiles that capture the users’ personal preferences and thereby identify the actual goal of the input query. Since users are usually reluctant to provide their preferences explicitly because of the extra manual effort involved, recent research has focused on automatically learning user preferences from users’ search histories or browsed documents, and on developing personalized systems based on the learned preferences.

PERSONALIZED CONCEPT-BASED QUERY CLUSTERING

Our personalized concept-based clustering method consists of three steps. First, a concept extraction algorithm extracts concepts and their relations from the Web-snippets returned by the search engine. Second, seven different concept-based user profiling strategies are used to create concept-based user profiles. Finally, the concept-based user profiles are compared with each other and against a baseline, our previously proposed personalized concept-based clustering algorithm.

Concept Extraction

After a query is submitted to a search engine, a list of Web-snippets is returned to the user. We assume that if a keyword/phrase appears frequently in the Web-snippets of a particular query, it represents an important concept related to the query because it coexists in close proximity with the query in the top documents. Thus, we employ a support formula, inspired by the well-known problem of finding frequent item sets in data mining [7], to measure the interestingness of a particular keyword/phrase extracted from the Web-snippets.
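The idea of scoring keywords/phrases by how often they occur across a query's Web-snippets can be illustrated with a minimal Python sketch. The support formula itself is not reproduced in this post, so the weighting used here (fraction of snippets containing the phrase, scaled by the number of words in the phrase) is an assumption, as are the names `extract_concepts`, `STOPWORDS`, `min_support` and `max_len`.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "in", "of", "for", "and", "to", "on", "is"}

def extract_concepts(query, snippets, min_support=0.1, max_len=2):
    """Return {phrase: support} for phrases that are frequent in the snippets.

    Assumed support: (snippets containing the phrase / number of snippets)
    multiplied by the number of words in the phrase.
    """
    if not snippets:
        return {}
    query_terms = set(query.lower().split())
    snippet_freq = Counter()
    for snippet in snippets:
        words = re.findall(r"[a-z0-9]+", snippet.lower())
        phrases = set()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip phrases that contain stopwords or the query terms themselves.
                if any(w in STOPWORDS or w in query_terms for w in gram):
                    continue
                phrases.add(" ".join(gram))
        # Count each phrase at most once per snippet.
        snippet_freq.update(phrases)
    total = len(snippets)
    support = {p: (f / total) * len(p.split()) for p, f in snippet_freq.items()}
    # Keep only phrases whose support reaches the threshold.
    return {p: s for p, s in support.items() if s >= min_support}
```

Calling `extract_concepts("apple", snippets, min_support=0.5)` on a handful of snippets returns only the phrases that recur often enough to count as concepts for that query; the threshold would need tuning on real data.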

Query Clustering Algorithm

We use a concept-based clustering algorithm with which ambiguous queries can be classified into different query clusters. Concept-based user profiles are employed in the clustering process to achieve a personalization effect. First, the clustering algorithm constructs a query-concept bipartite graph G in which one set of nodes corresponds to the set of users’ queries and the other corresponds to the set of extracted concepts. Each query submitted by each user is treated as an individual node in the bipartite graph by labeling the query with a user identifier. Concepts with interestingness weights greater than zero in the user profile are linked to the query with the corresponding interestingness weight in G. Second, a two-step personalized clustering algorithm is applied to the bipartite graph G to obtain clusters of similar queries and similar concepts.
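As a rough illustration of the graph construction described above, the Python sketch below builds the query-concept bipartite graph from per-user profiles and then groups similar query nodes. It is a simplification under stated assumptions: the clustering step is a plain single-pass agglomerative merge using cosine similarity with a fixed `threshold`, not the two-step personalized algorithm from the paper, and the names `build_graph`, `cosine` and `cluster_queries` are hypothetical.

```python
import math

def build_graph(profiles):
    """profiles: {(user_id, query): {concept: weight}}.

    Returns the bipartite graph as {(user_id, query): {concept: weight}},
    keeping only edges whose interestingness weight is greater than zero.
    """
    return {
        node: {c: w for c, w in weights.items() if w > 0}
        for node, weights in profiles.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def cluster_queries(graph, threshold=0.3):
    """Greedy agglomerative merge (assumed simplification): repeatedly merge
    the most similar pair of clusters until no pair reaches the threshold."""
    clusters = [({node}, dict(edges)) for node, edges in graph.items()]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None or best < threshold:
            break
        i, j = pair
        nodes = clusters[i][0] | clusters[j][0]
        merged = dict(clusters[i][1])
        for c, w in clusters[j][1].items():
            merged[c] = merged.get(c, 0.0) + w  # sum the concept weights
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((nodes, merged))
    return [nodes for nodes, _ in clusters]
```

Because each node is keyed by `(user_id, query)`, the same query text from two users (for example “apple” from a farmer and from a designer) stays as two separate nodes and can end up in different clusters.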

EXPERIMENTAL RESULTS

We evaluate and analyze the seven concept-based user profiling strategies (i.e., PClick, PJoachims_C, PmJoachims_C, PSpyNB_C, PClick+Joachims_C, PClick+mJoachims_C, and PClick+SpyNB_C). The seven strategies are compared using our personalized concept-based clustering algorithm [11]. The collected clickthrough data are used by the proposed user profiling strategies to create user profiles. We also evaluate the performance of a heuristic that determines the termination points of initial clustering and community merging based on the change in intracluster similarity. We show that user profiling methods that incorporate negative concept weights return termination points that are very close to the optimal points obtained by exhaustive search.

Experimental Setup

The query and clickthrough data used for evaluation are adopted from our previous work [11]. To evaluate the performance of our user profiling strategies, we developed a middleware for Google to collect clickthrough data. We used 500 test queries, which were intentionally designed to be ambiguous. The clusters obtained from the algorithms are compared against the standard clusters to check their correctness. One hundred users were invited to use our middleware to search for the answers to the 500 test queries (accessible at [3]). To avoid any bias, the test queries were randomly selected from 10 different categories.
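One way the comparison against the standard clusters could be scored is sketched below in Python. The post does not state which measure was used, so per-query precision and recall averaged over all queries is an assumption here, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(predicted, standard):
    """Average per-query precision and recall of predicted clusters
    against standard (reference) clusters.

    For each query q: 'retrieved' is the other queries in q's predicted
    cluster, 'relevant' is the other queries in q's standard cluster.
    An empty retrieved or relevant set scores 1.0 by convention (assumed).
    """
    pred_of = {q: frozenset(c) for c in predicted for q in c}
    std_of = {q: frozenset(c) for c in standard for q in c}
    if not std_of:
        return 0.0, 0.0
    precisions, recalls = [], []
    for q, std_cluster in std_of.items():
        # A query missing from the predicted clusters is treated as a singleton.
        retrieved = pred_of.get(q, frozenset({q})) - {q}
        relevant = std_cluster - {q}
        hits = len(retrieved & relevant)
        precisions.append(hits / len(retrieved) if retrieved else 1.0)
        recalls.append(hits / len(relevant) if relevant else 1.0)
    n = len(std_of)
    return sum(precisions) / n, sum(recalls) / n
```

Running this for the output of each profiling strategy against the same standard clusters gives one precision/recall pair per strategy, which makes the strategies directly comparable.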

CONCLUSION

An accurate user profile can greatly improve a search engine’s performance by identifying the information needs of individual users. In this paper, we proposed and evaluated several user profiling strategies. The techniques use clickthrough data to extract concepts from Web-snippets and build concept-based user profiles automatically. We applied preference mining rules to infer not only users’ positive preferences but also their negative preferences, and utilized both kinds of preferences in deriving users’ profiles. The user profiling strategies were evaluated and compared using the personalized query clustering method that we proposed previously. Our experimental results show that profiles capturing both the user’s positive and negative preferences perform the best among the user profiling strategies studied. Apart from improving the quality of the resulting clusters, the negative preferences in the proposed user profiles also help to separate similar and dissimilar queries into distant clusters, which helps to determine near-optimal terminating points for our clustering algorithm.
