10-05-2013, 02:14 PM
Emerging Measures in Preserving Privacy For Publishing The Data
ABSTRACT:
Search engines return roughly the same results for the same query, regardless of the user’s real interest. Personalized search is an important research area that aims to resolve the ambiguity of query terms. To increase the relevance of search results, personalized search engines create user profiles that capture users’ personal preferences and thereby identify the actual goal of the input query. Since users are usually reluctant to provide their preferences explicitly because of the extra manual effort involved, recent research has focused on automatically learning user preferences from users’ search histories or browsed documents and on developing personalized systems based on the learned preferences. In this project, we focus on search engine personalization and develop several concept-based user profiling methods whose profiles capture both the user’s positive and negative preferences. Negative preferences improve the separation of similar and dissimilar queries, which helps an agglomerative clustering algorithm decide whether the optimal clusters have been obtained.
INTRODUCTION:
Data mining is often defined as finding hidden information in a database. Data mining models are classified into two types: predictive and descriptive. A predictive model makes a prediction about values of data using known results found from different data. A descriptive model identifies patterns or relationships in data; clustering falls into the descriptive category. Clustering algorithms are classified as hierarchical, partitional, categorical, or large-database algorithms. A hierarchical algorithm creates a set of clusters. Hierarchical algorithms are of two types: agglomerative and divisive. In this project, the agglomerative clustering approach is used to cluster similar queries and similar concepts in order to obtain optimal clusters.
BACKGROUND:
A major problem of current Web search is that search queries are usually short and ambiguous, and thus are insufficient for specifying precise user needs. To alleviate this problem, some search engines suggest terms that are semantically related to the submitted queries, so that users can choose the suggestions that reflect their information needs. In this paper, we introduce an effective approach that captures the user’s conceptual preferences in order to provide personalized query suggestions. We achieve this goal with two new strategies. First, we develop online techniques that extract concepts from the web-snippets of the search results returned for a query and use the concepts to identify related queries for that query. Second, we propose a new two-phase personalized agglomerative clustering algorithm that is able to generate personalized query clusters.
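The post does not say how concepts are extracted from web-snippets. As a minimal sketch only, one plausible reading is to keep the terms that occur in a large enough share of the snippets returned for a query. The function name `extract_concepts`, the `min_support` parameter, the tokenizer, and the stopword list are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "and", "of", "for", "to", "in", "is", "with", "on"}

def extract_concepts(snippets, min_support=0.5):
    """Return {term: support} for terms found in at least min_support of the snippets.

    Support here is the fraction of snippets containing the term
    (an assumed definition, not one given in the post).
    """
    if not snippets:
        return {}
    doc_freq = Counter()
    for snippet in snippets:
        # Count each term at most once per snippet.
        terms = set(re.findall(r"[a-z]+", snippet.lower())) - STOPWORDS
        doc_freq.update(terms)
    n = len(snippets)
    return {t: c / n for t, c in doc_freq.items() if c / n >= min_support}

if __name__ == "__main__":
    demo = [
        "Apple pie recipe with fresh apples",
        "Easy apple pie for beginners",
        "Apple iPhone review",
    ]
    print(extract_concepts(demo, min_support=0.6))
```

A real system would likely also handle multi-word phrases and stemming; this sketch treats each lowercase word as a candidate concept.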
ALGORITHM FOR PERSONALIZED AGGLOMERATIVE CLUSTERING:
The personalized clustering algorithm iteratively merges the most similar pair of query nodes, then the most similar pair of concept nodes, then the most similar pair of query nodes again, and so on. The cosine similarity function is employed to compute the similarity score sim(x,y) of a pair of query nodes or a pair of concept nodes: sim(x,y) = (x · y) / (||x|| ||y||), where x and y are the weight vectors of the two nodes.
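The alternating merge described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the query-concept relationships are assumed to be stored as a dict of weighted edges `{(query, concept): weight}`, and the helper names `node_vectors`, `best_pair`, and `merge` are hypothetical.

```python
import math
from itertools import combinations

def cosine_sim(x, y):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * y[k] for k, w in x.items() if k in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def node_vectors(edges, side):
    """Build one sparse vector per node on a side (0 = queries, 1 = concepts)."""
    vecs = {}
    for pair, w in edges.items():
        vecs.setdefault(pair[side], {})[pair[1 - side]] = w
    return vecs

def best_pair(edges, side):
    """Return (score, a, b) for the most similar pair on a side, or None."""
    vecs = node_vectors(edges, side)
    best = None
    for a, b in combinations(sorted(vecs), 2):
        score = cosine_sim(vecs[a], vecs[b])
        if best is None or score > best[0]:
            best = (score, a, b)
    return best

def merge(edges, side, a, b):
    """Merge nodes a and b on one side, summing the weights of their edges."""
    merged_name = a + "|" + b
    merged = {}
    for pair, w in edges.items():
        pair = list(pair)
        if pair[side] in (a, b):
            pair[side] = merged_name
        key = tuple(pair)
        merged[key] = merged.get(key, 0.0) + w
    return merged

if __name__ == "__main__":
    edges = {
        ("apple", "fruit"): 2, ("apple", "pie"): 1,
        ("apple pie", "pie"): 2, ("apple pie", "recipe"): 1,
        ("iphone", "phone"): 3,
    }
    # One round: merge the best query pair, then the best concept pair.
    for side in (0, 1):
        found = best_pair(edges, side)
        if found is not None:
            _, a, b = found
            edges = merge(edges, side, a, b)
    print(edges)
```

Storing the graph as edges means a merge on the query side automatically changes the vectors seen on the concept side, which is what makes alternating between the two sides meaningful.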
Termination point:
A common requirement of iterative clustering algorithms is to determine when the clustering process should stop, to avoid over-merging the clusters. When the termination point for initial clustering is reached, community merging begins; when the termination point for community merging is reached, the whole algorithm terminates. Stopping the two phases at the right time is important to the algorithm. If initial clustering is stopped too early (i.e., not all clusters are well formed), community merging merges all the identical queries from different users and thus generates a single big cluster without much personalization effect. If initial clustering is stopped too late, the clusters are already overly merged before community merging begins, and the resulting low precision undermines the quality of the whole clustering process.
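To make the effect of the termination point concrete, here is a minimal single-phase sketch. The post does not state the actual stopping criterion, so a fixed similarity threshold is used as a stand-in assumption; `cluster_until`, `centroid`, and `threshold` are hypothetical names.

```python
import math
from itertools import combinations

def cosine_sim(x, y):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * y[k] for k, w in x.items() if k in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def centroid(cluster, vectors):
    """Sum the member vectors of a cluster into one sparse vector."""
    total = {}
    for name in cluster:
        for k, w in vectors[name].items():
            total[k] = total.get(k, 0.0) + w
    return total

def cluster_until(vectors, threshold):
    """Greedily merge the most similar clusters until no pair reaches threshold."""
    clusters = [frozenset([name]) for name in sorted(vectors)]
    while len(clusters) > 1:
        score, a, b = max(
            ((cosine_sim(centroid(p, vectors), centroid(q, vectors)), p, q)
             for p, q in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        if score < threshold:
            # Assumed termination point: no remaining pair is similar enough.
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

if __name__ == "__main__":
    queries = {
        "q1": {"fruit": 1.0, "pie": 1.0},
        "q2": {"fruit": 1.0, "pie": 2.0},
        "q3": {"phone": 1.0, "fruit": 0.2},
    }
    for t in (0.99, 0.5, 0.05):
        print(t, cluster_until(queries, t))
```

A threshold that is too high stops before related queries are grouped, while one that is too low merges everything into a single cluster, which mirrors the too-early and too-late cases described above.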
CONCLUSION AND FUTURE WORK:
An accurate user profile can greatly improve a search engine’s performance by identifying the information needs of individual users. We proposed and evaluated several user profiling strategies. The techniques make use of clickthrough data to extract concepts from Web-snippets and build concept-based user profiles automatically. We applied preference mining rules to infer not only users’ positive preferences but also their negative preferences, and utilized both kinds of preferences in deriving users’ profiles. The user profiling strategies were evaluated and compared with the personalized query clustering method that we proposed previously.
Apart from improving the quality of the resulting clusters, the negative preferences in the proposed user profiles also help to separate similar and dissimilar queries into distant clusters, which helps to determine near-optimal terminating points for our clustering algorithm.
We observe that the algorithmic optimal points for initial clustering and community merging usually are only one step away from the manually determined optimal points. Further, the precision and recall values obtained at the algorithmic optimal points are only slightly lower than those obtained at the manually determined optimal points.