26-10-2016, 10:30 AM
ABSTRACT:
Given the wide usage of various web services, there is a rising need for a more accurate and efficient recommendation system. Much of this usage comes from social networking sites, which aim to build a platform that improves social relations among people who share similarities. These sites initially relied on profile-based friend recommendation and were later improved to include life style-based activities. The aim was to improve the precision of friend suggestions by ranking the similarities drawn from users' daily activities rather than just matching profiles. We have extended this system with features such as infograph construction and automated blocking of previously blocked contacts.
1. INTRODUCTION
One challenge with existing social networking services is how to recommend a good friend to a user. Most of them rely on pre-existing user relationships to choose friend candidates. For instance, Facebook relies on a social link analysis among people who already share common friends and recommends symmetrical users as potential friends. Unfortunately, this approach may not be the most appropriate one according to recent social science findings.
According to these findings, the rules for grouping people together include: 1) habits or life style; 2) attitudes; 3) tastes; 4) ethical standards; 5) economic level; and 6) people they already know. Apparently, rule #3 and rule #6 are the mainstream factors considered by existing recommendation systems. Rule #1, though probably the most intuitive, is not widely used because users' life styles are difficult, if not impossible, to capture through web actions. Rather, life styles are usually closely associated with daily routines and activities. Therefore, if we can gather information on users' daily routines and activities, we can exploit rule #1 and recommend friends to people based on their similar life styles.
This disadvantage was overcome by incorporating life style-based matching in addition to profile-based friend recommendation. The new feature serves as a filter that recommends friends with whom we share many more similarities than just the profile, and hence gives a more definite reason to trust the suggestions made. Life style is based on various aspects of our everyday routine, such as the places we visit, the events we participate in, and the communities and groups we support. Comparison based on these aspects may therefore be more effective at suggesting a friend with whom we can actually have a useful relationship, rather than strangers with common friends.
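As a rough illustration of the comparison described above, the sketch below ranks candidate friends by how similar their life style vectors are to the user's. The vector layout (one count per activity category), the use of cosine similarity, and the names `cosine_similarity` and `rank_candidates` are all assumptions made for this example; the report does not specify the similarity measure used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length life style vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no recorded activity matches nobody
    return dot / (norm_a * norm_b)


def rank_candidates(user_vector, candidates):
    """Return (name, score) pairs sorted from most to least similar.

    `candidates` maps a hypothetical user name to that user's vector.
    """
    scored = [
        (name, cosine_similarity(user_vector, vector))
        for name, vector in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical categories: places visited, events attended, groups joined.
    me = [3, 0, 1]
    others = {"a": [3, 0, 1], "b": [0, 5, 0], "c": [1, 1, 1]}
    for name, score in rank_candidates(me, others):
        print(name, round(score, 3))
```

Cosine similarity ignores how active a user is overall and compares only the proportions of activities, which is one plausible reading of "similar life styles"; other measures could be substituted without changing the ranking step.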
The proposed system therefore requires collecting the life styles of various users, which are then stored in the cloud for future reference. Life styles are extracted from these data using latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data. LDA is a three-level hierarchical Bayesian model in which each item of a collection is modeled as a finite mixture over an underlying set of topics. The goal is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments.
Latent Dirichlet allocation (LDA) is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words.
LDA assumes the following generative process for each document w in a corpus D:
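1. Choose $N \sim \mathrm{Poisson}(\xi)$.
2. Choose $\theta \sim \mathrm{Dir}(\alpha)$.
3. For each of the $N$ words $w_n$:
(a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$.
(b) Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$.
A $k$-dimensional Dirichlet random variable $\theta$ can take values in the $(k-1)$-simplex (a $k$-vector $\theta$ lies in the $(k-1)$-simplex if $\theta_i \ge 0$, $\sum_{i=1}^{k} \theta_i = 1$), and has the following probability density on this simplex:
$$p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}$$
where the parameter $\alpha$ is a $k$-vector with components $\alpha_i > 0$ and $\Gamma$ is the Gamma function.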
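The generative steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the project's implementation: the function names, the toy vocabulary, and the parameter values are assumptions for the example, and `beta` is taken to be a list of per-topic word probability rows.

```python
import math
import random


def sample_poisson(rng, xi):
    """Draw from Poisson(xi) with Knuth's method (adequate for small xi)."""
    threshold = math.exp(-xi)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_dirichlet(rng, alpha):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, xi, alpha, beta, vocabulary):
    """Steps 1-3: draw N, draw theta, then a topic and a word for each position."""
    n_words = sample_poisson(rng, xi)          # step 1
    theta = sample_dirichlet(rng, alpha)       # step 2
    topics = range(len(alpha))
    words = []
    for _ in range(n_words):                   # step 3
        z = rng.choices(topics, weights=theta)[0]            # (a) topic z_n
        w = rng.choices(vocabulary, weights=beta[z])[0]      # (b) word w_n
        words.append(w)
    return theta, words


if __name__ == "__main__":
    rng = random.Random(0)
    vocabulary = ["walk", "shop", "code", "read"]  # hypothetical toy vocabulary
    beta = [[0.7, 0.2, 0.1, 0.0],   # topic 0 favors "walk"
            [0.0, 0.1, 0.2, 0.7]]   # topic 1 favors "read"
    theta, words = generate_document(rng, 8.0, [0.5, 0.5], beta, vocabulary)
    print(theta, words)
```

The sketch only runs the model forward to produce a document. Recovering the topic mixture from observed data is the inverse, inference problem and is not shown here.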
4.1 k-MEANS CLUSTERING ALGORITHM
Clustering is the process of grouping a set of objects or data points in such a way that those in one group are more similar to each other than to those in other groups. Clustering algorithms are widely used in data mining and in fields such as machine learning, pattern recognition, and image analysis.
Clustering can be formulated as a multi-objective optimization problem and can be carried out by several different methods or algorithms. Basic clustering models include connectivity, centroid, distribution, density, and graph models.
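k-means, named in the heading, is the standard example of a centroid model: each cluster is represented by the mean of its members. The sketch below shows the usual iterative procedure (Lloyd's algorithm) using only the standard library. The function name `k_means`, the random initialization, and the iteration cap are assumptions for illustration, not details taken from the report.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of equal length) into k groups.

    Returns (centroids, assignments), where assignments[i] is the
    index of the centroid nearest to points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return centroids, assignments


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = k_means(data, k=2)
    print(centroids, labels)
```

The result depends on the initial centroids, so in practice the procedure is often run several times with different seeds and the clustering with the smallest total within-cluster distance is kept.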