22-04-2014, 04:47 PM
SCALABLE LEARNING OF COLLECTIVE BEHAVIOUR
Abstract:
The study of collective behavior aims to understand how individuals behave in a social
networking environment. Oceans of data generated by social media like Facebook,
Twitter, Flickr, and YouTube present opportunities and challenges to study collective
behavior on a large scale. In this work, we aim to learn to predict collective behavior in
social media. In particular, given information about some individuals, how can we infer
the behavior of unobserved individuals in the same network? A social-dimension-based
approach has been shown to be effective in addressing the heterogeneity of connections
present in social media. However, the networks in social media are normally of
colossal size, involving hundreds of thousands of actors. The scale of these networks
calls for scalable learning of models for collective behavior prediction. To address the
scalability issue, we propose an edge-centric clustering scheme to extract sparse social
dimensions. With sparse social dimensions, the proposed approach can efficiently handle
networks of millions of actors while demonstrating a comparable prediction performance
to other non-scalable methods.
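
The edge-centric idea can be illustrated with a small sketch. The abstract does not spell out the clustering procedure, so the code below assumes a simple k-means-style clustering with cosine similarity, where every edge is described only by its two endpoints. The function names, the deterministic initialization, and the toy network are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cosine(edge, centroid):
    """Cosine similarity between an edge (two endpoints) and a centroid."""
    dot = sum(centroid.get(node, 0.0) for node in edge)
    norm = sqrt(sum(v * v for v in centroid.values()))
    return dot / (sqrt(2.0) * norm) if norm else 0.0


def cluster_edges(edges, k, iterations=10):
    """Assign every edge to one of k clusters (assumed k-means-style scheme)."""
    # Deterministic start: use k evenly spaced edges as initial centroids.
    step = max(1, len(edges) // k)
    centroids = [{u: 1.0, v: 1.0} for u, v in edges[::step][:k]]
    assignment = []
    for _ in range(iterations):
        assignment = [
            max(range(len(centroids)), key=lambda j: cosine(edge, centroids[j]))
            for edge in edges
        ]
        for j in range(len(centroids)):
            members = [e for e, a in zip(edges, assignment) if a == j]
            if not members:
                continue  # keep the old centroid for an empty cluster
            mean = defaultdict(float)
            for edge in members:
                for node in edge:
                    mean[node] += 1.0 / len(members)
            centroids[j] = dict(mean)
    return assignment


def social_dimensions(edges, assignment):
    """An actor belongs to every cluster that contains one of its edges."""
    dims = defaultdict(set)
    for (u, v), cluster in zip(edges, assignment):
        dims[u].add(cluster)
        dims[v].add(cluster)
    return dict(dims)


# Toy network: two triangles that share actor 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
dims = social_dimensions(edges, cluster_edges(edges, k=2))
print(dims)
```

Because each edge falls into exactly one cluster, an actor can take part in at most as many dimensions as it has connections, which is what keeps the representation sparse in this sketch. Actor 2 sits in both triangles and ends up in both dimensions.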
Existing System:
Existing approaches to extracting social dimensions scale poorly, so it is
imperative to address the scalability issue. Connections in social media are not
homogeneous. People can connect to their family, colleagues, college classmates, or
buddies met online. Some relations are helpful in determining a targeted behavior while
others are not. This relation-type information, however, is often not readily available in
social media. A direct application of collective inference or label propagation would treat
connections in a social network as if they were homogeneous.
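
To see why homogeneous treatment can mislead, here is a minimal sketch of a neighbor-averaging style of label propagation. It is not the specific method of any cited work; the function name, the 0.5 starting score, and the toy network are assumptions made for illustration.

```python
def propagate_labels(adjacency, known, rounds=10):
    """Repeatedly set each unlabeled actor's score to the mean of its neighbors."""
    scores = {node: 0.5 for node in adjacency}  # 0.5 = no information yet
    scores.update({node: float(label) for node, label in known.items()})
    for _ in range(rounds):
        updated = dict(scores)
        for node, neighbors in adjacency.items():
            if node in known or not neighbors:
                continue
            # Every connection counts the same, whatever kind of relation it is.
            updated[node] = sum(scores[n] for n in neighbors) / len(neighbors)
        scores = updated
    return scores


# Hypothetical actor "u" with one family tie and two colleague ties.
adjacency = {
    "u": ["family_1", "colleague_1", "colleague_2"],
    "family_1": ["u"],
    "colleague_1": ["u"],
    "colleague_2": ["u"],
}
known = {"family_1": 1, "colleague_1": 0, "colleague_2": 0}
scores = propagate_labels(adjacency, known)
print(round(scores["u"], 3))
```

If the targeted behavior is actually driven by family ties, the two colleague links still outvote the single family link, because the relation type never enters the computation.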
Proposed System:
A recent framework based on social dimensions is shown to be effective in addressing
this heterogeneity. The framework suggests a novel way of network classification: first,
capture the latent affiliations of actors by extracting social dimensions based on network
connectivity, and next, apply extant data mining techniques to classification based on the
extracted dimensions.
In the initial study, modularity maximization was employed to extract social dimensions.
The superiority of this framework over other representative relational learning methods
has been verified with social media data. The original framework, however, is not
scalable enough to handle networks of colossal size because the extracted social dimensions are
rather dense. In social media, a network of millions of actors is very common. With a
huge number of actors, extracted dense social dimensions cannot even be held in
memory, causing a serious computational problem.
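
A rough back-of-the-envelope estimate shows the size of the problem. The figures below (one million actors, one thousand dimensions, ten nonzero entries per actor, 8-byte values, 4-byte indices) are illustrative assumptions, not measurements from the original study.

```python
def dense_bytes(n_actors, n_dims, bytes_per_value=8):
    """Memory for a dense actors-by-dimensions matrix."""
    return n_actors * n_dims * bytes_per_value


def sparse_bytes(n_actors, avg_nonzeros, bytes_per_value=8, bytes_per_index=4):
    """Memory for a row-compressed sparse layout: one value and one column
    index per nonzero entry, plus one row pointer per actor (and one extra)."""
    nonzeros = n_actors * avg_nonzeros
    return nonzeros * (bytes_per_value + bytes_per_index) + (n_actors + 1) * bytes_per_index


dense = dense_bytes(1_000_000, 1_000)
sparse = sparse_bytes(1_000_000, 10)
print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.3f} GB")
```

Under these assumptions the dense matrix needs about 8 GB, while the sparse layout needs roughly 0.124 GB.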
Social dimension extraction:
The latent social dimensions are extracted based on network topology to capture the
potential affiliations of actors. These extracted social dimensions represent how each
actor is involved in diverse affiliations. These social dimensions can be treated as features
of actors for subsequent discriminative learning. Since a network is converted into
features, typical classifiers such as support vector machine and logistic regression can be
employed. Social dimensions extracted according to soft clustering, such as modularity
maximization and probabilistic methods, are dense.
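
Once each actor has a vector of social dimensions, the learning step is ordinary supervised classification. The sketch below trains a small logistic regression by stochastic gradient descent in plain Python. The feature values, labels, learning rate, and epoch count are made up for illustration, and a library classifier would normally be used instead.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(rows, labels, learning_rate=0.5, epochs=200):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, x):
    """Probability that an actor with social dimensions x shows the behavior."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical social dimensions: one column per affiliation.
social_dims = {"a": [1, 0], "b": [1, 0], "c": [0, 1], "d": [0, 1]}
labeled = {"a": 1, "c": 0}  # observed behavior of two actors

model = train_logistic([social_dims[n] for n in labeled], list(labeled.values()))
p_b = predict(model, social_dims["b"])
p_d = predict(model, social_dims["d"])
print(p_b > 0.5, p_d < 0.5)
```

Actors "b" and "d" are unobserved, but they inherit predictions from the affiliations they share with the labeled actors "a" and "c".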
Discriminative learning:
The discriminative learning procedure will determine which social dimension correlates
with the targeted behavior and then assign proper weights. A key observation is that
actors of the same affiliation tend to connect with each other. For instance, it is
reasonable to expect people of the same department to interact with each other more
frequently. Hence, to infer actors’ latent affiliations, we
need to find a group of people who interact with each other more frequently than at
random.
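
One standard way to quantify "more frequently than at random" is the modularity score, which compares the share of edges inside each group with the share expected if edges were placed at random while keeping each actor's degree. The sketch below computes it for a given grouping. The toy network and the function name are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity: sum over groups of (intra-group edge share) minus
    (expected share under random wiring with the same degrees)."""
    m = len(edges)
    intra_edges = defaultdict(int)   # edges with both ends in the group
    total_degree = defaultdict(int)  # sum of degrees of the group's actors
    for u, v in edges:
        total_degree[community[u]] += 1
        total_degree[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1
    return sum(
        intra_edges[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy network: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the network into its two triangles scores 5/14 (about 0.357), while lumping everyone together scores 0, so the split captures groups whose members interact more than chance would predict.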