objective of project scalable learning of collective behavior

[email protected] · 31-08-2012, 05:39 PM

plz tell me all details about this topic and how to implement it
i want full guidance

**seminar flower** · 11-10-2012, 03:52 PM

Scalable Learning of Collective Behavior

.pdf

Scalable Learning.pdf (Size: 1.85 MB / Downloads: 42)

Abstract

This study of collective behavior is to understand how individuals behave in a social networking environment. Oceans of
data generated by social media like Facebook, Twitter, Flickr, and YouTube present opportunities and challenges to study collective
behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information
about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based
approach has been shown effective in addressing the heterogeneity of connections presented in social media. However, the networks
in social media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails scalable
learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to
extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of
actors while demonstrating a comparable prediction performance to other nonscalable methods.

INTRODUCTION

THE advancement in computing and communication
technologies enables people to get together and share
information in innovative ways. Social networking sites (a
recent phenomenon) empower people of different ages and
backgrounds with new forms of collaboration, communication,
and collective intelligence. Prodigious numbers of
online volunteers collaboratively write encyclopedia articles
of unprecedented scope and scale; online marketplaces
recommend products by investigating user shopping
behavior and interactions; and political movements also
exploit new forms of engagement and collective action. In
the same process, social media provides ample opportunities
to study human interactions and collective behavior
on an unprecedented scale. In this work, we study how
networks in social media can help predict some human
behaviors and individual preferences. In particular, given
the behavior of some individuals in a network, how can we
infer the behavior of other individuals in the same social
network [1]? This study can help better understand
behavioral patterns of users in social media for applications
like social advertising and recommendation.

SOCIAL DIMENSIONS

Connections in social media are not homogeneous. People
can connect to their family, colleagues, college classmates, or
buddies met online. Some relations are helpful in determining
a targeted behavior (category) while others are not. This
relation-type information, however, is often not readily
available in social media. A direct application of collective
inference [9] or label propagation [12] would treat connections
in a social network as if they were homogeneous. To
address the heterogeneity present in connections, a framework
(SocioDim) [2] has been proposed for collective
behavior learning.

SPARSE SOCIAL DIMENSIONS

In this section, we first show one toy example to illustrate the
intuition of communities in an “edge” view and then present
potential solutions to extract sparse social dimensions.

Communities in an Edge-Centric View

Though SocioDim with soft clustering for social dimension
extraction demonstrated promising results, its scalability is
limited. A network may be sparse (i.e., the density of
connectivity is very low), whereas the extracted social
dimensions are not sparse. Let’s look at the toy network
with two communities in Fig. 1. Its social dimensions
following modularity maximization are shown in Table 2.
Clearly, none of the entries is zero. When a network expands
into millions of actors, a reasonably large number of social
dimensions need to be extracted. The corresponding
memory requirement hinders both the extraction of social
dimensions and the subsequent discriminative learning.
Hence, it is imperative to develop some other approach so
that the extracted social dimensions are sparse.

Edge Partition via Line Graph Partition

In order to partition edges into disjoint sets, one way is to
look at the “dual” view of a network, i.e., the line graph
[15]. We will show that this is not a practical solution. In a
line graph L(G), each node corresponds to an edge in the
original network G, and edges in the line graph represent
the adjacency between two edges in the original graph. The
line graph of the toy example is shown in Fig. 4. For
instance, eð1; 3Þ and eð2; 3Þ are connected in the line graph
as they share one terminal node 3.
Given a network, graph partition algorithms can be
applied to its corresponding line graph. The set of communities
in the line graph corresponds to a disjoint edge
partition in the original graph. Recently, such a scheme has
been used to detect overlapping communities [16], [17]. It is,
however, prohibitive to construct a line graph for a megascale
network.

EXPERIMENT SETUP

In this section, we present the data collected from social
media for evaluation and the baseline methods for
comparison.

Social Media Data

Two data sets reported in [2] are used to examine our
proposed model for collective behavior learning. The first
data set is acquired from BlogCatalog,3 the second from a
popular photo sharing site Flickr.4 Concerning behavior,
following [2], we study whether or not a user joins a group
of interest. Since the BlogCatalog data do not have this
group information, we use blogger interests as the behavior
labels. Both data sets are publicly available at the first
author’s homepage. To examine scalability, we also include
a mega-scale network5 crawled from YouTube.6 We remove
those nodes without connections and select the interest
groups with 500 þ subscribers. Some statistics of the three
data sets can be found in Table 4.
5.2 Baseline Methods
As we discussed in Section 4.2, constructing a line graph is
prohibitive for large-scale networks. Hence, the line-graph
approach is not included for comparison. Alternatively, the
edge-centric clustering (or EdgeCluster) in Section 4.2 is
used to extract social dimensions on all data sets. We adopt
cosine similarity while performing the clustering. Based on
cross validation, the dimensionality is set to 5,000, 10,000,
and 1,000 for BlogCatalog, Flickr, and YouTube, respectively.
A linear SVM classifier [18] is then exploited for
discriminative learning.

Regularization Effect

In this section, we study how EdgeCluster performance
varies with different regularization schemes. Three different
regularizations are implemented: regularization based on
community size, node affiliation, and both (we first apply
community size regularization, then node affiliation). The
results without any regularization are also included as a
baseline for comparison. In our experiments, the community
size regularization penalty function is set to (log(jCijÞÞ2, and
node affiliation regularization normalizes the summation of
each node’s community membership to 1. As shown in Fig. 8,
regularization on both community size and node affiliation
consistently boosts the performance on Flickr data. We also
observe that node affiliation regularization significantly
improves performance. It seems like community-size regularization
is not as important as the node-affiliation
regularization. Similar trends are observed in the other two
data sets.Wemust point out that, when both community-size
regularization and node affiliation are applied, it is indeed
quite similar to the tf-idf weighting scheme for representation
of documents [24]. Such a weighting scheme actually can lead
to a quite different performance in dealing with social
dimensions extracted from a network.

RELATED WORK

Classification with networked instances are known as withinnetwork
classification [9], or a special case of relational learning
[11]. The data instances in a network are not independently
identically distributed (i.i.d.) as in conventional data mining.
To capture the correlation between labels of neighboring
data objects, typically a Markov dependency assumption is
assumed. That is, the label of one node depends on the labels
(or attributes) of its neighbors. Normally, a relational
classifier is constructed based on the relational features of
labeled data, and then an iterative process is required to
determine the class labels for the unlabeled data. The class
label or the class membership is updated for each node while
the labels of its neighbors are fixed. This process is repeated
until the label inconsistency between neighboring nodes is
minimized. It is shown that [9] a simple weighted vote
relational neighborhood classifier [25] works reasonably
well on some benchmark relational data and is recommended
as a baseline for comparison.
However, a network tends to present heterogeneous
relations, and the Markov assumption can only capture the
local dependency. Hence, researchers propose to model
network connections or class labels based on latent groups
[20], [26]. A similar idea is also adopted in [2] to differentiate
heterogeneous relations in a network by extracting social
dimensions to represent the potential affiliations of actors in
a network. The authors suggest using the community
membership of a soft clustering scheme as social dimensions.
The extracted social dimensions are treated as
features, and a support vector machine based on that can
be constructed for classification. It has been shown that the
proposed social dimension approach significantly outperforms
representative methods based on collective inference

CONCLUSIONS AND FUTURE WORK

It is well known that actors in a network demonstrate
correlated behaviors. In this work, we aim to predict the
outcome of collective behavior given a social network and
the behavioral information of some actors. In particular, we
explore scalable learning of collective behavior when
millions of actors are involved in the network. Our approach
follows a social-dimension-based learning framework. Social
dimensions are extracted to represent the potential affiliations
of actors before discriminative learning occurs. As
existing approaches to extract social dimensions suffer from
scalability, it is imperative to address the scalability issue.
We propose an edge-centric clustering scheme to extract
social dimensions and a scalable k-means variant to handle
edge clustering. Essentially, each edge is treated as one data
instance, and the connected nodes are the corresponding
features. Then, the proposed k-means clustering algorithm
can be applied to partition the edges into disjoint sets, with
each set representing one possible affiliation.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	beauty parlour management system project report	Guest	1	7,445	07-04-2020, 11:06 PM Last Post:
	lending tree project source code using java	Guest	6	5,817	23-02-2019, 12:31 PM Last Post: Revajonfed
	hotel management system project report	Guest	6	6,160	06-09-2018, 07:41 PM Last Post: Snehal shivaji pawar
	voice based email for blinds a project	Guest	6	13,353	14-10-2017, 11:29 AM Last Post: jaseela123
	major project topics in zoology	Guest	2	5,132	21-09-2017, 01:10 PM Last Post: jaseela123
	laboratory management system project report in vb	Guest	1	3,430	20-09-2017, 02:56 PM Last Post: jaseela123
	pacemaker project report	leilarc2	1	1,934	12-09-2017, 10:33 AM Last Post: jaseela123
	visa processing mini project in html	Guest	1	1,825	11-09-2017, 12:19 PM Last Post: jaseela123
	anna university project report format 2011	Guest	1	4,971	07-09-2017, 09:42 AM Last Post: jaseela123
	e fraud prevention based on the self authentication project abstract	Guest	1	1,392	02-09-2017, 03:51 PM Last Post: jaseela123

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.