21-01-2013, 03:26 PM
Social Network
INTRODUCTION
The internet is currently witnessing tremendous growth in social networks, and huge amounts of
new data are being created every second. With the advent of social networks, it has also become
possible to disseminate this information very quickly. Millions of new user posts are created
every day on social networking sites like Facebook, Twitter, WordPress and Flickr. In this
section, we present a brief introduction to social networks with a special focus on Twitter.
Twitter is not only a real-time social networking tool; it is also a rich source of information
for data mining. On average, Twitter users produce more than 180 million tweets per day (May
2012). This section introduces the concepts of social media, then Twitter specifically, and
finally presents a brief overview of past research in this field.
The assumptions made to distinguish between celebrities, broadcasters and miscreants are based
on an analysis of the file provided; no algorithm was used for this purpose.
Social Media
Social media has recently evolved into a source of social, political and real-time information. In
addition, it is a powerful means of communication and marketing. People share information on
social networks through status updates, blogging and multimedia content such as images and
videos, and they interact with one another, thereby forming groups and communities. Monitoring
and analyzing this information can lead to valuable insights that might otherwise be hard to
obtain using conventional methods and media sources. Social networking sites such as Facebook,
Twitter and Flickr provide a new way for users to share information among themselves and get
frequent updates. These sites also allow sharing of additional information, such as location,
which can be important in analyzing the content.
Social media has an advantage over conventional media sources in that it is managed by its
users. Conventional media only allowed users to receive the information that was provided to
them; the flow of information was one-sided, from the media to the user. With social networks,
however, users can respond to the news and events around them, offer their opinions on them,
and share them. This leads to a multi-way mode of information dissemination in which users post
information along with additional content such as links, images and videos.
Twitter launched as a micro-blogging website in March 2006. It allows users to post status
updates of up to 140 characters, popularly known as tweets. According to ComScore, within
eight months of its launch Twitter had about 94,000 users (as of April 2007). Since then,
Twitter has amassed a large user base and now has over 350 million users (June 2012).
Tweets can be posted (tweeted) from various sources, including the Twitter website, Twitter
mobile applications and several third-party applications and websites (after authentication).
Users also control their privacy settings: they can make their tweets public, which makes them
visible to anyone, or private, which restricts access to users who have obtained the author's
permission. Users can follow other users on Twitter, which makes those users' tweets appear on
their own Twitter homepage.
Twitter offers several other features. Users can reply to another user's tweet by clicking the
reply button on that tweet; this is a way to say something back in response to it. Users can also
mention other users in their tweets by prefixing the other user's username with '@'; a mention
is a way to refer to another user. Another popular Twitter concept is retweeting. A retweet
shares someone else's tweet with one's own followers, and retweets play an important part in
the dissemination of information on Twitter. Users can also add a hashtag to their tweets by
placing a '#' sign before relevant keywords.
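The mention and hashtag conventions described above can be illustrated with a short Python sketch. It is not part of the original report: the function name `parse_tweet` and the regular expressions are assumptions made for illustration, and the patterns are deliberately simplified compared with whatever rules Twitter itself applies.

```python
import re

# Assumed, simplified patterns: a username or hashtag is a run of letters,
# digits and underscores that is not glued to a preceding word character.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MAX_TWEET_LENGTH = 140  # character limit stated in the text


def parse_tweet(text):
    """Return the mentions and hashtags in a tweet, plus a length check."""
    return {
        "mentions": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "fits_limit": len(text) <= MAX_TWEET_LENGTH,
    }


# Made-up example tweet.
example = parse_tweet("@alice check the #DataMining talk by @bob_7 #twitter")
print(example)
```

The lookbehind `(?<!\w)` keeps strings such as e-mail addresses from being counted as mentions, which is one plausible design choice rather than a documented Twitter rule.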
A Few Chirps About Twitter
This paper was published by Balachander Krishnamurthy of AT&T Labs, Phillipa Gill of the
University of Calgary and Martin Arlitt of HP Labs/University of Calgary. In it, the authors
describe two different methodologies that they used to gather datasets from Twitter. They
identified distinct classes of Twitter users and their behaviors, geographic growth patterns
and the current size of the network, and compared crawl results obtained under rate-limiting
constraints. A key distinguishing factor of these social networks is that they provide a new
means of communication. In the case of Twitter it is the Short Message Service (SMS), a
store-and-forward, best-effort delivery system for text messages. In the case of Qik, it is
streaming video from cell phones, while Dodgeball lets users update their status along with
fine-grained geographical information, allowing the system to locate friends nearby.
Another distinguishing factor of such social networks and applications is their ability to
deliver data to interested users over multiple delivery channels. For example, Twitter messages
can be received as a text message on a cell phone, through a Facebook application that users
have added to their Facebook account to see the messages when they log in, via email, as an
RSS feed, or as an instant message (with a choice of Jabber, GoogleTalk etc.).
DATA COLLECTION
There are various techniques for collecting data from the Twitter website. A web crawler is a
computer program that browses the World Wide Web in a methodical, automated and orderly manner.
A breadth-first crawl using the Twitter API captures pages with high PageRank early in the
crawl. Each API represents a facet of Twitter and allows developers to build upon and extend
their applications in new and creative ways.
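As a rough illustration of a breadth-first crawl, the following Python sketch visits users level by level from a seed account. It is a minimal sketch under stated assumptions, not the crawler used in this project: `get_neighbors` is a hypothetical callable that would wrap an API request in a real crawler, and the toy graph is made-up data.

```python
from collections import deque


def bfs_crawl(seed, get_neighbors, max_users=1000):
    """Visit users in breadth-first order starting from `seed`.

    `get_neighbors` is a hypothetical callable returning the accounts linked
    to a user (for example, followers); here it only reads a toy graph.
    """
    visited = {seed}
    order = []
    queue = deque([seed])
    while queue and len(order) < max_users:
        user = queue.popleft()
        order.append(user)
        for neighbor in get_neighbors(user):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


# Toy in-memory graph standing in for the network (made-up data).
TOY_GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}

crawl_order = bfs_crawl("a", lambda user: TOY_GRAPH.get(user, []))
print(crawl_order)
```

The `max_users` cap is an assumed stand-in for the practical limits, such as API rate limiting, that a real crawl would have to respect.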
RELATED WORK:
The crawling part of the project was intended to use breadth-first search and snowball
sampling. The first part of the project involved developing code to crawl the Twitter social
network in order to gather the vast amount of data contained in its databases. Despite several
attempts, we failed to connect to the Internet using the system proxy settings. The main
reason was our inability to access these settings, despite consulting our guide and the
Internet several times. In addition, the code could not be developed within the time frame
allotted for it, a time frame set so as to leave us a fair share of time for crawling the
social network.
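Since the crawler described here was never completed, the following Python sketch only illustrates the general idea of snowball sampling: starting from seed users and expanding in waves through a limited number of their contacts. The function `snowball_sample`, its parameters and the `FOLLOWS` graph are all hypothetical and chosen for illustration.

```python
import random


def snowball_sample(seed_users, get_contacts, waves=2, per_user=2, rng=None):
    """Collect users in waves, following at most `per_user` contacts per user.

    `get_contacts` is a hypothetical callable returning a user's contacts.
    """
    rng = rng or random.Random()
    sampled = set(seed_users)
    current_wave = list(seed_users)
    for _ in range(waves):
        next_wave = []
        for user in current_wave:
            # Drop duplicates and users that are already in the sample.
            contacts = [c for c in dict.fromkeys(get_contacts(user))
                        if c not in sampled]
            chosen = rng.sample(contacts, min(per_user, len(contacts)))
            for contact in chosen:
                sampled.add(contact)
                next_wave.append(contact)
        current_wave = next_wave
    return sampled


# Toy contact lists standing in for the network (made-up data).
FOLLOWS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}

sample = snowball_sample(["a"], lambda u: FOLLOWS.get(u, []),
                         waves=2, per_user=10)
print(sorted(sample))
```

Lowering `per_user` or `waves` shrinks the sample, which is the usual lever for keeping a crawl within a fixed time budget.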
DATA MINING
Data mining is the non-trivial extraction of novel, implicit and actionable knowledge from large
datasets. It is the technology that enables data exploration, data analysis and data
visualization of very large databases at a high level of abstraction.
Data Warehousing: A data warehouse is a relational database that is designed for query and
analysis rather than for transaction processing. It usually contains historical data derived from
transaction data, but it can include data from other sources. It separates analysis workload
from transaction workload and enables an organization to consolidate data from several sources.
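The separation between transaction data and analysis data can be sketched with Python's built-in `sqlite3` module. This is only a toy illustration under assumed names: the `tweets` and `daily_tweet_counts` tables and their rows are made up, and a real warehouse would normally live in a separate database rather than alongside the transaction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Transaction-side table: one row per posted tweet (made-up schema).
    CREATE TABLE tweets (user TEXT, posted_on TEXT, text TEXT);
    -- Warehouse-side table: historical daily totals kept for analysis.
    CREATE TABLE daily_tweet_counts (user TEXT, day TEXT, n_tweets INTEGER);
""")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        ("alice", "2012-05-01", "hello"),
        ("alice", "2012-05-01", "second post"),
        ("bob", "2012-05-01", "hi"),
        ("alice", "2012-05-02", "another day"),
    ],
)

# Load step: derive the historical summary from the transaction data.
conn.execute("""
    INSERT INTO daily_tweet_counts
    SELECT user, posted_on, COUNT(*) FROM tweets GROUP BY user, posted_on
""")

# The analysis query runs against the summary, not the transaction table.
totals = conn.execute(
    "SELECT user, SUM(n_tweets) FROM daily_tweet_counts "
    "GROUP BY user ORDER BY user"
).fetchall()
print(totals)
```

Keeping the summary in its own table means analysis queries never touch the table that receives new posts, which mirrors the separation of workloads described above.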