08-09-2014, 10:42 AM
Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development
Abstract
Twitter has received much attention recently. An important characteristic of Twitter is its real-time nature. We investigate
the real-time interaction of events such as earthquakes in Twitter and propose an algorithm to monitor tweets and to detect a target
event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of
words, and their context. Subsequently, we produce a probabilistic spatiotemporal model for the target event that can find the center of
the event location. We regard each Twitter user as a sensor and apply particle filtering, which is widely used for location estimation.
The particle filter works better than other comparable methods for estimating the locations of target events. As an application, we
develop an earthquake reporting system for use in Japan. Because of the numerous earthquakes and the large number of Twitter
users throughout the country, we can detect an earthquake with high probability (93 percent of earthquakes of Japan Meteorological
Agency (JMA) seismic intensity scale 3 or more are detected) merely by monitoring tweets. Our system detects earthquakes promptly
and notification is delivered much faster than JMA broadcast announcements.
INTRODUCTION
Twitter, a popular microblogging service, has received
much attention recently. This online social network is
used by millions of people around the world to remain
socially connected to their friends, family members, and
coworkers through their computers and mobile phones [1].
Twitter asks one question, “What’s happening?” Answers
must be fewer than 140 characters. A status update
message, called a tweet, is often used as a message to
friends and colleagues. A user can follow other users; that
user’s followers can read her tweets on a regular basis. A
user who is being followed by another user need not
necessarily reciprocate by following them back, which
renders the links of the network as directed. Since its
launch in July 2006, the number of Twitter users has increased rapidly.
The number of registered Twitter users exceeded 100 million
in April 2010. The service is still adding about 300,000 users
per day. Currently, 190 million users use Twitter per
month, generating 65 million tweets per day.
EVENT DETECTION
As described in this paper, we target event detection. An
event is an arbitrary classification of a space-time region. An
event might have actively participating agents, passive
factors, products, and a location in space/time [13]. We
target events such as earthquakes, typhoons, and traffic
jams, which are readily apparent upon examination of
tweets. These events have several properties
Semantic Analysis of Tweets
To detect a target event from Twitter, we search
Twitter and find useful tweets. Our method of acquiring
useful tweets for target event detection is portrayed in Fig. 3.
Tweets might include mention of the target event. For
example, users might make tweets such as “Earthquake!” or
“Now it is shaking.” Consequently, earthquake or shaking
might be keywords (which we call query words). However,
users might also make tweets such as “I am attending an
Earthquake Conference.” or “Someone is shaking hands
with my boss.” Moreover, even if a tweet is referring to the
target event, it might not be appropriate as an event report.
For instance, a user makes tweets such as “The earthquake
yesterday was scary.” or “Three earthquakes in four days.
Japan scares me.”
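The distinction between a query-word match and a genuine event report can be illustrated with a small feature extractor. This is a minimal sketch, assuming the three feature groups named in the abstract (keywords, number of words, context) map onto the fields below; the function name extract_features, the field names, and the tokenization are hypothetical and are not taken from the paper.

```python
import string


def extract_features(tweet, query_word):
    """Return illustrative features for one tweet, or None if the
    query word does not occur in it. Field names are hypothetical."""
    # Lowercase and strip surrounding punctuation from each token.
    tokens = [w.strip(string.punctuation).lower() for w in tweet.split()]
    tokens = [w for w in tokens if w]
    if query_word not in tokens:
        return None
    pos = tokens.index(query_word)
    return {
        # Statistical features: tweet length and query-word position.
        "num_words": len(tokens),
        "query_position": pos,
        # Keyword features: the distinct words in the tweet.
        "keywords": sorted(set(tokens)),
        # Context features: the words just before and after the query word.
        "context": (
            tokens[pos - 1] if pos > 0 else None,
            tokens[pos + 1] if pos + 1 < len(tokens) else None,
        ),
    }


report = extract_features("Earthquake!", "earthquake")
noise = extract_features("I am attending an Earthquake Conference.", "earthquake")
```

A classifier trained on such features could learn, for example, that a very short tweet with no surrounding context is more likely to be a real-time report than a longer tweet in which the query word is followed by "conference".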
Tweet as a Sensory Value
We can search the tweet and classify it into a positive class if
a user makes a tweet about a target event. In other words,
the user functions as a sensor of the event. If she makes a
tweet about an earthquake occurrence, then it can be
considered that she, as an “earthquake sensor,” returns a
positive value. A tweet can therefore be regarded as a sensor
reading. This crucial assumption enables application of
various methods related to sensory information.
Information Diffusion Related to a Real-Time Event
Some information related to an event diffuses through
Twitter. For example, if a user detects an earthquake and
makes a tweet about the earthquake, then a follower of that
user might make tweets about that. This characteristic is
important because, in our model, sensors might not be
mutually independent, which would have an undesired
effect in terms of event detection.
Figs. 6, 7, and 8, respectively, portray the information
flow networks for an earthquake, a typhoon, and a new
Nintendo DS game.
EXPERIMENTS AND EVALUATION
In this section, we describe the experimentally obtained
results and evaluation of tweet classification and location
estimation.
The whole algorithm is the following:
1. Given a set of queries Q for a target event.
2. Issue the queries Q to the search API every s seconds and
obtain tweets T.
3. For each tweet t ∈ T, obtain features A, B, and C.
Apply the classification to obtain value v_t ∈ {0, 1}.
4. If a sufficient number of tweets arrive (p_occur in
(1) exceeds 0.99 under the condition: 10 tweets in
10 minutes; λ = 0.34; p_f = 0.35), then proceed to
step 5.
5. For each tweet t ∈ T, obtain the latitude and the
longitude l_t by 1) using the associated GPS location, or
2) making a query to Google Maps for the registered
location of user u_t. Set l_t = null if neither works.
6. Calculate the estimated location of the event from
l_t, t ∈ T, using normal particle filtering, particle
filtering with assigned weights, and particle filtering
with weights and sampling.
7. Send alert e-mails to registered users.
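Step 6 can be illustrated with a generic particle filter. The following is a minimal sketch of a textbook sampling-importance-resampling filter for a static two-dimensional location, not the paper's own variants; the Gaussian likelihood, the parameters sigma and jitter, the bounding box, and the function name particle_filter_estimate are all assumptions made for illustration.

```python
import math
import random


def particle_filter_estimate(observations, bounds, n_particles=1000,
                             sigma=1.0, jitter=0.05, seed=0):
    """Estimate an event location from (lat, lon) observations.

    bounds is (lat_min, lat_max, lon_min, lon_max). sigma and jitter
    are in degrees and are illustrative values, not tuned ones.
    """
    rng = random.Random(seed)
    lat_min, lat_max, lon_min, lon_max = bounds
    # Initialization: spread particles uniformly over the region.
    particles = [(rng.uniform(lat_min, lat_max), rng.uniform(lon_min, lon_max))
                 for _ in range(n_particles)]
    for obs_lat, obs_lon in observations:
        # Weighting: Gaussian likelihood of the observation per particle.
        weights = [
            math.exp(-((lat - obs_lat) ** 2 + (lon - obs_lon) ** 2)
                     / (2 * sigma ** 2))
            for lat, lon in particles
        ]
        if sum(weights) == 0.0:
            continue  # observation is too far from every particle
        # Resampling: draw particles in proportion to their weights.
        resampled = rng.choices(particles, weights=weights, k=n_particles)
        # Prediction: add small noise to keep the particle set diverse.
        particles = [(lat + rng.gauss(0, jitter), lon + rng.gauss(0, jitter))
                     for lat, lon in resampled]
    est_lat = sum(p[0] for p in particles) / n_particles
    est_lon = sum(p[1] for p in particles) / n_particles
    return est_lat, est_lon


# Hypothetical tweet locations scattered around (35.0, 139.0).
tweets = [(35.1, 139.2), (34.9, 138.9), (35.2, 139.1),
          (34.8, 139.0), (35.0, 138.8), (35.0, 139.0)]
estimate = particle_filter_estimate(tweets, (30.0, 40.0, 130.0, 145.0))
```

Each observation pulls the particle cloud toward the reported location, so after several tweets the mean of the particles settles near the cluster of reports.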
Evaluation of Spatial Estimation
Fig. 10 presents the location estimation of an earthquake
that occurred on August 11. Many tweets originated from
over a wide region in Japan. The estimated location of the
earthquake (shown as estimation by weighted particle filter)
is close to the actual epicenter of the earthquake, which
shows the effectiveness of the location estimation algorithm.
Table 3 presents results of location estimation based on a
total of 621 tweets for 25 earthquakes that occurred during
August-October 2009. We compare results obtained using
three particle filtering methods with the weighted average
and the median as a baseline. The weighted average simply
takes the average of the latitudes and longitudes of all the
positive tweets; the median simply takes their median. All three
particle filters perform well compared with the
baseline methods.
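The two baselines can be written in a few lines. This is a minimal sketch following the description above, in which the average is taken over all positive tweets without any weighting scheme; the function name baseline_locations and the sample coordinates are hypothetical.

```python
from statistics import mean, median


def baseline_locations(points):
    """Return the average and median location of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "average": (mean(lats), mean(lons)),
        "median": (median(lats), median(lons)),
    }


# Hypothetical locations of three positive tweets.
sample = [(35.0, 139.0), (36.0, 140.0), (40.0, 141.0)]
baselines = baseline_locations(sample)
```

The median is less sensitive than the average to a single tweet posted far from the event, which is why the two baselines can disagree on the same set of tweets.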
Proposed System
The proposed system, called Toretter, has been in operation
since August 8, 2010. A system screenshot is depicted in
Fig. 14. Users can see the detection of past earthquakes.
They can register their e-mails to receive notices of future
earthquake detection reports.
It alerts users and urges them to prepare for the
imminent earthquake. It is hoped that a user receives the
e-mail before the earthquake actually affects that area.
We evaluate various conditions under which alarms might
be sent to choose better parameters for our proposed system.
We set the alarm condition as N_tweet positive tweets arriving
within 10 minutes. We evaluate these conditions by
Precision = N_earthquake / N_alarms
and
Recall = N_earthquake / All_earthquake
(N_earthquake: number of earthquakes detected correctly,
N_alarms: number of alarms, All_earthquake:
number of all earthquakes that occurred).
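The alarm condition and the two measures can be sketched as follows. This is a minimal sketch under stated assumptions: timestamps are in seconds, the window is cleared after each alarm so that one burst of tweets raises one alarm, and the function names alarm_times and precision_recall are hypothetical. How the actual system groups alarms is not described here.

```python
from collections import deque


def alarm_times(positive_tweet_times, n_tweet, window_seconds=600):
    """Return the times at which n_tweet positive tweets have
    arrived within the last window_seconds (10 minutes by default)."""
    window = deque()
    alarms = []
    for t in sorted(positive_tweet_times):
        window.append(t)
        # Drop tweets that have fallen out of the time window.
        while window and t - window[0] > window_seconds:
            window.popleft()
        if len(window) >= n_tweet:
            alarms.append(t)
            window.clear()  # assumption: start counting afresh after an alarm
    return alarms


def precision_recall(n_earthquake, n_alarms, all_earthquake):
    """Precision = N_earthquake / N_alarms, Recall = N_earthquake / All_earthquake."""
    precision = n_earthquake / n_alarms if n_alarms else 0.0
    recall = n_earthquake / all_earthquake if all_earthquake else 0.0
    return precision, recall


# Hypothetical arrival times: three tweets in two minutes, then two stragglers.
alarms = alarm_times([0, 60, 120, 2000, 2010], n_tweet=3)
scores = precision_recall(n_earthquake=8, n_alarms=10, all_earthquake=16)
```

Raising N_tweet makes alarms rarer, which tends to improve precision at the cost of recall; the illustrative numbers above give a precision of 0.8 and a recall of 0.5.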
CONCLUSION
As described in this paper, we investigated the real-time
nature of Twitter, devoting particular attention to event
detection. Semantic analyses were applied to tweets to
classify them into a positive and a negative class. We
regard each Twitter user as a sensor, and set the problem
as detection of an event based on sensory observations.
Location estimation methods