18-01-2013, 04:35 PM
Credibility Ranking of Tweets during High Impact Events
ABSTRACT:
Twitter has evolved from a medium for conversation and opinion sharing among friends into a platform for sharing and disseminating information about current events. Events in the real world create a corresponding surge of posts (tweets) on Twitter. Not all content posted on Twitter is trustworthy or useful in providing information about the event. In this paper, we analyzed the credibility of information in tweets corresponding to fourteen high impact news events of 2011 around the globe. From the data we analyzed, on average 30% of the total tweets posted about an event contained situational information about the event, while 14% were spam. Only 17% of the total tweets posted about an event contained situational awareness information that was credible. Using regression analysis, we identified the important content based and source based features that can predict the credibility of information in a tweet. Prominent content based features were the number of unique characters, swear words, pronouns, and emoticons in a tweet; prominent user based features were the number of followers and the length of the username. We adopted a supervised machine learning and relevance feedback approach using the above features to rank tweets according to their credibility score. The performance of our ranking algorithm improved significantly when we applied a re-ranking strategy. Results show that extraction of credible information from Twitter can be automated with high confidence.
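The features named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name extract_features, the pronoun and swear word lists, and the emoticon pattern are all placeholder assumptions, since the lexicons used in the study are not given here.

```python
import re

# Hypothetical word lists for illustration only; the study's lexicons are not specified.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "she", "it", "they", "them"}
SWEAR_WORDS = {"damn", "hell"}

# Assumed pattern for a few common Western-style emoticons such as :) ;-( =D
EMOTICON_RE = re.compile(r"[:;=][\-']?[)(DPp]")


def extract_features(text, username, followers):
    """Return the content and user based features listed in the abstract."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Content based features
        "unique_chars": len(set(text)),
        "swear_words": sum(w in SWEAR_WORDS for w in words),
        "pronouns": sum(w in PRONOUNS for w in words),
        "emoticons": len(EMOTICON_RE.findall(text)),
        # User based features
        "followers": followers,
        "username_length": len(username),
    }


features = extract_features("We felt it :) damn", "quake_watch", 120)
```

A feature dictionary of this kind would be the input to whatever supervised ranking model is trained on labeled tweets; the model itself is outside the scope of this sketch.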
EXISTING SYSTEM:
So far, work on assessing credibility on Twitter has explored credibility with respect to trending topics and users. Our work differs from that of Castillo et al. Their analysis was based on the credibility of a trending topic (all tweets belonging to a topic were marked as credible or incredible) on Twitter, while we focus on assessing credibility at the level of individual tweets. This difference in approach matters on Twitter, since a topic (e.g. an earthquake at a particular location) may be credible, yet the individual tweets in that topic (e.g. a tweet reporting the Richter scale magnitude of the earthquake) may be credible or incredible. Hence, the credibility of a topic may not be a good indicator of the credibility of the content of a tweet. In this paper, we use automated ranking techniques to assess credibility at the most atomic level of information on Twitter, i.e. at the tweet level. Using a supervised machine learning and relevance feedback approach, we show that ranking tweets based on Twitter features (topic and source) can aid in assessing the credibility of information in messages posted about an event. We believe our results can help users decide on the credibility of a tweet.
PROPOSED SYSTEM:
The presence of spam, compromised accounts, malware, and phishing attacks is a major concern with respect to the quality of information on Twitter. Techniques to filter out spam and phishing on Twitter have been studied, and various effective solutions have been proposed. Truthy was developed by Ratkiewicz et al. to study information diffusion on Twitter and to compute a trustworthiness score for a public stream of micro-blogging updates related to an event, in order to detect political smears, astroturfing, misinformation, and other forms of social pollution.
Module Description:
• Role of Twitter During News Events
• Quality of Information on Twitter
• Relevance Ranking in Web
• Content or Message Level Features
Role of Twitter during News Events:
The computer science research community has previously analyzed the relevance of online social media, and Twitter in particular, as a news disseminating agent. Kwak et al. showed the prominence of Twitter as a news medium: they found that 85% of the topics discussed on Twitter are related to news. Their work highlighted the relationship between user specific parameters and tweeting activity patterns, for example the number of followers and followees versus the number of tweets (and re-tweets). Zhao et al. used unsupervised topic modeling to compare news topics on Twitter with those in the New York Times (a traditional news dissemination medium). They showed that Twitter users are relatively less interested in world news, yet they are active in spreading news of important world events.
Quality of Information on Twitter:
The presence of spam, compromised accounts, malware, and phishing attacks is a major concern with respect to the quality of information on Twitter. Techniques to filter out spam and phishing on Twitter have been studied, and various effective solutions have been proposed. Truthy was developed by Ratkiewicz et al. to study information diffusion on Twitter and to compute a trustworthiness score for a public stream of micro-blogging updates related to an event, in order to detect political smears, astroturfing, misinformation, and other forms of social pollution. In their work, they presented certain cases of abusive behavior by Twitter users. Castillo et al. showed that automated classification techniques can be used to distinguish news topics from conversational topics, and assessed the credibility of those topics based on various Twitter features. They achieved a precision and recall of 70-80% using the J48 decision tree classification algorithm. Canini et al. analyzed the use of automated ranking strategies to measure the credibility of sources of information on Twitter for any given topic. They observed that content and network structure act as prominent features for effective credibility based ranking of Twitter users. Gupta et al., in their work analyzing tweets posted during the terrorist bomb blasts in Mumbai (India, 2011), showed that the majority of sources of information are unknown and have low Twitter reputation (few followers). This highlights the difficulty of measuring the credibility of information and the need to develop automated mechanisms to assess the credibility of information on Twitter.
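For readers unfamiliar with the precision and recall figures quoted above, the following Python sketch shows how the two measures are computed for a binary classifier. The function name precision_recall and the sample labels are illustrative assumptions, not data from any of the cited studies.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for parallel lists of boolean labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Precision: fraction of items predicted positive that really are positive.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: fraction of truly positive items that were predicted positive.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up labels: True means "credible".
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
precision, recall = precision_recall(predicted, actual)
```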
Relevance Ranking in Web:
Ranking techniques have been used widely to rank URLs, content, and users on various Web 2.0 platforms. Page et al. developed the PageRank algorithm for webpages on the Internet; they used the number of out-links and in-links of a webpage to calculate its relative relevance to a query. Duan et al. proposed a supervised learning approach for ranking tweets based on certain query inputs. They used content and non-content features (like the authority of users) to rank tweets according to their relevance to a topic. Their work used the Rank-SVM technique and extracted the features that resulted in the best ranking performance. The three prominent features were: whether a tweet contains a URL, the length of the tweet (number of characters), and the authority of the user account. Chen et al. built a tool called zerozero88, which recommends URLs that a particular Twitter user might find interesting. They showed how topic relevance and social voting parameters help in making effective recommendations. Dong et al. worked on using inputs from Twitter to improve recency and relevance ranking for search engines using the Gradient Boosted Decision Tree (GBDT) algorithm. They showed how, in addition to the existing features used to rank URLs on the Web, information from Twitter can be used to enhance the ranking of URLs.
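As a rough illustration of link based ranking, here is a minimal Python sketch of PageRank computed by power iteration. It is a textbook style simplification rather than the algorithm as deployed by any search engine; the function name pagerank, the damping factor of 0.85, the fixed iteration count, and the three page link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores for a dict mapping page -> list of out-links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump" term).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a links to b and c, b links to c, c links to a.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, page c collects rank from both a and b, so it ends up with the highest score, which matches the intuition that pages with more in-links from well ranked pages rank higher.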