07-02-2013, 04:27 PM
Assigning Trust to Wikipedia Content
ABSTRACT
The Wikipedia is a collaborative encyclopedia: anyone can contribute
to its articles simply by clicking on an “edit” button. The
open nature of the Wikipedia has been key to its success, but has
also created a challenge: how can readers develop an informed opinion
on its reliability? We propose a system that computes quantitative
values of trust for the text in Wikipedia articles; these trust
values provide an indication of text reliability.
The system uses as input the revision history of each article, as
well as information about the reputation of the contributing authors,
as provided by a reputation system. The trust of a word in an article
is computed on the basis of the reputation of the original author
of the word, as well as the reputation of all authors who edited
text near the word. The algorithm computes word trust values that
vary smoothly across the text; the trust values can be visualized using
varying text-background colors. The algorithm ensures that all
changes to an article’s text are reflected in the trust values, preventing
surreptitious content changes.
INTRODUCTION
Wikipedia is an online encyclopedia that grew in the span of a
few years to become one of the most widely used sources of information
on the web. Wikipedia owes its growth and breadth of
coverage to its ability to harness the contributions of millions of
individuals, ranging from casual visitors, to domain experts, to dedicated
editors. On the other hand, the open process that gives rise
to Wikipedia content makes it difficult for visitors to form an idea
of the reliability of the content. Wikipedia articles are constantly
changing, and the contributors range from domain experts, to vandals,
to dedicated editors, to superficial contributors not fully aware
of the quality standards the Wikipedia aspires to attain. Wikipedia
visitors are presented with the latest version of each article they
visit: this latest version does not offer them any simple insight into
how the article content has evolved into its most current form, nor
does it offer a measure of how much the content can be relied upon.
These considerations generated interest in algorithmic systems for
estimating the trust of Wikipedia content [21, 34].
The Trust Assignment Algorithm
The goal of our trust system is to convey information on the degree
to which the text has been revised, and to flag any recent
unchecked content modifications. We rely on a simple idea: the
trust of text should depend on the reliability of the author, and on
the reliability of the people who subsequently revised, checked, and
edited the text [21, 34].
As a measure of author and revisor quality, we take the author
reputation computed by the author reputation system of [1]. That
reputation system, like the trust system described in this paper, is
content-driven: it relies on content analysis, rather than user-to-user
feedback. Users who contribute long-lived content gain reputation,
while users who contribute content that is quickly removed lose reputation.
The resulting author reputation was shown to correlate well
with the quality of the author’s future contributions, justifying its
use in the computation of text trust.
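As an illustrative sketch only (a hypothetical formula, not the paper's exact algorithm), the core idea can be expressed as follows: a word starts out with trust derived from its original author's reputation, and each subsequent revision by a higher-reputation author pulls the word's trust upward by some fraction.

```python
def word_trust(author_rep, revisor_reps, inherit=0.5):
    """Hypothetical word-trust update, for illustration.

    author_rep:   reputation of the word's original author
    revisor_reps: reputations of authors who later revised text
                  near the word, in chronological order
    inherit:      fraction of the reputation gap absorbed per revision
                  (an assumed parameter, not from the paper)
    """
    trust = author_rep
    for rep in revisor_reps:
        # A higher-reputation revisor who leaves the word in place
        # implicitly vouches for it, raising its trust.
        if rep > trust:
            trust += inherit * (rep - trust)
    return trust
```

For example, a word by a low-reputation author (reputation 2.0) that survives a revision by an author with reputation 8.0 would move to trust 5.0 under this sketch. The real system additionally smooths trust across neighboring words and lowers trust near fresh edits, so that all content changes are reflected in the labeling.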
Trust Quality Metrics
The trust values are computed from the past history of text, and
reflect the degree to which text has been edited and revised. Ideally,
we would like to show that high trust text conveys with high
probability correct information. However, correctness is very difficult
to define and measure. As a substitute, we study the correlation
between trust and future text stability, under the hypothesis that correct
(or high-quality) content is less likely to be revised [34]. The
quality metrics will also provide quantitative performance indices
that will be useful in fine-tuning the behavior of the algorithms. We
note that the quality metrics capture only in part the intent underlying
our trust system: in particular, the goals of predicting future text
stability, and warning readers about recent modifications, do not always
coincide, as we will see in more detail later. Nevertheless, the
metrics offer valuable insight into the performance of the system.
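One simplified, hypothetical way to quantify the trust–stability correlation (not the paper's exact metric) is a deletion recall: among words that are deleted in a subsequent revision, what fraction carried trust at or below a given threshold? High recall would indicate that low trust is a useful predictor of future instability.

```python
def deletion_recall(trust_values, deleted_flags, threshold):
    """Hypothetical quality metric, for illustration.

    trust_values:  trust of each word in a revision
    deleted_flags: True where the word was removed in a later revision
    threshold:     trust level at or below which a word counts as
                   "low trust" (an assumed cutoff)

    Returns the fraction of deleted words that were low-trust,
    or None if nothing was deleted.
    """
    deleted = [t for t, d in zip(trust_values, deleted_flags) if d]
    if not deleted:
        return None
    low = sum(1 for t in deleted if t <= threshold)
    return low / len(deleted)
```

Sweeping the threshold over such a metric is one way the quantitative indices mentioned above could be used to fine-tune the algorithm's parameters.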
Related Work
The problem of the reliability of Wikipedia content has often
emerged both in the press (see, e.g., [27, 12]) and in scientific journals
[8]. The idea of assigning trust to specific sections of text of
Wikipedia articles as a guide to readers has been previously proposed
in [21, 4, 34], as well as in white papers [14] and blogs [20];
these papers also contain the idea of using text background color to
visualize trust values.
The work most closely related to ours is [34], where the trust of
a piece of text is computed from the Wikipedia roles (anonymous,
registered user, or editor) of the original author, and of the authors
who subsequently revised the article. The Wikipedia roles of authors
are thus used in lieu of author reputation; as a consequence,
the algorithm can only be applied to wikis where authors are organized
in a well-defined hierarchy. Text analysis is performed at
the granularity level of sentences; all sentences introduced in the
same revision form a fragment, and share the same trust. A change
anywhere in a sentence causes the whole sentence to be considered
new, and the position of the change in the sentence is not flagged
via the trust labeling.
IMPLEMENTATION
We have implemented a trust tool that computes text trust and
provenance for the Wikipedia. The trust tool takes as input an XML
dump containing all the text of all the revisions of the Wikipedia;
such dumps are periodically made available from the Wikimedia
Foundation. The trust tool is written in OCaml [16]; we chose this
language for its combination of speed and excellent memory management.
On an Intel Core 2 Duo 2 GHz CPU, our tool is capable
of assigning trust to versions of Wikipedia articles at over 15
versions/second, or roughly 1.5 million versions per day, an edit rate
much higher than that of the online Wikipedia [32]. We have
run the trust tool over the entire English Wikipedia, as of its February
6, 2007 dump; the results can be viewed on a live demo [29]. To
save disk space on the server, the demo contains only the last 100
versions of each article, but all versions were considered in trust
computation.