19-04-2013, 04:55 PM
Ano´nimos: An LP-Based Approach for Anonymizing Weighted Social Network Graphs
Approach for Anonymizing.pdf (Size: 1.29 MB / Downloads: 26)
Abstract
The increasing popularity of social networks has initiated a fertile research area in information extraction and data mining.
Anonymization of these social graphs is important to facilitate publishing these data sets for analysis by external entities. Prior work
has concentrated mostly on node identity anonymization and structural anonymization. But with the growing interest in analyzing social
networks as a weighted network, edge weight anonymization is also gaining importance. We present Ano´nimos, a Linear
Programming-based technique for anonymization of edge weights that preserves linear properties of graphs. Such properties form the
foundation of many important graph-theoretic algorithms such as shortest paths problem, k-nearest neighbors, minimum cost spanning
tree, and maximizing information spread. As a proof of concept, we apply Ano´nimos to the shortest paths problem and its extensions,
prove the correctness, analyze complexity, and experimentally evaluate it using real social network data sets. Our experiments
demonstrate that Ano´nimos anonymizes the weights, improves k-anonymity of the weights, and also scrambles the relative ordering of
the edges sorted by weights, thereby providing robust and effective anonymization of the sensitive edge-weights. We also demonstrate
the composability of different models generated using Ano´nimos, a property that allows a single anonymized graph to preserve multiple
linear properties.
INTRODUCTION
SOCIAL networking sites such as MySpace, Facebook,
Twitter, and Orkut have millions of registered users,
and the resulting social graph structures have millions of
vertices (users or social actors) and edges (social associations).
Recent research has explored these social networks
for understanding their structure [1], [2], [3], criminal
intelligence [4], information discovery [5], advertising and
marketing [6], and others [7]. As a result, companies (such
as Facebook) hosting the data are interested in publishing
portions of the graphs so that independent entities can mine
the wealth of information contained in these social graphs.
Anonymization of these graphs is paramount to avoid
privacy breaches [8], [9]. Consequently, there has also been
considerable interest in the anonymization of graph
structured data [10], [11], [12], [13], [14], [15], [16]. But most
of the existing research on anonymization techniques tends
to focus on unweighted graphs for node and structural
anonymization.
ANO´ NIMOS IN ABSTRACT
We now introduce Ano´nimos, and use Kruskal’s algorithm
[21] for minimum spanning tree (MST) as an example. The
goal of this technique is to capture the dynamic behavior of
the algorithm using a system of linear inequalities. Given
the original weighted directed graph G ¼ ðV ;E;WÞ with
positive edge weights represented by variables x1; x2; . . . ;
xm (where each xi corresponds to an edge i ¼ ðu; vÞ 2 E;
refer to Table 1 for notational conventions), we model the
system of linear inequalities in terms of these variables. For
example, at every step of Kruskal’s algorithm for the MST
[21], the edge with the minimum weight among the set of
remaining edges, and not resulting in a cycle is added to the
MST. Let ðui; viÞ be the edge selected at the ith iteration, and
ðuiþ1; viþ1Þ be the edge selected in the ði þ 1Þth iteration.
Single Source Shortest Paths—Linear Model
We now show how Ano´nimos can be used for modeling the
single source shortest paths tree. Given a weighted graph
G ¼ ðV ;E;WÞ, and a source vertex v0, a single source shortest
paths tree is a spanning tree of the graph where the path
from the source to any other vertex in the tree is the shortest
path between the pair in G. This tree is important in a
number of applications; for example, if weights are
assigned based on inverse of “trustworthiness,” then this
tree will provide the paths with greatest “trustworthiness”
for transferring confidential information from a specific
node while minimizing chances of a leak.
OPTIMIZING THE MODELS
In the previous section, we developed a couple of simple
models for the shortest paths problem, and demonstrated
the composability of the models. We now provide optimizations
to the simple models to reduce the complexity of
the models while relaxing the composability property of the
models—composability of the optimized models require
special handling which we also discuss later in the section.
5.1 Single Source Shortest Paths—Reduced Model
We exploit specific properties of shortest paths to reduce
the complexity of the naı¨ve application of Ano´nimos to
Dijkstra’s algorithm which resulted in the Linear model.
Note that even though Dijkstra’s algorithm tries to relax the
neighbors when processing a vertex, the ultimate goal is to
select an appropriate vertex for the next iteration, i.e., the
vertex with the smallest known cost from the source.
Category III inequalities model this information in an
efficient way, and hence ideally, only Category III inequalities
are needed. However, Category III inequalities only
include edges that are part of the shortest paths tree.
Therefore, if only Category III inequalities are considered in
the model, then only part of the total number of edges are
modeled. These inequalities by themselves do not put
constraints on nontree edges, and thus, if no care is taken
while reassigning edge weights in the anonymized graph, it
can lead to violations of the order in the anonymized graph.
For instance, if edge ðu; vÞ is a nontree edge, then a model
using only Category III would not impose any constraint on
ðu; vÞ. Hence, a reassignment of weights in the anonymized
graph might assign the edge ðu; vÞ a weight such that
Dijkstra’s algorithm executing on the anonymized graph
selects ðu; vÞ as a tree edge.
All Pairs Shortest Paths—Optimized Model
We now develop an efficient model for the all pairs shortest
paths problem. The model obtained by composition of the
Linear model, though correct, has many redundant inequalities.
For example, edges that are not part of any of the trees
need not be part of the model, and can be treated as the
nontree edges in Section 5.1. However in the described
model, there are no means for filtering out these inequalities.
We now delve deeper into the problem and show how
the Reduced model described in Section 5.1 can be composed
for the all pairs problem.
Application Specific Properties
In addition to the properties described in Section 3.1 which
are general to Ano´nimos, there are some additional
properties which are interesting for the shortest paths
problem. While generating the constraints for the model,
the algorithm can be terminated at any point prior to
completion, and this early termination has specific applications
for the shortest paths problem. Formulating Ano´nimos
as an LP problem ensures that the model being
constructed is consistent at every point during the execution,
and hence, these interesting subproperties are also
preserved. Furthermore, since the algorithm does not
process every node and vertex in the graph, this leads to
considerable savings in the complexity.
Graph Data Sets
Mislove et al. [3] crawled a number of social network sites
for analyzing the properties of these large social graphs,
and have made their data sets publicly available. Their data
sets include the graphs for a number of popular social
networking sites: Flickr (www.flickr.com), LiveJournal
(www.livejournal.com), Orkut (www.orkut.com), and Youtube
(www.youtube.com). While Orkut is a pure social
networking site, LiveJournal (referred to as LJ in the data
sets) is a blogging site whose users form a social network,
while Flickr and Youtube are photo sharing and video
sharing sites, respectively, with an overlayed social network
structure among its users. We model the graphs of these
networks as directed graphs where edges have positive
weights, but the models can be extended for undirected
graphs. The published graph data sets are unweighted.
Since our model is not dependent on the semantics of the
weights or their magnitude, we assigned randomly generated
weights to the edges of the graph. We experimented by
selecting weights randomly from a uniform distribution
(real numbers in the range 1 to 100) as well as normal
distribution (with mean 50 and standard deviation), but no
observable differences were observed in the number of
inequalities as well as the privacy protection measures. The
experiments presented in this paper are using weights
selected randomly from a uniform distribution.
RELATED WORK
The need to protect the privacy of social entities involved in
social networks has given rise to active research in
anonymization techniques for social network graphs. This
interest has been primarily driven by the findings of
Backstrom et al. [8] and Korolova et al. [9]. Backstrom et al.
[8] described a technique based on the structural properties
of graphs such as isomorphism and automorphism to
reidentify vertices in the anonymized graph. Their technique
was based on implanting unique structures in the graph
which can be reidentified in the anonymized graph with
very high probability. On the other hand, Korolova et al. [9]
devised an attack where a node can be reidentified based in
part on background information regarding the neighborhood.
As a result, a lot of research has focused on node
identity anonymization and structural anonymization. A comprehensive
survey about the various anonymization techniques
is provided in [32], [33].
CONCLUSION
Anonymization of edge weights in a social network graph is
important to enable the analysis and mining of social
graphs by computer scientists as well as social scientists.
Such mining has significant impact on the management of
social networks as well as the understanding of various
social behaviors. We proposed Ano´nimos, a technique for
the effective anonymization of weighted social network
graphs by modeling linear properties and formulating them
as an LP problem. The Ano´nimos approach can be applied
to preserve linear properties by generation of inequalities
corresponding to decisions made by the algorithm during
its execution. As a proof of concept, we considered the
shortest paths problem and showed how off-the-shelf LP
packages can be used to effectively anonymize the graphs.
The composability of Ano´nimos for preserving multiple
properties in a single anonymized graph was demonstrated
using the all pairs shortest paths problem.