01-06-2013, 04:46 PM
On Optimizing Overlay Topologies for Search in Unstructured Peer-to-Peer Networks
On Optimizing Overlay.pdf (Size: 1.14 MB / Downloads: 26)
Abstract
Unstructured peer-to-peer (P2P) file-sharing networks are popular in the mass market. As the peers participating in
unstructured networks interconnect randomly, they rely on flooding query messages to discover objects of interest and thus introduce
remarkable network traffic. Empirical measurement studies indicate that the peers in P2P networks have similar preferences, and have
recently proposed unstructured P2P networks that organize participating peers by exploiting their similarity. The resultant networks
may not perform searches efficiently and effectively because existing overlay topology construction algorithms often create
unstructured P2P networks without performance guarantees. Thus, we propose a novel overlay formation algorithm for unstructured
P2P networks. Based on the file sharing pattern exhibiting the power-law property, our proposal is unique in that it poses rigorous
performance guarantees. Theoretical performance results conclude that in a constant probability, 1) searching an object in our
proposed network efficiently takes Oðlnc NÞ hops (where c is a small constant), and 2) the search progressively and effectively exploits
the similarity of peers. In addition, the success ratio of discovering an object approximates 100 percent. We validate our theoretical
analysis and compare our proposal to competing algorithms in simulations. Based on the simulation results, our proposal clearly
outperforms the competing algorithms in terms of 1) the hop count of routing a query message, 2) the successful ratio of resolving a
query, 3) the number of messages required for resolving a query, and 4) the message overhead for maintaining and formatting the
overlay.
INTRODUCTION
PEER-TO-PEER (P2P) networks (or overlay networks) have
been widely deployed in the Internet, and they provide
various services such as file sharing, information retrieval,
media streaming, and telephony. P2P applications are
popular because they primarily provide low entry barriers
and self-scaling. Prior studies (e.g., [1], [2]) reveal that P2P
applications may dominate up to around 20 percent of
Internet traffic.
Object search is an essential building block in several
P2P applications (e.g., file sharing and information
retrieval). Gnutella [3] is a popular P2P search protocol
in the mass market. Specifically, because Gnutella networks
are unstructured, and the peers participating in
networks connect to one another randomly, peers search
objects in the networks through message flooding. To flood
a message, an inquiry peer broadcasts the message to its
neighbors (by the neighbors of peer i, we mean those peers
that have end-to-end connections with i). The broadcast
message is associated with a positive integer time-to-live
(TTL) value.
RELATED WORK
Li and Wu in [30] and Zhu and Hu in [31] conducted surveys
on the searching techniques in P2P networks. However, due
to space limitation, we mainly discuss P2P networks that aim
to exploit the similarity of participating peers.
pSearch [32] and SSW [33] are content-based P2P networks
providing semantic search. Similar to most P2P
networks based on distributed hash tables (e.g., Chord [34]), in
pSearch and SSW, each published object, which is represented
by a latent semantic vector [35], needs to be indexed
first into the network where the participating peers are
formatted in a well-structured manner and host a disjoint
key subspace. Therefore, the participating peers need to
maintain foreign indices, that is, the indices of objects stored
in remote peers. To locate an object, a requesting peer routes
a message toward the peer responsible for the key subspace
where the object is indexed. In contrast, in our study, we
present a construction of unstructured P2P networks where
the participating peers need not organize themselves into a
rigid, deterministic topology structure, considerably reducing
the maintenance overhead of overlay topology. Unlike
pSearch [32] and SSW [33], the peers in our proposed
network host the objects of interest and maintain no foreign
indices, eliminating storage and bandwidth overheads for
publishing and managing such indices.
Search Protocol
Consider any query Q requesting an object O. Denote the
peer issuing Q and the peer hosting O by s and d,
respectively. Instead of blindly flooding query messages
as in the typical Gnutella protocol [3], our search protocol
conceptually operates as follows. Among the neighboring
peers in Is and s, s selects the peers; each t 2 Is [ s has
Fðt; dÞ > Fðs; dÞ. s then broadcasts Q to the peers. Upon
receiving Q, t performs similarly by selecting its neighbors
who are more similar to d than t, hoping that there is at least
one overlay path P toward the destination peer d.
SIMULATIONS
Experimental Setup
We have developed an event-driven simulator to evaluate
the performance of our proposal. The input trace to our
simulator is the eDonkey data set [16]. The data set
maintains the files shared by peers participating in the
eDonkey file sharing network. Specifically, the files shared
by each peer are recorded in the data set. As the eDonkey
data set lacks the details for describing each shared file (e.g.,
the keyword metadata), we measure the similarity level
between any two peers u and v in the trace as the similarity
function Fðu; vÞ ¼ jOu\Ovj
jOu[Ovj , where Ou and Ov represent the
files shared by peers u and v, respectively. With such a
similarity function Fðu; vÞ, the corresponding peer similarity
graph (as we have discussed in Section 3.1) due to the trace
contains a maximally connected component approximating
14,000 peers (denoted by the peer set V ). Consequently, we
simulate a P2P network formed by these 14,000 peers.
SUMMARY AND FUTURE WORK
We have presented an unstructured P2P network with
rigorous performance guarantees to enhance search efficiency
and effectiveness. In a constant probability, a
querying peer takes Oðlnc NÞ hops (where c is a small
constant) to reach the destination node capable of resolving
the query, whereas the query messages can progressively
and effectively exploit the similarity of the peers. The query
can be successfully resolved in an approximate probability
of 100 percent. Notably, the theoretical analysis further
reveals that the competitive decentralized solutions (e.g.,
those in [21], [22], [23], [24]) do not perform well as the hop
count of routing a query message in such networks,
considering the exploitation of the similarity of participating
peers, is in the polynomial of system size N.