09-08-2012, 10:33 AM
DCMP: A Distributed Cycle Minimization Protocol for Peer-to-Peer Networks
1DCMP A Distributed.pdf (Size: 2.81 MB / Downloads: 45)
Abstract
Broadcast-based peer-to-peer (P2P) networks, including flat (for example, Gnutella) and two-layer superpeer
implementations (for example, Kazaa), are extremely popular nowadays due to their simplicity, ease of deployment, and versatility.
The unstructured network topology, however, contains many cyclic paths, which introduce numerous duplicate messages in the
system. Although such messages can be identified and ignored, they still consume a large proportion of the bandwidth and other
resources, causing bottlenecks in the entire network. In this paper, we describe the Distributed Cycle Minimization Protocol (DCMP), a
dynamic fully decentralized protocol that significantly reduces the duplicate messages by eliminating unnecessary cycles. As queries
are transmitted through the peers, DCMP identifies the problematic paths and attempts to break the cycles while maintaining the
connectivity of the network. In order to preserve the fault resilience and load balancing properties of unstructured P2P systems, DCMP
avoids creating a hierarchical organization. Instead, it applies cycle elimination symmetrically around some powerful peers to keep the
average path length small. The overall structure is constructed fast with very low overhead. With the information collected during this
process, distributed maintenance is performed efficiently even if peers quit the system without notification. The experimental results
from our simulator and the prototype implementation on PlanetLab confirm that DCMP significantly improves the scalability of
unstructured P2P systems without sacrificing their desirable properties. Moreover, due to its simplicity, DCMP can be easily
implemented in various existing P2P systems and is orthogonal to the search algorithms.
INTRODUCTION
PEER-TO-PEER (P2P) technology is attracting a lot of
attention since it simplifies the implementation of large
ad hoc distributed repositories of digital information. In a
P2P system, numerous nodes are interconnected and
exchange data or services directly with each other. There
are two major categories of P2P architectures: 1) Hashbased
systems (for example, CAN [1] and Chord [2]),
which assign a unique key to each file and forward queries
to specific nodes based on a hash function. Although they
guarantee locating content within a bounded number of
hops, they require tight control of the data placement and
the topology of the network. 2) Broadcast-based systems
(for example, Gnutella [3]), which use message flooding to
propagate queries. There is no specific destination; hence,
every neighbor peer is contacted and forwards the
message to its own neighbors until the message’s lifetime
expires. Such systems have been successfully deployed in
practice to form worldwide ad hoc networks due to their
simplicity and versatility.
RELATED WORK
Research in the P2P area was triggered by the success of
systems like Gnutella [3] and Kazaa [4]. Gnutella is a pure
P2P system that performs searching by Breadth-First
Traversal (BFT) of the nodes around the initiator peer. Each
peer that receives a query propagates it to all of its
neighbors up to a maximum of d hops. By exploring a
significant part of the network, it increases the probability
of satisfying the query. BFT, however, overloads the
network with unnecessary messages; moreover, slow peers
become bottlenecks. To overcome these problems, Kazaa
implements a two-layer network. The upper layer contains
powerful peers, called superpeers (or ultrapeers); slower
peers connect only to superpeers. The upper layer forms a
Gnutella-like network among superpeers. Searching is
performed by BFT at the upper layer only, since superpeers
contain the indices of their children. Reference [9] contains a
detailed analysis of such configurations.
Simplistic Cycle Elimination (SCE)
To motivate our approach, here, we describe a straightforward
method for eliminating cycles and explain its drawbacks.
Consider Fig. 3a and let peer B receive the same
message msg from A and A0. B identifies msg as a duplicate
by searching its GUID in the history table. Both the
direction A ! B of the first msg (which is recorded in the
table) and the direction of the duplicate msg, A0 ! B, are
parts of a cycle. A simplistic approach is to disable either
connection AB or A0B in order to eliminate the cycle.
This approach, however, is prone to problems when
multiple nodes in a cycle perform this cycle elimination
operation simultaneously. Consider a different case, where
nodes C and D receive duplicates and decide to eliminate
the cycle at the same time by disabling CE and DE,
respectively; then, regions 1 and 2 will be disconnected. The
reduced connectivity has a negative effect on response time
and on the ability of returning enough results. One way to
tackle this problem is to force the disconnected pair of peers
to continue exchanging information frequently about each
other’s status and reconnect if necessary. Obviously, this
poses a considerable overhead on the network.
DCMP: Distributed Cycle Minimization Protocol
In contrast to SCE, our protocol requires negotiation among
all peers involved in a cycle about the optimal way to cut
the cycle. Therefore, the probability of generating a
disconnected network is minimized. The negotiation process
is efficient, requiring only two messages per peer per
cycle. Also, the information gathered during negotiation is
used to repair the network with low overhead when peers
join or fail/quit without notification.
The negotiation process can be initiated by any peer that
receives a duplicate. Fig. 3b provides an example. Assume
that peer A receives a message msg from B ! A, and, soon
after, it receives the same message2 (that is, same GUID)
from F ! A. Peer A identifies msg as a duplicate by
performing a lookup in its history table. The first step of our
protocol is to gather information from all peers in the cycle.
To achieve this, we introduce a new type of control
message, called Information Collecting (IC) message.
Deciding the Cutting Position
Here, we explain how we choose the connection to disable
in order to cut a cycle. This decision is made at the peer that
receives two copies of the same IC message (that is, D in our
example). This peer is the coordinator; in DCMP, any peer
can act as coordinator. A straightforward way is to
eliminate randomly one edge of the cycle. However, our
experiments indicate that this approach does not preserve
the connectivity of the network. In order to achieve better
results, we rely on the properties of the peers in the cycle.
Recall that the IC messages that arrive at the coordinator
have gathered this information.
GatePeer Departure
All peers, including GatePeers, receive tagged messages
periodically; therefore, they have a list of nearby GatePeers
(recall that transitive peers are also handled as GatePeers).
From this information, a GatePeer G knows its distance to
each of the nearby GatePeers. Taking into account the
distance and power of these GatePeers, G generates an
ordered list of backup GatePeers. Then, G broadcasts this list
to its direct neighbors (that is, only 1 hop away). The
guideline for selection is that the backup GatePeers should
be powerful enough to accept the direct neighbors of G in
case G quits. In our experiments, we found that two to five
backup GatePeers were usually selected, depending on the
degree of G and the capacity of its neighboring GatePeers.
To maintain the backup list up to date, backup GatePeers
selection is performed periodically, and information broadcasting
is only needed when there is an update.
Topology and Workload Analysis
In the first set of experiments, we generate a power-law
network with 3,000 peers and count the number of
duplicate messages before DCMP starts to eliminate cycles.
Then, we allow DCMP to reach a stable state and count the
number of duplicate messages again. In Fig. 10a, we show
the number of duplicate messages after eliminating cycles
by using different TTLd values. Recall that TTLd guides the
process of eliminating the cycles that are shorter than
certain lengths. Therefore, cycles that have more that 2
TTLd edges are largely maintained. TTLd ¼ All will
eliminate all cycles causing the network to degenerate to a
tree. From the graph, we observe that the number of
duplicate messages is reduced considerably for TTLd ¼ 2
(that is, more than 90 percent of the duplicate messages are
eliminated). Further increasing TTLd does not result in a
significant improvement.
CONCLUSIONS
In this paper, we presented DCMP, a protocol for distributed
cycle minimization in broadcast-based P2P systems. DCMP
preserves the low diameter of Gnutella-like networks while
eliminating most of the duplicate messages. Moreover, the
overhead due to control messages is minimal. This results in
reduced response time, which, in turn, increases the
scalability of the system. Our protocol is suitable for dynamic
networks since it handles peer joins/departures efficiently
and is resilient to failures. DCMP is also designed to be as
simple as possible and is independent of the search
algorithm. Therefore, it can be implemented on top of
popular P2P systems such as Gnutella, Kazaa, or Gia with
minimal effort. We used a simulator and a prototype
implementation on PlanetLab to verify that our techniques
are applicable to realistic environments.
1DCMP A Distributed.pdf (Size: 2.81 MB / Downloads: 45)
Abstract
Broadcast-based peer-to-peer (P2P) networks, including flat (for example, Gnutella) and two-layer superpeer
implementations (for example, Kazaa), are extremely popular nowadays due to their simplicity, ease of deployment, and versatility.
The unstructured network topology, however, contains many cyclic paths, which introduce numerous duplicate messages in the
system. Although such messages can be identified and ignored, they still consume a large proportion of the bandwidth and other
resources, causing bottlenecks in the entire network. In this paper, we describe the Distributed Cycle Minimization Protocol (DCMP), a
dynamic fully decentralized protocol that significantly reduces the duplicate messages by eliminating unnecessary cycles. As queries
are transmitted through the peers, DCMP identifies the problematic paths and attempts to break the cycles while maintaining the
connectivity of the network. In order to preserve the fault resilience and load balancing properties of unstructured P2P systems, DCMP
avoids creating a hierarchical organization. Instead, it applies cycle elimination symmetrically around some powerful peers to keep the
average path length small. The overall structure is constructed fast with very low overhead. With the information collected during this
process, distributed maintenance is performed efficiently even if peers quit the system without notification. The experimental results
from our simulator and the prototype implementation on PlanetLab confirm that DCMP significantly improves the scalability of
unstructured P2P systems without sacrificing their desirable properties. Moreover, due to its simplicity, DCMP can be easily
implemented in various existing P2P systems and is orthogonal to the search algorithms.
INTRODUCTION
PEER-TO-PEER (P2P) technology is attracting a lot of
attention since it simplifies the implementation of large
ad hoc distributed repositories of digital information. In a
P2P system, numerous nodes are interconnected and
exchange data or services directly with each other. There
are two major categories of P2P architectures: 1) Hashbased
systems (for example, CAN [1] and Chord [2]),
which assign a unique key to each file and forward queries
to specific nodes based on a hash function. Although they
guarantee locating content within a bounded number of
hops, they require tight control of the data placement and
the topology of the network. 2) Broadcast-based systems
(for example, Gnutella [3]), which use message flooding to
propagate queries. There is no specific destination; hence,
every neighbor peer is contacted and forwards the
message to its own neighbors until the message’s lifetime
expires. Such systems have been successfully deployed in
practice to form worldwide ad hoc networks due to their
simplicity and versatility.
RELATED WORK
Research in the P2P area was triggered by the success of
systems like Gnutella [3] and Kazaa [4]. Gnutella is a pure
P2P system that performs searching by Breadth-First
Traversal (BFT) of the nodes around the initiator peer. Each
peer that receives a query propagates it to all of its
neighbors up to a maximum of d hops. By exploring a
significant part of the network, it increases the probability
of satisfying the query. BFT, however, overloads the
network with unnecessary messages; moreover, slow peers
become bottlenecks. To overcome these problems, Kazaa
implements a two-layer network. The upper layer contains
powerful peers, called superpeers (or ultrapeers); slower
peers connect only to superpeers. The upper layer forms a
Gnutella-like network among superpeers. Searching is
performed by BFT at the upper layer only, since superpeers
contain the indices of their children. Reference [9] contains a
detailed analysis of such configurations.
Simplistic Cycle Elimination (SCE)
To motivate our approach, here, we describe a straightforward
method for eliminating cycles and explain its drawbacks.
Consider Fig. 3a and let peer B receive the same
message msg from A and A0. B identifies msg as a duplicate
by searching its GUID in the history table. Both the
direction A ! B of the first msg (which is recorded in the
table) and the direction of the duplicate msg, A0 ! B, are
parts of a cycle. A simplistic approach is to disable either
connection AB or A0B in order to eliminate the cycle.
This approach, however, is prone to problems when
multiple nodes in a cycle perform this cycle elimination
operation simultaneously. Consider a different case, where
nodes C and D receive duplicates and decide to eliminate
the cycle at the same time by disabling CE and DE,
respectively; then, regions 1 and 2 will be disconnected. The
reduced connectivity has a negative effect on response time
and on the ability of returning enough results. One way to
tackle this problem is to force the disconnected pair of peers
to continue exchanging information frequently about each
other’s status and reconnect if necessary. Obviously, this
poses a considerable overhead on the network.
DCMP: Distributed Cycle Minimization Protocol
In contrast to SCE, our protocol requires negotiation among
all peers involved in a cycle about the optimal way to cut
the cycle. Therefore, the probability of generating a
disconnected network is minimized. The negotiation process
is efficient, requiring only two messages per peer per
cycle. Also, the information gathered during negotiation is
used to repair the network with low overhead when peers
join or fail/quit without notification.
The negotiation process can be initiated by any peer that
receives a duplicate. Fig. 3b provides an example. Assume
that peer A receives a message msg from B ! A, and, soon
after, it receives the same message2 (that is, same GUID)
from F ! A. Peer A identifies msg as a duplicate by
performing a lookup in its history table. The first step of our
protocol is to gather information from all peers in the cycle.
To achieve this, we introduce a new type of control
message, called Information Collecting (IC) message.
Deciding the Cutting Position
Here, we explain how we choose the connection to disable
in order to cut a cycle. This decision is made at the peer that
receives two copies of the same IC message (that is, D in our
example). This peer is the coordinator; in DCMP, any peer
can act as coordinator. A straightforward way is to
eliminate randomly one edge of the cycle. However, our
experiments indicate that this approach does not preserve
the connectivity of the network. In order to achieve better
results, we rely on the properties of the peers in the cycle.
Recall that the IC messages that arrive at the coordinator
have gathered this information.
GatePeer Departure
All peers, including GatePeers, receive tagged messages
periodically; therefore, they have a list of nearby GatePeers
(recall that transitive peers are also handled as GatePeers).
From this information, a GatePeer G knows its distance to
each of the nearby GatePeers. Taking into account the
distance and power of these GatePeers, G generates an
ordered list of backup GatePeers. Then, G broadcasts this list
to its direct neighbors (that is, only 1 hop away). The
guideline for selection is that the backup GatePeers should
be powerful enough to accept the direct neighbors of G in
case G quits. In our experiments, we found that two to five
backup GatePeers were usually selected, depending on the
degree of G and the capacity of its neighboring GatePeers.
To maintain the backup list up to date, backup GatePeers
selection is performed periodically, and information broadcasting
is only needed when there is an update.
Topology and Workload Analysis
In the first set of experiments, we generate a power-law
network with 3,000 peers and count the number of
duplicate messages before DCMP starts to eliminate cycles.
Then, we allow DCMP to reach a stable state and count the
number of duplicate messages again. In Fig. 10a, we show
the number of duplicate messages after eliminating cycles
by using different TTLd values. Recall that TTLd guides the
process of eliminating the cycles that are shorter than
certain lengths. Therefore, cycles that have more that 2
TTLd edges are largely maintained. TTLd ¼ All will
eliminate all cycles causing the network to degenerate to a
tree. From the graph, we observe that the number of
duplicate messages is reduced considerably for TTLd ¼ 2
(that is, more than 90 percent of the duplicate messages are
eliminated). Further increasing TTLd does not result in a
significant improvement.
CONCLUSIONS
In this paper, we presented DCMP, a protocol for distributed
cycle minimization in broadcast-based P2P systems. DCMP
preserves the low diameter of Gnutella-like networks while
eliminating most of the duplicate messages. Moreover, the
overhead due to control messages is minimal. This results in
reduced response time, which, in turn, increases the
scalability of the system. Our protocol is suitable for dynamic
networks since it handles peer joins/departures efficiently
and is resilient to failures. DCMP is also designed to be as
simple as possible and is independent of the search
algorithm. Therefore, it can be implemented on top of
popular P2P systems such as Gnutella, Kazaa, or Gia with
minimal effort. We used a simulator and a prototype
implementation on PlanetLab to verify that our techniques
are applicable to realistic environments.