02-03-2013, 04:12 PM
Pastry Network
Pastry Network.docx (Size: 68.23 KB / Downloads: 29)
INTRODUCTION
Pastry is an overlay and routing network for the implementation of a distributed hash table (DHT) similar to Chord. The key-value pairsare stored in a redundant peer-to-peer network of connected Internet hosts. The protocol is bootstrapped by supplying it with the IP address of a peer already in the network and from then on via the routing table which is dynamically built and repaired. Because of its redundant and decentralized nature there is no single point of failure and any single node can leave the network at any time without warning and with little or no chance of data loss. The protocol is also capable of using a routing metric supplied by an outside program, such as ping or traceroute, to determine the best routes to store in its routing table.
Overview
Although the distributed hash table functionality of Pastry is almost identical to other DHTs, what sets it apart is the routing overlay network built on top of the DHT concept. This allows Pastry to realize the scalability and fault tolerance of other networks, while reducing the overall cost of routing a packet from one node to another by avoiding the need to flood packets. Because the routing metric is supplied by an external program based on the IP address of the target node, the metric can be easily switched to shortest hop count, lowest latency, highest bandwidth, or even a general combination of metrics.
The hash table's key-space is taken to be circular, like the key-space in the Chord system, and node IDs are 128-bit unsigned integers representing position in the circular key-space. Node IDs are chosen randomly and uniformly so peers who are adjacent in node ID are geographically diverse. The routing overlay network is formed on top of the hash table by each peer discovering and exchanging state information consisting of a list of leaf nodes, a neighborhood list, and a routing table. The leaf node list consists of the L/2 closest peers by node ID in each direction around the circle.
In addition to the leaf nodes there is also the neighborhood list. This represents the M closest peers in terms of the routing metric. Although it is not used directly in the routing algorithm, the neighborhood list is used for maintaining locality principals in the routing table.
Routing
A packet can be routed to any address in the keyspace whether there is a peer with that node ID or not. The packet is routed toward its proper place on the circular ring and the peer whose node ID is closest to the desired destination will receive the packet. Whenever a peer receives a packet to route or wants to send a packet it first examines its leaf set and routes directly to the correct node if one is found. If this fails, the peer next consults its routing table with the goal of finding the address of a node which shares a longer prefix with the destination address than the peer itself. If the peer does not have any contacts with a longer prefix or the contact has died it will pick a peer from its contact list with the same length prefix whose node ID is numerically closer to the destination and send the packet to that peer. Since the number of correct digits in the address always either increases or stays the same — and if it stays the same the distance between the packet and its destination grows smaller — the routing protocol converges.
Applications built on Pastry
Pastry itself specifies how keys are distributed among the nodes and how the node responsible for holding a key can be found. Using this as a substrate for a higher protocol enables Pastry to implement functionality such as a distributed file system, a subscription and publishing system, or any other system which can be reduced to storing values and retrieving them later.
PAST
PAST is a distributed file system layered on top of Pastry. A file is stored into the system by computing the hash of its filename. Then Pastry routes the contents of the file to the node in the circular keyspace closest to the hash obtained from the filename. This node will then send copies of the file to the k nodes nearest the actual key, most of which are likely to be leaf nodes of this node and thus directly reachable. Retrieval of data is accomplished by rehashing the file name and routing a request for the data over Pastry to the proper place in the keyspace. The request can be fulfilled by any of the k nodes that have copies of the data. This accomplishes both data redundancy and load distribution. Since adjacent nodes in the keyspace are geographically diverse the odds that all k of them will go offline at the same time is very small. More importantly, since the Pastry routing protocol seeks to minimize the distance traveled, the nearest node to the machine that made the request (according to the metric) is likely to be the one that responds with the data.