15-02-2013, 02:13 PM
Saturn: Range Queries, Load Balancing and Fault Tolerance in DHT Data Systems
1Saturn Range Queries.docx (Size: 33.58 KB / Downloads: 79)
Abstract
In this paper, we present Saturn, an overlay architecture for large-scale data networks maintained over Distributed HashTables (DHTs) that efficiently processes range queries and ensures access load balancing and fault-tolerance. Placing consecutivedata values in neighboring peers is desirable in DHTs since it accelerates range query processing; however, such a placement is highly susceptible to load imbalances. At the same time, DHTs may be susceptible to node departures/failures and high data availability and fault tolerance are significant issues. Saturn deals effectively with these problems through the introduction of a novel multiple ring, order-preserving architecture. The use of a novel order-preserving hash function ensures fast range query processing. Replication across and within data rings (termed vertical and horizontal replication) forms the foundation over which our echanisms are developed, ensuring query load balancing and fault tolerance, respectively. Our detailed experimentation study shows strong gains in range query processing efficiency, access load balancing, and fault tolerance, with low replication overheads. The significance of Saturn is not only that it effectively tackles all three issues together—i.e., supporting range queries, ensuring load balancing, and providing fault tolerance over DHTs—but also that it can be applied on top of any order-preserving DHT enabling it to dynamically handle replication and, thus, to trade off replication costs for fair load distribution and fault tolerance.
Introduction
The tremendous growth in World Wide Web usage has become a double-edged sword for operators of large Web Sites. On the one hand, increases in request volume translate into increased subscription, advertising, and hosting revenue. On the other hand, scaling web sites to meet this increased demand has become more and more difficult as the number of requests for content exceed a particular server's ability to respond. In the best case, users will experience degraded service, in the worst case the server can be driven to collapse resulting in a complete loss of service.
One approach to alleviate handling of large volume of requests is to distribute their load among a group of nearly identical servers. A master controller, that can be a dedicated host or a process, first receives the requests and delegates it to the appropriate real server. This describes a typical load balancing system. A content switch is such a load balancing system that distributes load based on the content of the received requests.
Existing System:
STRUCTURED peer-to-peer (P2P) systems have provided the P2P community with efficient and combined routing and location primitives. This goal is accomplished by maintaining a structure in the system, emerging by the way that peers define their neighbors. Different structures have been proposed, most popular of which being
1) distributed hash tables or DHTs (CAN [31], Pastry [32], Chord [35], Tapestry [38]), using hashing schemes to map peers and data keys to a single, circular identifier space, and
2) distributed balanced trees (P-Grid [1], PHT [30], BATON [21], etc.), where nodes are placed in a tree-based topology.
One of the biggest shortcomings of DHTs, having spurred considerable research, is that they only support exact-match queries. Therefore, the naive approach to deal with range queries over DHTs—i.e., to individually query each value in the range—would be greatly inefficient and thus inapplicable in most cases. Although there are many research papers that support range queries over DHTs more “cleverly” and, thus, more efficiently [3], [17], [33], [36], all of them suffer from access load imbalances in the presence of skewed data access distributions. Only a few approaches deal with both problems, i.e., load balancing and efficient range query processing in DHTs [7], [15], [1] or other structures [4], [14], [21]. However, these solutions are based on data migration which is often inadequate in skewed data access distributions since transferring, for instance, a single popular value from a heavy loaded peer to another simply transfers the problem. In such cases, access load balancing is best addressed using replication of popular values to distribute the access load among the peers storing such replicas.
Proposed System :
Specifically, we develop a multi ring order preserving architecture on top of an order-preserving DHT overlay.
The proposed architecture consists of:
1. A novel multiring hash function used for data placement on top of the order-preserving DHT ring, both preserving the order of values and handling value replication in multiple data rings.
2. A load-driven replication scheme which dynamically replicates popular data across rings to ensure access load balancing. This scheme also supports tunable replication: by tweaking the maximum degree of replication, a system parameter, and per node load thresholds, it trades off replication costs for access load balancing.
3. A fault tolerance mechanism which detects peer failures and retrieves lost data. An additional replication scheme is proposed, the horizontal k replication, which both guarantees data availability and efficiency of range query processing, even with high-failure probabilities (fpr).
We coin this architecture Saturn1 because of its multiple data rings created by replication. Saturn employs load driven replication (visualized as “vertical rings”) to ensure access-load balancing, and horizontal k-replication (visualized as “horizontal rings”) to ensure data availability and to also preserve the order of values to support efficient range query processing.