23-11-2012, 05:51 PM
The CoQUOS Approach to Continuous Queries in Unstructured Overlays
The CoQUOS Approach.docx (Size: 2.13 MB / Downloads: 45)
Abstract:
The current peer-to-peer (P2P) content distribution systems are constricted by their simple on-demand content discovery mechanism. The utility of these systems can be greatly enhanced by incorporating two capabilities, namely a mechanism through which peers can register their long term interests with the network so that they can be continuously notified of new data items, and a means for the peers to advertise their contents. Although researchers have proposed a few unstructured overlay-based publish-subscribe systems that provide the above capabilities, most of these systems require intricate indexing and routing schemes, which not only make them highly complex but also render the overlay network less flexible toward transient peers. This paper argues that for many P2P applications, implementing full-fledged publish-subscribe systems is an overkill. For these applications, we study the alternate continuous query paradigm, which is a best-effort service providing the above two capabilities. We present a scalable and effective middleware, called CoQUOS, for supporting continuous queries in unstructured overlay networks. Besides being independent of the overlay topology, CoQUOS preserves the simplicity and flexibility of the unstructured P2P network. Our design of the CoQUOS system is characterized by two novel techniques, namely cluster-resilient random walk algorithm for propagating the queries to various regions of the network and dynamic probability-based query registration scheme to ensure that the registrations are well distributed in the overlay. Further, we also develop effective and efficient schemes for providing resilience to the churn of the P2P network and for ensuring a fair distribution of the notification load among the peers. This paper studies the properties of our algorithms through theoretical analysis. We also report series of experiments evaluating the effectiveness and the costs of the proposed schemes.
INTRODUCTION
Despite their popularity, most of the current unstructured P2P content distribution systems suffer from certain serious limitations. One such limitation is their simple, on demand mechanism for content discovery. Peers in these systems discover data items by circulating queries within the overlay network. A peer receiving a query responds back to the initiating node if it has any matching content. Upon processing a query, the recipient node removes it from its local buffers1. Thus, a query expires after it completes its circulation within the network. In other words, the network forgets the queries once they have completed their circulation. For clarity purposes, we call this the ad hoc query model, and we refer to the queries as ad hoc queries.
Continuous query is the means through which a peer can register its long term interests with the Co- QUOS system. A continuous query, represented as Q = (SID; Predicate; V Time), is essentially a tuple of three components, namely, source ID (SID) , query predicate (Predicate) and validity time (V Time). The source ID uniquely identi_es the peer issuing the query. The query predicate is the matching condition of the query, and is used by the source peer to specify its interests. In general, the predicate can be of any form such as range predicates or even a regular expression. We assume that the predicate is a list of keywords describing the content the source peer is interested in. Validity time (V Time) represents the time until which the source node is interested in receiving noti_cations.
Cluster Resilient Random Walk:
Random walk corresponds to a depth first traversal of the network, and a message propagated through random walks has a higher probability of reaching remote regions of the network than its flooding-based counterpart. In this paper we use the terms random walk and pure random walk (PRW) interchangeably.
The above property of the random walk makes it an attractive paradigm for propagating continuous queries. Unfortunately, the random walk protocol suffers from one significant drawback that undermines its utility for propagating queries in the CoQUOS system.
we have designed a novel query dissemination scheme called cluster resilient random walk (CRW). This scheme is motivated by a crucial observation: Two peers belonging to the same cluster generally have large numbers of common neighbors.
Dynamic Probability Scheme:
The CRW scheme provides a mechanism for propagating a continuous query. But, how does a node receiving this message decide whether to register the query? A straightforward solution would be to register a query at every node it visits. However, this would result in large numbers of unnecessary subscriptions, which affects the efficiency of the network.
The reason is that for some continuous queries a long series of peers in the path of the query message may all decide not to register the query, whereas another sequence of consecutive nodes may all decide to host the query. The announcements originated near the dry patches of a query's path might fail to reach any of its beacon nodes, thus leading to low success rates. Considering these requirements, we have designed a novel dynamic probability-based technique (DP scheme, for short) for peers to decide whether to register a continuous query. However, the registration probability of a query varies as the query traverses along its route.
Passive Replication:
We discuss a passive replication-based scheme for preserving high notification effectiveness of the system even when the underlying P2P network experiences significant churn.
This churn of the overlay network can adversely impact the success of continuous queries and announcements. When a node Pi gracefully leaves the system, it asks one of its neighbors to handle all registered queries at Pi and also notifies all the beacon nodes with queries issued by Pi to remove the queries. However, when Pi exits the system unexpectedly, all the registrations are lost and the notification success rates of the respective queries and the matching announcements drop. Thus, effective mechanisms are needed to alleviate the negative effects of churn in the overlay network.
Overlay Churn:
In order to counter the adverse effects of network churn, we have designed a low-cost technique wherein the query registrations present on a peer are replicated on one or more of its neighbors. Failures are detected through periodic exchange of heartbeat messages between the beacon node and the peers maintaining its replicas. The beacon node that does not respond to two consecutive messages is assumed to have failed. In the interest of better load distribution, two or more neighbors may takeover subsets of the queries registered at the failed node. The communication costs of maintaining query replicas are optimized through lazy replication and piggybacking the information they utilize. They only require interactions between neighboring peers, thus making them suitable for generic unstructured P2P networks. In both schemes, neighboring peers periodically (at the end of pre-specified cycles) exchange information about their loads. Based on the load information obtained from its neighbors, a peer decides whether it is overloaded