18-06-2012, 12:32 PM
An Efficient Search Algorithm without Memory for Peer-to-Peer
Cloud Computing Networks
Abstract
In recent years, with the rapid development of Peer-to-Peer (P2P) networking and cloud computing, P2P technology has become widespread in most distributed cloud computing applications. P2P cloud computing networks are unstructured and are an important building block of the next-generation Internet. How to search for resources quickly and efficiently in P2P networks has become one of the most critical issues, and it is among the greatest concerns of users. This paper first describes the basic flooding search method for P2P networks and analyzes the pros and cons of several newer search methods. A cache-based search algorithm is then proposed: when a node's remaining load capacity is high, it becomes a center node and forms a joint topology area together with nearby nodes; both the center node and the ordinary nodes store an index cache, and overly popular resources in the local region are copied locally (the content cache). Simulation results show that the algorithm effectively improves the hit rate of resource searches and reduces query delay in cloud computing networks.
I. INTRODUCTION
In recent years, with the rapid development of Peer-to-Peer (P2P) networking and cloud computing, P2P technology has become widespread in most distributed cloud computing applications. P2P cloud computing networks are unstructured and are an important building block of the next-generation Internet. P2P can be used as an alternative to, or an improvement on, the traditional client/server (C/S) data sharing model; data sharing based on P2P technology has become very popular and has made sharing much easier in people's daily lives.
P2P-based data sharing systems [1] can be divided into three categories: hybrid, unstructured, and structured P2P systems. As a pioneer of P2P networking, Napster [2] is a hybrid P2P data sharing system: the Napster site is a server cluster, each server stores part of the index information for users' shared files, and the servers taken together can be seen as one logical server. On the one hand, users obtain the location of a desired resource file from the server and use that information to download the needed files; on the other hand, users willing to share files with other users send their file information to the server, which records the information together with the user's location. However, such systems still rely on a centralized mechanism and therefore suffer from scalability bottlenecks, single points of failure, and other limitations. In structured P2P data sharing systems such as Chord [3], data placement is strictly controlled: using a distributed hash table, data items are mapped deterministically to nodes. Such systems have efficient query routing and guarantee that a data object can be located in no more than O(log N) routing hops. However, because topology maintenance is expensive and complex or fuzzy retrieval is not supported, their application is still not very widespread at present. Unstructured P2P systems, such as Gnutella [4] and KaZaA [4], have neither any centralized component nor any mechanism controlling data placement. Nodes in such systems are independent and autonomous: each node only needs to maintain its own resources, provides information and services to other nodes, and may in turn use the information and services other nodes provide. Because such systems are easy to maintain and deploy, unstructured P2P systems are the most common on the Internet.
The practicality of P2P networks depends on the efficiency of data location and retrieval, so how to search for resources efficiently in a P2P network is one of the most critical issues. From the user's point of view, the quality of the search algorithm directly affects the quality of service (e.g., delay and satisfaction with the results) and the user experience. Several factors make the problem hard. First, in P2P networks resources do not reside on a single server but are scattered across all of the nodes. Second, although in an ideal P2P network every node is equal, in practice nodes differ greatly in their ability to provide resources [5]; this heterogeneity means search methods must consider whether nodes should be given different priorities and treatment. Third, node behavior varies considerably: the times at which nodes are active in the network differ widely, which makes the network highly dynamic, and search methods should adapt to these changes.
In this paper, after describing the basic flooding search method for peer-to-peer networks, we analyze a number of other representative search algorithms and, building on improvements to these algorithms, propose a cache-based search algorithm. Simulation shows that the algorithm can improve the hit rate of resource searches and reduce query delay.
II. P2P NETWORK SEARCH ALGORITHM
P2P traffic already accounts for a very large proportion of total Internet traffic [6]. As P2P applications have grown in popularity, users' demands on resource search have risen, and various search methods have emerged in P2P networks. The following are some typical P2P network search algorithms.
A. Flooding algorithm
Flooding is the simplest and most direct routing method in unstructured P2P networks, and also the approach used by most existing unstructured P2P networks. It is characterized by a blind search in which routes traverse the whole network. A node that cannot satisfy a query locally broadcasts the query to all of its neighbor nodes. The search is bounded by a hop limit, the TTL, carried in the routing process; nodes use the TTL value to determine whether a message has expired. Each time the message is forwarded, the TTL is decremented by 1; when TTL = 0 the resource query ends and the node stops disseminating the message. If a matching resource is found, the target machine returns a response to the querying node along the reverse of the path the query took. To prevent loops, each query message carries a unique identifier: if a node has previously received a query, a cycle has occurred, and the node does not propagate the query further. Figure 1 shows the flooding routing algorithm; the query message initially sets TTL = 4, and the TTL is decremented by 1 on each transmission. The coverage of the flooding method is a circle with the TTL as its radius, but the network overhead grows exponentially with the number of routing hops, so the TTL cannot be set too large. Coverage is therefore limited: even if a file exists, it cannot be found if its distance exceeds the TTL. This query limitation is an inherent defect of unstructured P2P networks.
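The flooding behavior described above can be sketched as a breadth-first traversal with a TTL and a duplicate-message check. This is an illustrative sketch, not the paper's implementation: the network is modeled as an adjacency dict, and the sought resource is modeled as a node name.

```python
from collections import deque

def flood_search(graph, source, target, ttl):
    """Blind flooding search (a sketch): broadcast the query to all
    neighbors, decrementing the TTL each hop, and never revisit a node
    (the unique-message-identifier check from the text)."""
    seen = {source}                     # nodes that already saw this query
    queue = deque([(source, ttl)])
    messages = 0                        # forwarded query messages (overhead)
    while queue:
        node, remaining = queue.popleft()
        if node == target:              # match: a real system would reply
            return True, messages       # along the reverse query path
        if remaining == 0:              # TTL exhausted: stop propagating
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:    # loop prevention
                seen.add(neighbor)
                messages += 1
                queue.append((neighbor, remaining - 1))
    return False, messages
```

On a chain of four nodes, a TTL of 3 reaches the far end while a TTL of 2 does not, illustrating the coverage limit the text describes.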
B. Expanding Ring
Expanding ring [7] is essentially a tentative flooding algorithm. The querying node first launches a flooding query with a small TTL; if the query does not succeed, it increases the TTL and launches the query again, repeating the process until the target is found. Its coverage is a gradually enlarging circle, hence the name "expanding ring", shown in Figure 2. In P2P networks most targets have multiple copies, and expanding the scope can effectively find such widely replicated targets. As the replication ratio of a target grows, the expanding mechanism effectively reduces the TTL value needed, so this mechanism solves the problem of choosing an appropriate TTL and is more efficient than plain flooding. However, the flooding nature remains unchanged, and the repeated queries increase the number of duplicate query messages.
Fig.2 Expanding ring routing
C. Random-Walk
Random walk [7] is another improvement on the flooding algorithm and a routing technique commonly used in distributed networks. A node that receives a query message forwards it to only one randomly selected neighbor, until the query succeeds or the hop limit TTL is used up, much like a solid particle suspended in a liquid performing Brownian motion. Compared with flooding, the biggest advantage of random walk is that the network overhead grows only linearly with the number of hops, so the TTL can be set higher. The specific steps are as follows: the query message is randomly forwarded to one neighbor at a time until the target is found. Each such query message is a "walker", as shown in Figure 3 (the black node is the source node, and the arrowed lines are the random walk routes). This mechanism can reduce the query overhead by an order of magnitude, but the query delay increases by an order of magnitude. To reduce the query delay we can increase the number of walkers: the requesting node sends several walkers at the same time, and each walker follows its own path, randomly selecting a neighbor to forward the search message to. This is the so-called "k-walker" method. It can be shown that in a typical unstructured network, the coverage of the k-walker method in T steps equals that of a single walker in kT steps; therefore the k-walker method does not add much network overhead, while reducing the query delay. The choice of the value k is critical: if k is too small, the delay will be large, and if k is too large, the network overhead will be large. The literature [7] considers that a k value between 16 and 64 gives the best results. In addition, the several walkers of a random walk require a termination mechanism; the literature uses TTL = 0 together with periodic checking back with the query initiator, and considers checking after every four steps to be appropriate.
The greatest shortcoming of the random walk algorithm is its instability: the success rate depends on the network topology and on the random neighbor selection. In addition, it cannot adapt dynamically to the query object; the query has no memory and treats popular and unpopular objects alike.
Fig. 3 Random-Walker
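The k-walker method can be sketched as follows. This is an illustrative sketch under assumed names (the graph is an adjacency dict, the resource is a node name, and the periodic check-back with the initiator is omitted); the random generator is seeded so the run is reproducible.

```python
import random

def k_walker_search(graph, source, target, k=4, ttl=16, seed=0):
    """k-walker random walk (a sketch): k walkers start at the source
    and each forwards the query to one random neighbor per step, until
    the target is found or the TTL is exhausted."""
    rng = random.Random(seed)            # seeded for reproducibility
    walkers = [source] * k               # current position of each walker
    for step in range(ttl):
        for i, node in enumerate(walkers):
            if node == target:
                return step              # steps taken before the hit
            walkers[i] = rng.choice(graph[node])  # blind random forwarding
    return None                          # all walkers expired: query failed
```

Per step this sends only k messages instead of flooding's exponential fan-out, which is the linear-overhead property the text highlights; the price is that success now depends on the random neighbor choices.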
D. Super-node routing
In the super-node routing algorithm, nodes are divided into two types: super-nodes and leaf nodes. A leaf node joins the network only by connecting to a super-node. Super-nodes periodically update their locally stored data tables in order to filter out unnecessary traffic. The routing algorithm is as follows: when a leaf node issues a query, its super-node first looks for a match locally; if one is found, the result is returned; otherwise the super-node forwards the query to the other super-nodes it is connected to until a matching object is found. The super-node routing topology is shown in Figure 4.
Super-node routing means that nodes are no longer entirely equal entities: the super-nodes take on most of the routing work, and the load of an ordinary leaf node becomes very small. Its advantages are that the traffic generated by routing is significantly reduced, the load on ordinary leaf nodes is reduced, and the query delay is small.
Fig.4 Super-node routing (leaf nodes and super-nodes)
The defect of this algorithm is obvious: because of the existence of super-nodes, the network topology depends entirely on them, and when a super-node fails or is attacked, its leaf nodes are paralyzed.
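The two-step lookup (local index first, then peer super-nodes) can be sketched as below. The class layout and names are assumptions for illustration, not a specification from the paper.

```python
class SuperNode:
    """Sketch of super-node routing: each super-node indexes the
    resources of its attached leaf nodes and forwards unmatched
    queries to the other super-nodes it knows about."""

    def __init__(self, name):
        self.name = name
        self.index = {}          # resource -> leaf node holding it
        self.peers = []          # connected super-nodes

    def attach_leaf(self, leaf, resources):
        for r in resources:      # leaf reports its shared files on join
            self.index[r] = leaf

    def query(self, resource, visited=None):
        visited = visited or set()
        visited.add(self.name)
        if resource in self.index:        # local match: return immediately
            return self.index[resource]
        for peer in self.peers:           # else forward among super-nodes
            if peer.name not in visited:
                hit = peer.query(resource, visited)
                if hit is not None:
                    return hit
        return None
```

Note that leaf nodes never route anything, which is why their load stays small, and why losing a super-node disconnects all the leaves attached to it.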
III. CACHE-BASED SEARCH ALGORITHM
The improved flooding algorithms mentioned above generally raise resource search efficiency by improving the forwarding mechanism, but the resulting searches still have no memory. Below we propose a cache-based search algorithm.
A. Related Concepts
Definition 1 (Index cache [8]): Information about a node Ni (including the node's IP address, its resources, and its bandwidth) is stored in a data table in a certain format; here we call this data table the index table.
Definition 2 (Content cache [9]): Resources owned by not-too-distant neighbors are copied and stored locally.
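The two caches from Definitions 1 and 2 might be represented as follows. The field names are assumptions for illustration; the paper only lists the kinds of information stored, not a schema.

```python
from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    """One row of the index table from Definition 1 (fields assumed)."""
    node_ip: str
    resources: list
    bandwidth_kbps: int

@dataclass
class NodeCaches:
    index_cache: dict = field(default_factory=dict)    # node id -> IndexEntry
    content_cache: dict = field(default_factory=dict)  # resource -> local copy
                                                       # (Definition 2)
    def lookup(self, resource):
        """Return IPs of nodes whose index entry advertises `resource`."""
        return [e.node_ip for e in self.index_cache.values()
                if resource in e.resources]
```

The index cache answers "who has it" without any network traffic, while the content cache answers the query outright for popular resources.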
B. Central idea
When a node's residual load capacity reaches a certain threshold, it forms a joint topology area [10] with the nodes at similar distance, with itself as the center; the center node and the ordinary nodes then each store an index cache locally, and hot resources are copied to the center (the content cache). In Figure 5, the solid triangles are center nodes with strong residual load capacity, and the solid circles are ordinary nodes:
Fig.5 Joint topology
C. Establishment of distribution structure
(1) Select a node with a higher utility value (node utility will be defined later in this article) to monitor the local nodes' requests for information and resources. Hot resources are copied to this node as a cache, i.e., the content cache mentioned above, and the node becomes the center. Then k nodes are selected from its periphery to form a joint topology around that center; we also call the center node a super-node. A node must satisfy a certain selection strategy to become a center node; if no node in a local region qualifies, that region does not form a joint topology, and the flooding method is used as a fallback.
(2) The super-node is mainly responsible for three tasks: 1) store the content cache for popular resources; 2) collect the information of the ordinary nodes belonging to the joint topology, create the index cache locally, and then send the index cache to the nodes belonging to the joint topology; 3) form a network together with the nearby super-nodes and regularly update the local index cache, as shown in Figure 5.
(3) Determination of the k value: the k value depends on the network topology. The governing parameter is the hop count from the central node to the farthest node whose index it maintains; the number of ordinary nodes within that hop coverage is k. If the hop count is set too large, each node stores a very large index table, which increases the burden on each node and degrades its performance; but it cannot be too small either, since too small a value reduces the success rate of resource searches. If the hop count is set to 1, the central node only needs to take its direct neighbors into the joint topology and create the index cache from their information.
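Steps (1) and (3) above can be sketched together: qualified nodes become centers, and each center collects the ordinary nodes within a given hop count as its joint topology area. This is a sketch under assumptions: the utility values and threshold are given as inputs, since the paper defines node utility later.

```python
def form_joint_topology(graph, utilities, threshold, hop_count=1):
    """Sketch of joint-topology construction: nodes whose utility value
    reaches `threshold` become center nodes, and each center takes the
    ordinary nodes within `hop_count` hops as its area (their count is k)."""
    areas = {}
    for node, utility in utilities.items():
        if utility < threshold:          # not qualified to be a center;
            continue                     # such regions fall back to flooding
        # expand outward hop by hop to collect the member nodes
        members, frontier = set(), {node}
        for _ in range(hop_count):
            frontier = {n for f in frontier for n in graph[f]} \
                       - members - {node}
            members |= frontier
        areas[node] = sorted(members)    # center -> its joint topology area
    return areas
```

With hop_count=1 each center indexes only its direct neighbors, the minimal case discussed above; raising hop_count grows k and the index table together, which is exactly the trade-off the text describes.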