29-03-2012, 04:36 PM
An Efficient Nearest Neighbor Algorithm for P2P Settings
10.1.1.79.8088.pdf (Size: 917.77 KB / Downloads: 24)
ABSTRACT
New Peer-to-Peer (P2P) applications like P2P jobemployee
seeker networks and P2P virtual cities, for
application domains such as collaborative urban planning
and forming virtual communities, are about to emerge. An
important component in these applications is spatial data,
i.e., data with locational components. Many requests
initiated on spatial data involve finding the spatial objects
that are nearest to a query location. In this paper, we
propose an efficient algorithm that finds the spatial objects
that are nearest to a given query location on a P2P network
in the order of their minimum distance to the query point.
The proposed algorithm makes use of a distributed spatial
index that does not rely on the use of a central server.
INTRODUCTION
Peer-to-peer (P2P) networks are becoming increasingly
popular as a powerful means for data exchange. They are
quite scalable and easy to deploy. Although P2P networks
attract significant interest, methods for accessing complex
data such as spatial data on P2P networks are still at their
infancy. Several future P2P applications like P2P jobemployee
seeker networks and P2P virtual cities will need
to answer queries involving spatial data, i.e., data with
locational components. Frequently more than one location
is associated with a spatial object. Hence, it commonly
differs from conventional point data in that the objects may
also have extent.
RELATED WORK
Hjaltason and Samet (2003) present a comprehensive
analysis of various similarity search algorithms in metric
spaces. Their main contribution to field is through a
priority queue based ranking algorithm that can find the
results of a ranking query in an incremental fashion.
Formally, ranking is a more general case of the NN query
where a user can ultimately retrieve all the objects of a
database in the order of their distance from a query point.
The algorithm works on many classical spatial indexing
methods including R-trees and quadtrees (for these data
structures Samet 1990 and 2005 contain very detailed
presentations). The entries of the priority queue consist of
spatial objects as well as the blocks of the space that the
underlying spatial data structure uses. The first entry in the
queue is the root of the data structure.
BACKGROUND WORK
We use a distributed quadtree index that we recently
developed to demonstrate our algorithm although other
indices can be utilized as well, e.g., emerging spatial
indices such as P2P R-trees (Mondal et al. 2004). We now
describe this indexing mechanism (Harwood and Tanin
2003; Tanin et al. 2004) that can facilitate spatial data on
P2P networks.
A P2P NEAREST NEIGHBOR ALGORITHM
Using our P2P quadtree index (Harwood and Tanin 2003;
Tanin et al. 2004), we now present our priority queue based
NN algorithm for P2P networks. We use the base concepts
from Hjaltason and Samet 2003. Yet, a direct
implementation of it has disadvantages on a P2P platform.
Primarily, we do not need to run the algorithm in a
sequential manner as we do not have to have a single thread
of control in a P2P setting. For a networked system, the
delays per contact to a peer and in a sequential manner can
add up to a significant amount.
AN APPLICATION PROTOTYPE
We have implemented a two-dimensional prototype
application to demonstrate our algorithm. The application
facilitates insertion, deletion, range (or window queries as
they are commonly called in spatial databases or in GIS),
and NN queries for spatial data. It functions as a P2P
lookup service for a virtual city. Users can find events and
places of interest on this P2P application.
CONCLUSIONS AND FUTURE WORK
Performing NN queries on spatial data is an important
operation. In many future P2P applications these queries
will require efficient algorithms to run on distributed data
structures. We have developed an efficient algorithm that
facilitates NN queries on spatial data in P2P networks. The
algorithm uses a P2P index that we developed in our earlier
work to find the spatial objects that is nearest to a query
point. Mainly by organizing a communication scheme for
the peers that can or cannot aid in executing the current NN
query, we restrict the number of messages passed between
the peers of the network while still continuing to utilize the
parallel processing power of the P2P network.