Seminar Topics & Project Ideas On Computer Science Electronics Electrical Mechanical Engineering Civil MBA Medicine Nursing Science Physics Mathematics Chemistry ppt pdf doc presentation downloads and Abstract

Full Version: Distributed Anytime Clustering using Biologically Inspired Systems
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Abstract—In this paper, we propose a biologically-inspired
algorithm for clustering distributed data in a peer-to-peer
network with a small world topology. The method proposed
is based on a set of locally executable ?ocking algorithms
that use a decentralized approach to discover clusters by an
adaptive nearest-neighbor non-hierarchical approach and the
execution, among the peers, of an iterative self-labeling strategy
to generate global labels with which identify the clusters of all
peers. We have measured the goodness of our ?ocking search
strategy on performance in terms of accuracy and scalability.
Furthermore, we evaluated the impact of small world topology
in terms of reduction of iterations and messages exchanged to
merge clusters.
Keywords-swarm intelligence; data mining; P2P; small
world;
I. INTRODUCTION
Traditional clustering methods hardly can be applied in
the case of distributed datasets. Today’s large-scale data sets
are usually logically and physically distributed, requiring
a distributed approach to mining and the development of
approximate local algorithms [2]. This requires to develop
novel distributed clustering algorithms to handle the dif?cult
problems posed from the dynamic topology changes of the
network, impracticality of global communications and global
synchronization and the frequent failure and recovery of
resources.
Recently, many data mining algorithms based on biological
models have been developed, like for instance [7], in
order to solve the clustering problem. These paradigms are
characterized by the interaction of a large number of simple
agents that sense and change their environment locally. Ants’
colonies, ?ocks of birds, termites, swarms of bees etc. are
agent-based insect models that exhibit a collective intelligent
behavior (swarm intelligence) [1] that may be used to de?ne
new distributed clustering algorithms. In these models, intelligent
behavior frequently arises through indirect communication
between the agents using the principle of stigmergy,
taken from the insect societies. According to this principle
an agent deposits something in the environment this makes
no direct contribution to the task being undertaken but it is
used to in?uence the subsequent behavior that is task related.
In this paper, we present SPARROW-SNN a multiagent
distributed clustering algorithm implemented in a
small world P2P network which combines the stochastic
search of an adaptive ?ocking with a nearest-neighbor nonhierarchical
clustering method and an iterative self-labeling
strategy to generate global labels with which identify the
clusters of all peers.
SPARROW-SNN clusterizes data independently on different
peers by a decentralized algorithm based on ?ocking
hunting agents that execute in parallel a smart exploratory
strategy to discover local models of the clusters represented
by data points in which the cardinality of the points of the
neighborhood exceeds a ?xed threshold. Little by little that
representative points are discovered they act as attractors
for all the others. The entire ?ock then moves towards
these representative points to explore neighboring regions
that probably contain other representative points. Agents
have different features (color and speed) to adapt their
behavior according to their performance. For example, an
agent poorly-performing speed-up in order to leave an empty
or uninteresting part of the data space in order to ?nd a more
interesting area more quickly. As soon as the representative
points are discovered they are sent to the neighboring nodes
to complete the clustering by a merge procedure. Clusters
are merged using a global relaxation process in which nodes
exchange cluster labels with neighboring peers until a ?xed
point (i.e. all nodes detect no change in the labels) is reached.
SPARROW-SNN has a number of nice properties. Its
underlying topology exploits the potentialities of small world
[9] networks that have a high clustering coef?cient and a low
characteristic path length and this make the algorithm fault
tolerant and permits a faster convergence. It works in an
incremental way and this make it adapt to perform approximate
clustering [4] and its use as an anytime algorithm; it
is completely decentralized, as each peer acts independently
from each other. Furthermore, each peer communicates only
with its immediate neighbors and the communication is
asynchronous. Locality and asynchronism implies that the
algorithm is scalable to very large networks.