Benefiting from Data Mining Techniques in a Hybrid Peer-to-Peer Network
Abstract
The fast-growing role of Peer-to-Peer (P2P) networks in
various distributed services and technologies, as a
well-distributed, highly scalable, and fault-tolerant
networking infrastructure, together with the large
number of users inherent to these networks, has made
P2P systems a potential domain for data mining
processes. Nevertheless, the distributed nature of P2P
networks directly contradicts the centralized character
of classical data analysis algorithms. Although there
have recently been studies on Distributed Data Mining
(DDM) as a possible solution, the proposed DDM
algorithms cover only a small portion of the problem
space and lack a theoretical proof of convergence. By
considering the potential available in well-known
hybrid P2P architectures, we propose a layered
data-gathering and computing infrastructure on top of
hierarchical hybrid P2P networks. These layers provide
optional computing and administrative capabilities for
the entire network, without interrupting the underlying
network’s functionalities.
1. Introduction
Starting with the introduction of Napster in May
1999, the disruptive technology of Peer-to-Peer
networking has experienced enormous growth
within the last few years. Today the traffic caused by
Peer-to-Peer networks represents a significant portion
of overall Internet traffic [1]. For example, in the
German Research Network (Deutsches Forschungsnetz,
DFN), Peer-to-Peer causes up to 60 percent of the
traffic [2]. Similar trends can be observed in other
networks, e.g., in the Abilene backbone [3]. At the
beginning of 2002, the signaling traffic of
Peer-to-Peer applications alone (no user-data
transfers included) already amounted to up to 50
percent of the total traffic volume [1].
In addition to file sharing, which was the primary
incentive for P2P networking, this well-distributed and
highly scalable infrastructure has drawn a lot of
attention in many other applications. Today, there is a
wide range of efforts to use P2P-based constructs for
grid computing and processor cycle bundling [4-6],
content-based publish-subscribe service design [7],
multimedia streaming [8], conferencing and P2P
gaming [9], distributed information monitoring [10],
etc.
Given the extremely large number of clients and the
much larger volume of transactions between them, a
huge amount of information criss-crosses P2P-like
networks. This information could yield invaluable
knowledge if it were mined with the powerful data
mining techniques available today.
However, there is still an obstacle to overcome: the
more or less distributed nature of P2P systems.
Naturally, data mining in such environments calls for
efficient use of distributed resources, to avoid
imposing too much traffic or computing load on the
entire network.
Recently, there have been studies on Distributed Data
Mining (DDM) as a possible solution to this problem.
S. Datta et al. [11] propose a novel approach to
frequent-item-set mining and k-means clustering.
However, despite the innovative methods presented,
that study covers only a small portion of data mining
functionalities, and it also lacks a theoretical proof of
convergence. All in all, DDM still appears to have a
long way to go before it can be applied to real situations.
Taking a different point of view, we studied the
potential of the unstructured hybrid P2P
architecture to provide a semi-central and central
data-gathering infrastructure, without interrupting the
ordinary network functionalities or swamping the network
with traffic or computing load. In this article, we exploit the
hierarchical structure of hybrid networks by
introducing an arbitrary number of new layers, which
we refer to as Computing layers, on top of the existing
two-layer network layout. These layers are generally
intended to host Computing Servers that extend the
network functionalities by providing data warehousing
and data analysis services.
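
As a rough illustration of the layered layout described above, the following minimal Python sketch models leaf peers, super-peers, and Computing layers stacked on top of them. Everything in it is an assumption made for illustration: the class names (Peer, SuperPeer, ComputingServer), the use of query strings as the gathered data, and the policy of forwarding only aggregated counts upward are not specified in the text and are not the authors' design.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Leaf peer in the lower layer of the hybrid network (hypothetical model)."""
    name: str
    queries: list[str] = field(default_factory=list)


@dataclass
class SuperPeer:
    """Upper-layer node of the existing two-layer layout (hypothetical model)."""
    name: str
    peers: list[Peer] = field(default_factory=list)

    def summarize(self) -> Counter:
        # Assumed policy: forward aggregated counts, not raw queries,
        # so the extra traffic sent upward stays small.
        summary = Counter()
        for peer in self.peers:
            summary.update(peer.queries)
        return summary


@dataclass
class ComputingServer:
    """Node in an added Computing layer, acting as a small data warehouse."""
    name: str
    children: list = field(default_factory=list)  # SuperPeers or ComputingServers

    def summarize(self) -> Counter:
        # Merge the summaries of the layer directly below.
        warehouse = Counter()
        for child in self.children:
            warehouse.update(child.summarize())
        return warehouse

    def top_items(self, n: int) -> list[tuple[str, int]]:
        # Stand-in for a data analysis service: the n most frequent items.
        return self.summarize().most_common(n)


# Two super-peers, one Computing layer above them, and a second layer on top.
sp1 = SuperPeer("sp1", [Peer("a", ["x.mp3", "y.avi"]), Peer("b", ["x.mp3"])])
sp2 = SuperPeer("sp2", [Peer("c", ["x.mp3", "z.pdf"])])
regional = ComputingServer("regional", [sp1, sp2])
root = ComputingServer("root", [regional])

print(root.top_items(1))  # prints [('x.mp3', 3)]
```

Because a ComputingServer accepts either super-peers or other Computing Servers as children, any number of Computing layers can be stacked without changing the two lower layers, which is the property the paragraph above emphasizes.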