12-09-2012, 05:09 PM
PROBABILISTIC DISTANCE CLUSTERING
ABSTRACT
We present a new iterative method for probabilistic clustering of data. Given clusters,
their centers, and the distances of data points from these centers, the probability
of cluster membership at any point is assumed inversely proportional to the distance
from (the center of) the cluster in question. There are several ways of modeling this
assumption, and each such model (working principle) gives a different algorithm.
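One of the simplest working principles reads the assumption literally: membership probabilities proportional to the reciprocals of the distances, normalized to sum to one. The sketch below assumes Euclidean distances and plain 1/d weighting; it is an illustrative reading, not necessarily the exact model used in the thesis.

```python
import math

def membership_probabilities(point, centers):
    """Cluster-membership probabilities at a point, taken inversely
    proportional to the Euclidean distance from each cluster center
    (one possible working principle)."""
    dists = [math.dist(point, c) for c in centers]
    # If the point coincides with a center, assign it there with certainty.
    for k, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == k else 0.0 for j in range(len(centers))]
    inv = [1.0 / d for d in dists]
    s = sum(inv)
    return [v / s for v in inv]
```

A point equidistant from two centers gets probability 1/2 for each, and the probabilities always sum to one.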
The method is a generalization, to several centers, of the Weiszfeld method for
solving the Fermat-Weber location problem. At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points,
and the centers are updated as convex combinations of these points, with weights de-
termined by the working principle. Computations stop when the centers stop moving.
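For a single center, the scheme being generalized is the classical Weiszfeld iteration for the Fermat-Weber point: the center is repeatedly replaced by a convex combination of the data points with weights proportional to 1/distance. In the sketch below, the centroid starting point and the tiny-distance guard are implementation choices, not part of the scheme itself.

```python
import math

def weiszfeld(points, iters=100):
    """Classical Weiszfeld iteration: converges to the point minimizing
    the sum of Euclidean distances to the data points (Fermat-Weber)."""
    dim = len(points[0])
    # Start at the centroid (an arbitrary but common choice).
    c = tuple(sum(x[j] for x in points) / len(points) for j in range(dim))
    for _ in range(iters):
        d = [max(math.dist(x, c), 1e-12) for x in points]  # guard zero distance
        w = [1.0 / di for di in d]
        s = sum(w)
        # New center: convex combination of the points with weights w / s.
        c = tuple(sum(wi * x[j] for wi, x in zip(w, points)) / s
                  for j in range(dim))
    return c
```

For collinear points the Fermat-Weber point is the coordinate-wise median, which gives an easy sanity check.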
Progress is monitored by the joint distance function (JDF), a measure of distance from all cluster centers that evolves during the iterations and captures the data in its low contours.
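A natural candidate for such a joint distance is the harmonic-sum form, which for two clusters reduces to d1*d2/(d1+d2): it is small whenever the point is near any one center. The sketch below assumes that form; the thesis may define the JDF differently.

```python
import math

def joint_distance(point, centers):
    """Joint distance function (JDF) at a point: a single number that is
    small when the point is close to *any* cluster center.  Assumes the
    harmonic-sum form 1 / sum_k (1/d_k)."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        return 0.0  # the point sits exactly on a center
    return 1.0 / sum(1.0 / d for d in dists)
```

Note that this value never exceeds the distance to the nearest center, so its low contours enclose the data around the centers.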
An important issue in clustering is the "right" number of clusters that best fits a data set. The JDF is used successfully to settle this issue, using only information freely available from the algorithm.
Introduction
This thesis presents a new approach to clustering, called probabilistic distance clustering, its algorithms, and selected applications. The thesis is divided into five parts.
Preliminaries
The present chapter contains a description of the thesis.
Chapter 2 gives a brief survey of the clustering concepts, notation and terminology
that are relevant for this thesis. The approaches of center-based clustering and hierarchical clustering are compared, and the main algorithms are described briefly.
Probabilistic Distance Clustering
This part develops the models and algorithms for probabilistic clustering of data.
The main idea is presented in Chapter 3, with a new iterative method for prob-
abilistic clustering of data. The method is based on a principle, or a model of the
relationship between distances and probabilities. Given the clusters, their centers, and
the distances of data points from these centers, the probability of cluster membership
at any point is assumed inversely proportional to the distance from (the center of) the
cluster in question. The cluster centers and cluster membership probabilities of the
data points are updated using this principle. This is the basis of the probabilistic
distance clustering method described in Section 3.4.
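Putting the pieces together, the overall iteration alternates the two updates and stops when the centers stop moving. The sketch below uses the weight p^2/d, one plausible reading of the inverse-distance principle; the exact weighting of Section 3.4 may differ, and the tolerance and iteration cap are implementation choices.

```python
import math

def pd_cluster(points, centers, tol=1e-6, max_iter=200):
    """Probabilistic distance clustering loop (illustrative sketch):
    each center is updated as a convex combination of the data points,
    and iteration stops when no center moves more than `tol`."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        new_centers = []
        for k in range(len(centers)):
            num = [0.0] * len(points[0])
            den = 0.0
            for x in points:
                d = [max(math.dist(x, c), 1e-12) for c in centers]
                p = (1.0 / d[k]) / sum(1.0 / dj for dj in d)  # membership prob.
                w = p * p / d[k]                              # assumed weight
                num = [a + w * xi for a, xi in zip(num, x)]
                den += w
            new_centers.append(tuple(a / den for a in num))
        shift = max(math.dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

On two well-separated groups of points, the centers settle near the respective group centroids after a handful of iterations.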
Related Problems
This part studies two problems which are closely related to distance clustering, and can be solved using the results of Part II with few modifications.
In Chapter 6, an important application of the PDQ method to estimating the parameters of a mixture of distributions is presented. In such problems the cluster sizes are unknown and need to be estimated. We first describe the problem of mixtures of distributions and introduce the EM Algorithm, a well-known method for solving this type of problem. The PDQ method may serve as an alternative to that method,
or as a preprocessor giving the EM Method a good start. We apply the algorithm to the
estimation of the parameters of Gaussian mixtures, and compare it to the EM method.
We conclude the chapter with the results of a number of computational experiments,
comparing the running time and solution quality of our algorithm with the EM Method.
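For concreteness, a textbook EM iteration for a two-component one-dimensional Gaussian mixture looks as follows; this is a generic sketch of the baseline the chapter compares against, not the thesis's implementation, and the extremes-of-the-data initialization is an assumption made here for determinism.

```python
import math

def em_gaussian_1d(data, iters=50):
    """Minimal EM for a two-component 1-D Gaussian mixture.
    Returns mixing weights, means, and standard deviations."""
    mu = [min(data), max(data)]   # deterministic initialization (assumption)
    sigma = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [pi[j] * math.exp(-0.5 * ((x - mu[j]) / sigma[j]) ** 2)
                    / (sigma[j] * math.sqrt(2 * math.pi)) for j in range(2)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: re-estimate mixing weights, means, standard deviations.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            pi[j] = nj / len(data)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            sigma[j] = math.sqrt(max(var, 1e-9))
    return pi, mu, sigma
```

On well-separated data the estimated means converge to the per-group averages, which is the regime where a good starting point (such as one supplied by a clustering preprocessor) matters least; EM's sensitivity to initialization is precisely what motivates using PDQ as a preprocessor.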