Introduction of Clustering

**seminar flower** · 28-06-2012, 12:08 PM

What is Clustering?

Attach a label to each observation (data point) in a set
This can be thought of as “unsupervised classification”
Clustering is alternatively called “grouping”
Intuitively, we want to assign the same label to data points that are “close” to each other
Thus, clustering algorithms rely on a distance metric between data points
It is sometimes said that, for clustering, the distance metric is more important than the clustering algorithm.
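To illustrate why the metric matters, here is a small Python sketch that labels a point by its nearest reference point under two different distance metrics. The reference points and labels are hypothetical; the point is only that the same observation receives a different label when the metric changes.

```python
import math

def euclidean(p, q):
    """Straight-line (L2) distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block (L1) distance between two equal-length tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_label(point, references, dist):
    """Return the label of the reference point closest to `point` under `dist`."""
    return min(references, key=lambda label: dist(point, references[label]))

# Hypothetical reference points for two labels.
references = {"A": (3.0, 3.0), "B": (5.0, 0.0)}
p = (0.0, 0.0)
print(nearest_label(p, references, euclidean))  # A  (4.24 < 5.0)
print(nearest_label(p, references, manhattan))  # B  (5.0 < 6.0)
```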

K-means Overview

An unsupervised clustering algorithm
“K” stands for the number of clusters; it is typically a user input to the algorithm, although some criteria can be used to estimate K automatically
It computes an approximate solution to an NP-hard combinatorial optimization problem
The K-means algorithm is iterative in nature
It converges, but only to a local minimum
Works only for numerical data
Easy to implement
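The iterative procedure is usually implemented as Lloyd's algorithm: alternate an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the assignments stop changing. Below is a minimal pure-Python sketch, assuming points are equal-length tuples of numbers and that the initial centroids are k data points sampled at random (one common choice; k-means++ is another). The data set is hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points sampled without replacement
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The two well-separated groups end up with different labels; which group is called 0 and which 1 depends on the random initialization.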

K-means: summary

Algorithmically, very simple to implement
K-means converges, but it finds a local minimum of the cost function
Works only for numerical observations
K is a user input; alternatively, BIC (Bayesian information criterion) or MDL (minimum description length) can be used to estimate K
Outliers can cause considerable trouble for K-means, because each centroid is a mean and is pulled toward extreme points
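Because K-means only reaches a local minimum of its cost (the within-cluster sum of squared distances), the result depends on the starting centroids. The hypothetical 1-D example below runs the same Lloyd iterations from two hand-picked initializations: one finds the three obvious groups, the other gets stuck with one centroid covering two groups. A common remedy, sketched at the end, is to run several initializations and keep the lowest-cost result.

```python
def kmeans_1d(points, init, max_iter=100):
    """Lloyd's iterations on 1-D data from the given initial centroids.

    Returns (centroids, labels, cost), where cost is the within-cluster sum of
    squared distances, i.e. the objective K-means tries to minimize.
    """
    centroids = list(init)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            for x in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    cost = sum((x - centroids[lab]) ** 2 for x, lab in zip(points, labels))
    return centroids, labels, cost

points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
good = kmeans_1d(points, [0.0, 10.0, 20.0])
bad = kmeans_1d(points, [0.0, 1.0, 10.5])
print(good[2], bad[2])  # 1.5 101.0

# Restarts: keep the run with the lowest cost.
inits = ([0.0, 1.0, 10.5], [0.0, 10.0, 20.0])
best = min((kmeans_1d(points, init) for init in inits), key=lambda r: r[2])
print(best[0])  # [0.5, 10.5, 20.5]
```

In the "bad" run, 0 and 1 each keep their own centroid while 10, 11, 20 and 21 share one centroid at 15.5; no single reassignment lowers the cost, so the iterations stop there.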

K-medoids Clustering

K-means is appropriate when we can work with Euclidean distances
Thus, K-means works only with numerical (quantitative) variables
Euclidean distances do not work well in at least two situations
Some variables are categorical
Outliers are present, since squared Euclidean distances magnify their influence
A generalization of the K-means algorithm, called K-medoids, can work with any distance measure
K-medoids clustering is computationally more intensive than K-means
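Since K-medoids only needs pairwise distances, it can cluster categorical records directly. Below is a minimal sketch of a simple alternating variant: assign each item to its nearest medoid, then choose as each cluster's new medoid the member with the smallest total distance to the other members. This is not the PAM swap search, and the records and the mismatch-count (Hamming-style) distance are hypothetical. The medoid update compares every pair of members in a cluster, which is where the extra cost relative to K-means comes from.

```python
def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(items, medoids, dist):
    """Group items by nearest medoid (ties go to the lower medoid index)."""
    clusters = [[] for _ in medoids]
    for x in items:
        nearest = min(range(len(medoids)), key=lambda i: dist(x, medoids[i]))
        clusters[nearest].append(x)
    return clusters

def k_medoids(items, init, dist, max_iter=100):
    """Alternating K-medoids; `init` is a list of k items used as starting medoids."""
    medoids = list(init)
    for _ in range(max_iter):
        clusters = assign(items, medoids, dist)
        # Each new medoid is the member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda m: sum(dist(m, y) for y in members)) if members else old
            for members, old in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(items, medoids, dist)

items = [
    ("red", "small", "cotton"),
    ("red", "small", "wool"),
    ("red", "medium", "cotton"),
    ("blue", "large", "silk"),
    ("blue", "large", "linen"),
    ("green", "large", "silk"),
]
medoids, clusters = k_medoids(items, [items[1], items[4]], hamming)
print(medoids)  # [('red', 'small', 'cotton'), ('blue', 'large', 'silk')]
```

Unlike a centroid, a medoid is always one of the actual records, so no "average color" or "average fabric" ever has to be defined.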

Otsu’s Image Thresholding Method

Based on the clustering idea: find the threshold that minimizes the weighted within-cluster point scatter.
This turns out to be equivalent to maximizing the between-class scatter.
Operates directly on the gray-level histogram (e.g., 256 numbers, P(i)), so it is fast once the histogram is computed.
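A minimal sketch of Otsu's method over a histogram, assuming `hist[i]` holds the pixel count at gray level i (8 levels here for brevity instead of 256; the histogram values are hypothetical). It scans every candidate threshold t, splits the levels into class 0 (levels 0..t) and class 1 (levels above t), and maximizes the between-class variance ω0·ω1·(μ0 − μ1)², which selects the same threshold as minimizing the weighted within-class variance. Working with integer counts instead of probabilities keeps the empty-class checks exact.

```python
def otsu_threshold(hist):
    """Return the gray level t maximizing the between-class variance, where
    class 0 holds levels 0..t and class 1 holds levels t+1..L-1."""
    total = sum(hist)
    sum_total = sum(i * h for i, h in enumerate(hist))
    w0 = 0    # pixel count in class 0
    sum0 = 0  # sum of gray levels in class 0
    best_t, best_score = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; threshold undefined here
        m0 = sum0 / w0
        m1 = (sum_total - sum0) / w1
        # Equals total**2 times the between-class variance w0/N * w1/N * (m0 - m1)**2.
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

hist = [0, 5, 10, 5, 0, 0, 8, 4]  # bimodal: dark mode near 2, bright mode near 6
print(otsu_threshold(hist))  # 3
```

Each threshold is scored in constant time from running sums, so the whole scan is a single pass over the histogram.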
