13-02-2013, 09:47 AM
DATA MINING WITH CLUSTERING AND CLASSIFICATION
Definition
Clustering can be considered the most important unsupervised learning technique; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data.
Clustering is “the process of organizing objects into groups whose members are similar in some way”.
A cluster is therefore a collection of objects which are "similar" to one another and "dissimilar" to the objects belonging to other clusters.
Why clustering?
A few good reasons ...
Simplification
Pattern detection
Useful in constructing data concepts
Works as an unsupervised learning process
Where to use clustering?
Data mining
Information retrieval
Text mining
Web analysis
Marketing
Medical diagnostics
Which method should I use?
Type of attributes in the data
Scalability to large datasets
Ability to work with irregular data
Time cost
Complexity
Data order dependency
Result presentation
Measuring Similarity
Dissimilarity/Similarity metric: Similarity is expressed in terms of a distance function d(i, j), which is typically a metric.
There is a separate “quality” function that measures the “goodness” of a cluster.
The definitions of distance functions usually differ for interval-scaled, boolean, categorical, ordinal, and ratio variables.
Weights should be associated with different variables based on applications and data semantics.
It is hard to define "similar enough" or "good enough"; the answer is typically highly subjective.
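For interval-scaled variables, the weighted distance idea above can be sketched in Python. The function name and the choice of a weighted Euclidean metric are illustrative assumptions, not part of the original slides:

```python
import math

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance between two interval-scaled vectors.

    The weights w let different variables count more or less,
    depending on the application and data semantics."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

# With equal weights this reduces to the ordinary Euclidean distance.
d = weighted_euclidean([0, 0], [3, 4], [1, 1])  # 5.0
```

Setting a weight to zero effectively drops that variable from the comparison.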
General steps of hierarchical clustering
Given a set of N items to be clustered, and an N*N distance (or similarity) matrix, the basic process of hierarchical clustering (defined by S.C. Johnson in 1967) is this:
Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain.
Find the closest (most similar) pair of clusters and merge them into a single cluster, so that you now have one fewer cluster.
Compute distances (similarities) between the new cluster and each of the old clusters.
Repeat steps 2 and 3 until the items are grouped into the desired number of clusters, K.
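The steps above can be sketched as a naive agglomerative procedure in Python. The single-linkage rule (cluster distance = distance between their closest members) is one common choice for step 3; the function names are my own:

```python
def hierarchical_cluster(items, dist, k):
    """Naive agglomerative clustering following Johnson's procedure:
    start with one cluster per item, then repeatedly merge the
    closest pair of clusters until only k clusters remain.
    Clusters are represented as lists of item indices."""
    clusters = [[i] for i in range(len(items))]

    def linkage(a, b):
        # Single linkage: distance between the closest members.
        return min(dist(items[i], items[j]) for i in a for j in b)

    while len(clusters) > k:
        # Step 2: find the closest (most similar) pair of clusters.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge them into a single cluster (step 3 distances are
        # recomputed lazily by linkage() on the next iteration).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

points = [1.0, 1.2, 5.0, 5.1, 9.0]
print(hierarchical_cluster(points, lambda x, y: abs(x - y), 3))
# [[0, 1], [2, 3], [4]]
```

This recomputes pairwise linkages every pass, so it is O(N^3) — fine for illustrating the procedure, but a real implementation would maintain the N*N distance matrix incrementally.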
K-means algorithm
It accepts the number of clusters to group data into, and the dataset to cluster as input values.
It then creates the first K initial clusters (K = number of clusters needed) by choosing K rows of data randomly from the dataset. For example, if there are 10,000 rows of data in the dataset and 3 clusters need to be formed, then the first K = 3 initial clusters will be created by selecting 3 records randomly from the dataset. Each of the 3 initial clusters formed will have just one row of data.
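A minimal sketch of this in Python, using the random-row initialization described above. The function name, the fixed iteration count, and representing points as tuples are illustrative assumptions:

```python
import random

def kmeans(data, k, iters=20, seed=0):
    """Minimal k-means: sample k rows as the initial centroids,
    then alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)  # k random rows become the initial clusters
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in data:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            groups[j].append(p)
        # Update step: each centroid moves to the mean of its group.
        for j, g in enumerate(groups):
            if g:
                centroids[j] = tuple(sum(dim) / len(g) for dim in zip(*g))
    return centroids, groups

data = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9)]
centroids, groups = kmeans(data, 2)
```

Even if both random initial rows happen to come from the same region, the update step pulls the centroids apart over the iterations for well-separated data like this.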
Classification Examples
Teachers classify students’ grades as A, B, C, D, or F.
Identify mushrooms as poisonous or edible.
Predict when a river will flood.
Identify individuals who are credit risks.
Speech recognition
Pattern recognition