Clustering PPT

**project girl** · 25-08-2017, 09:32 PM

Clustering.pptx (Size: 1.16 MB / Downloads: 22)

What is clustering?

A way of grouping together data samples that are similar in some way - according to some criteria that you pick
A form of unsupervised learning – you generally don’t have examples demonstrating how the data should be grouped together
So, it’s a method of data exploration – a way of looking for patterns or structure in the data that are of interest

Why cluster?

Cluster genes (the rows of the expression matrix):
Measure expression at multiple time points, under different conditions, etc.
Similar expression patterns may suggest similar gene functions
Cluster samples (the columns of the expression matrix):
Expression levels of thousands of genes for each tumor sample
Similar expression patterns may suggest a biological relationship among samples
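
To make the rows-versus-columns distinction concrete, here is a minimal sketch. The gene names, sample names, and expression values are made up for illustration and do not come from the slides.

```python
# Hypothetical toy expression matrix: rows are genes, columns are samples.
genes = ["geneA", "geneB", "geneC"]
samples = ["tumor1", "tumor2"]
expression = [
    [2.1, 0.3],   # geneA measured in tumor1, tumor2
    [1.9, 0.4],   # geneB
    [0.2, 3.0],   # geneC
]

# Clustering genes: each object to be clustered is one row.
gene_profiles = dict(zip(genes, expression))

# Clustering samples: each object is one column, so transpose the matrix first.
sample_profiles = dict(zip(samples, (list(col) for col in zip(*expression))))

print(gene_profiles["geneA"])     # profile of one gene across samples
print(sample_profiles["tumor1"])  # profile of one sample across genes
```

The same clustering routine can then be applied to either dictionary; only the choice of what counts as a "profile" changes.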

Choosing (dis)similarity measures – a critical step in clustering

Recall that the goal is to group together “similar” data – but what does this mean?
No single answer – it depends on what we want to find or emphasize in the data; this is one reason why clustering is an “art”
The similarity measure is often more important than the clustering algorithm used – don’t overlook this choice!

(Dis)similarity measures

Instead of talking about similarity measures, we often equivalently refer to dissimilarity measures (I’ll give an example of how to convert between them in a few slides…)
Jagota defines a dissimilarity measure as a function f(x,y) such that f(x,y) > f(w,z) if and only if x is less similar to y than w is to z
This is always a pair-wise measure
Think of x, y, w, and z as gene expression profiles (rows or columns)
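
As a rough illustration of why the choice of measure matters, the sketch below compares Euclidean distance with a dissimilarity derived from Pearson correlation. The conversion `1 - r` is one common choice and is an assumption here, not necessarily the conversion the slides have in mind; the profiles `x`, `y`, and `z` are invented.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson linear correlation, a similarity in [-1, 1].

    Undefined (division by zero) if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pearson_dissimilarity(x, y):
    """Assumed conversion: similarity r in [-1, 1] becomes dissimilarity in [0, 2]."""
    return 1.0 - pearson(x, y)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # same trend as x, larger magnitude
z = [4.0, 3.0, 2.0, 1.0]   # opposite trend, similar magnitude

print(euclidean(x, y), euclidean(x, z))
print(pearson_dissimilarity(x, y), pearson_dissimilarity(x, z))
```

With these toy profiles, Euclidean distance ranks z as closer to x than y is, while the correlation-based measure ranks y as closer. Neither answer is wrong; they emphasize magnitude and shape respectively.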

Missing Values

A common problem with microarray data
One approach with Euclidean distance or Pearson linear correlation (PLC) is simply to ignore missing values (i.e., pretend the data has fewer dimensions)
There are more sophisticated approaches that use information such as continuity of a time series or related genes to estimate missing values – better to use these if possible
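
The simple "ignore missing values" approach could look roughly like the sketch below. Representing a missing measurement as `None` and the optional `rescale` flag are assumptions made for this example; the rescaling is one possible adjustment so that distances computed over different numbers of dimensions stay comparable.

```python
import math

def euclidean_ignore_missing(x, y, rescale=False):
    """Euclidean distance over the dimensions observed in both profiles.

    Missing values are represented as None (an assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no dimension is observed in both profiles")
    sq = sum((a - b) ** 2 for a, b in pairs)
    if rescale:
        # Scale up as if the observed dimensions were typical of all of them.
        sq *= len(x) / len(pairs)
    return math.sqrt(sq)

x = [1.0, None, 3.0, 5.0]
y = [4.0, 5.0, None, 1.0]

# Only the first and last dimensions are observed in both profiles.
d = euclidean_ignore_missing(x, y)
d_scaled = euclidean_ignore_missing(x, y, rescale=True)
print(d, d_scaled)
```

Without rescaling, profiles with many missing values tend to look artificially close to everything, which is one reason the more careful estimation approaches are preferable when available.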

K-means Clustering Issues

Random initialization means that you may get different clusters each time
Data points are assigned to only one cluster (hard assignment)
Implicit assumptions about the “shapes” of clusters
You have to pick the number of clusters…
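
The issues above can be seen in a minimal k-means sketch (Lloyd's algorithm, written from scratch here; it is not code from the slides). The toy points, the `seed` parameter, and the "rerun and keep the lowest-error result" strategy are assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=None, max_iter=100):
    """Plain k-means on a list of tuples; k must be chosen by the caller."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # random initialization
    labels = None
    for _ in range(max_iter):
        # Hard assignment: each point goes to exactly one nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:        # assignments stable, so stop
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                 # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def total_sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Different seeds give different starting centroids; keep the lowest-error run.
runs = [kmeans(points, k=2, seed=s) for s in range(5)]
best_labels, best_centroids = min(runs, key=lambda r: total_sse(points, *r))
print(best_labels, best_centroids)
```

On well-separated toy data like this, every run should find the same two groups (possibly with the label numbers swapped). On real data, different seeds can produce genuinely different clusterings, which is why multiple restarts are a common workaround. Minimizing squared Euclidean distance to a centroid is also where the implicit preference for compact, roughly round clusters comes from.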

**jaseela123** · 30-08-2017, 03:21 PM

Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way." A cluster is therefore a collection of objects that are "similar" to each other and "dissimilar" to objects belonging to other clusters.

The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how do we decide what constitutes a good clustering? It can be shown that there is no absolute "best" criterion that is independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering suits their needs.
For example, we might be interested in finding representatives for homogeneous groups (data reduction), in finding "natural clusters" and describing their unknown properties ("natural" data types), in finding useful and appropriate groupings, or in finding unusual data objects (outlier detection).
