24-01-2013, 02:56 PM
Clustering
Distance Metric
The Euclidean distance is the most commonly used distance metric.
The weighted Euclidean is used if the parameters have not been scaled or if the parameters have significantly different levels of importance.
Use the chi-square distance only if the x.k values are close to each other, because parameters with low values of x.k get higher weights.
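The three metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weighted form is taken to be the square root of the weighted sum of squared differences, and x.k is assumed to be the total of parameter k over all components. The function and argument names are invented for this example.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_euclidean(a, b, weights):
    """Euclidean distance with one weight per parameter (assumed form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def chi_square(a, b, column_totals):
    """Chi-square distance; column_totals[k] stands in for x.k (assumption).

    Dividing by x.k means a parameter with a small total gets a large weight.
    """
    return sum((x - y) ** 2 / t for x, y, t in zip(a, b, column_totals))

print(euclidean((0, 0), (3, 4)))                   # 5.0
print(weighted_euclidean((0, 0), (3, 4), (1, 1)))  # 5.0, same as unweighted
print(chi_square((1, 2), (3, 2), (4, 4)))          # 1.0
```

With all weights equal to 1 the weighted form reduces to the plain Euclidean distance, which is why it only matters when parameters are unscaled or differ in importance.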
Clustering Techniques
Goal: Partition the components into groups so that the members of a group are as similar as possible and different groups are as dissimilar as possible.
Statistically, the intra-group variance should be as small as possible, and the inter-group variance should be as large as possible.
Total Variance = Intra-group Variance + Inter-group Variance
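The identity above can be checked numerically. The sketch below works with sums of squared deviations (variance without the division by the number of points, an assumption made to keep the arithmetic simple); the data and the function name are made up for illustration.

```python
def variance_decomposition(points, labels):
    """Return (total, intra, inter) sums of squared deviations."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def mean(vectors):
        return tuple(sum(col) / len(vectors) for col in zip(*vectors))

    # Group the points by their cluster label.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    grand_mean = mean(points)
    total = sum(sq_dist(p, grand_mean) for p in points)

    intra = 0.0
    inter = 0.0
    for members in groups.values():
        centroid = mean(members)
        # Spread of the members around their own cluster centroid.
        intra += sum(sq_dist(p, centroid) for p in members)
        # Spread of the centroid around the grand mean, weighted by size.
        inter += len(members) * sq_dist(centroid, grand_mean)
    return total, intra, inter

points = [(1,), (2,), (3,), (10,), (11,), (12,)]
labels = [0, 0, 0, 1, 1, 1]
print(variance_decomposition(points, labels))  # (125.5, 4.0, 121.5)
```

Because the total is fixed by the data, making the intra-group term smaller necessarily makes the inter-group term larger, so the two goals are really one.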
Cluster Interpretation
Assign all measured components to the clusters.
Clusters with very small populations and small total resource demands can be discarded.
(Do not discard a cluster merely because its population is small; it may still account for a large share of the resource demand.)
Interpret clusters in functional terms, e.g., a business application, or label clusters by their resource demands, for example, CPU-bound, I/O-bound, and so forth.
Select one or more representative components from each cluster for use as the test workload.
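The slides do not say how to pick a representative or how small is "small", so the following is only one possible sketch. It assumes the representative is the component nearest the cluster centroid, and it uses a made-up 15% threshold on both population share and per-parameter demand share. All cluster names, component names, and numbers are hypothetical. `math.dist` needs Python 3.8 or later.

```python
import math

# Hypothetical clusters: component name -> (cpu_seconds, io_operations).
clusters = {
    "cpu_bound": {"sim": (90.0, 10.0), "compile": (80.0, 20.0), "render": (100.0, 15.0)},
    "io_bound": {"backup": (5.0, 900.0), "copy": (10.0, 1000.0), "sort": (15.0, 1100.0)},
    "tiny": {"probe": (0.1, 1.0)},
    "batch": {"nightly": (300.0, 50.0)},
}

def centroid(vectors):
    """Per-parameter mean of a collection of vectors."""
    vectors = list(vectors)
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def representative(members):
    """Name of the member closest to the cluster centroid (assumed rule)."""
    c = centroid(members.values())
    return min(members, key=lambda name: math.dist(members[name], c))

def is_discardable(name, clusters, max_share=0.15):
    """True only if population share AND every demand share are small."""
    members = clusters[name]
    population = sum(len(m) for m in clusters.values())
    if len(members) / population >= max_share:
        return False
    all_vectors = [v for m in clusters.values() for v in m.values()]
    totals = [sum(col) for col in zip(*all_vectors)]
    demands = [sum(col) for col in zip(*members.values())]
    return all(d / t < max_share for d, t in zip(demands, totals))

for name, members in clusters.items():
    print(name, representative(members), is_discardable(name, clusters))
```

In this data, "tiny" and "batch" each hold a single component, but only "tiny" is discardable: "batch" accounts for about half of the CPU time, which is the case the parenthetical warning is about.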
Problems with Clustering
Goal: Minimize variance.
The results of clustering are highly variable. No rules for:
Selection of parameters
Distance measure
Scaling
Labeling each cluster by functionality is difficult.
In one study, editing programs appeared in 23 different clusters.
Requires many repetitions of the analysis.
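The sensitivity to scaling mentioned above can be shown with a small example. The sketch below uses hypothetical components measured in CPU seconds and I/O operations, with min-max normalization chosen as just one of several possible scalings. It assumes every parameter takes at least two distinct values, so the range is never zero.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(points):
    """Rescale each parameter to the range [0, 1]."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return [tuple((p[d] - lo[d]) / (hi[d] - lo[d]) for d in dims) for p in points]

def nearest(points, names, target):
    """Name of the component closest to the target component."""
    i = names.index(target)
    others = [
        (euclidean(points[i], points[j]), names[j])
        for j in range(len(points))
        if j != i
    ]
    return min(others)[1]

names = ["A", "B", "C", "D"]
# Hypothetical measurements: (cpu_seconds, io_operations).
raw = [(1.0, 1000.0), (9.0, 1010.0), (1.5, 1200.0), (5.0, 3000.0)]

print(nearest(raw, names, "A"))                 # B
print(nearest(min_max_scale(raw), names, "A"))  # C
```

On the raw data the I/O counts dominate the distance, so A is paired with B. After scaling, the CPU times carry equal weight and A is paired with C instead. Any clustering built on these distances would change accordingly.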