30-08-2017, 03:21 PM
Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way." A cluster is therefore a collection of objects that are "similar" to each other and "dissimilar" to the objects belonging to other clusters.
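To make the "similar within, dissimilar across" idea concrete, here is a minimal sketch. It assumes 1-D numeric objects and absolute difference as the dissimilarity measure; both are illustrative assumptions, since the definition above deliberately leaves "similar in some way" open. The helper name `intra_inter_distances` is hypothetical. It compares the average distance between members of the same cluster with the average distance between members of different clusters; for a grouping that fits the definition, the first should be clearly smaller.

```python
from itertools import combinations
from statistics import mean


def intra_inter_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumption: points are 1-D numbers and dissimilarity is |p - q|.
    Requires at least one within-cluster pair and one between-cluster pair.
    """
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = abs(p - q)
        # Pairs sharing a label count as "within", all others as "between".
        (within if lp == lq else between).append(d)
    return mean(within), mean(between)


points = [1, 2, 8, 10]
labels = ["a", "a", "b", "b"]
intra, inter = intra_inter_distances(points, labels)
print(intra, inter)  # 1.5 7.5
```

Swapping `abs(p - q)` for another measure changes what "similar" means without changing the idea being checked.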
The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how do we decide what constitutes a good clustering? It can be shown that there is no absolute "best" criterion that is independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering suits their needs.
For example, we might be interested in finding representatives for homogeneous groups (data reduction), in finding "natural clusters" and describing their unknown properties ("natural" data types), in finding useful and suitable groupings ("useful" data classes), or in finding unusual data objects (outlier detection).
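The point that the criterion comes from the user, and that the result serves a goal, can be sketched with one simple technique among many: single-linkage grouping of 1-D values, where a new group starts whenever the gap to the previous sorted value exceeds a threshold. The function names and the parameters `max_gap` (the grouping criterion) and `min_size` (the cut-off for calling something unusual) are hypothetical choices for illustration, not something the post prescribes. The same data yields different groupings under different criteria, and one grouping can then serve two of the goals listed above: data reduction (one mean per group) and outlier detection (members of very small groups).

```python
from statistics import mean


def threshold_groups(values, max_gap):
    """Single-linkage grouping for 1-D values (illustrative sketch).

    Values are sorted; a new group starts whenever the gap to the previous
    value exceeds max_gap, which is the user-supplied criterion.
    """
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def summarize(groups, min_size=2):
    """Goal-dependent summaries of a grouping (illustrative sketch).

    Data reduction: the mean of each group with at least min_size members.
    Outlier detection: members of smaller groups are reported as unusual.
    """
    representatives = [mean(g) for g in groups if len(g) >= min_size]
    outliers = [x for g in groups if len(g) < min_size for x in g]
    return representatives, outliers


data = [1, 2, 3, 10, 11, 20]

# The same data grouped under two different criteria.
print(threshold_groups(data, 1))  # [[1, 2, 3], [10, 11], [20]]
print(threshold_groups(data, 7))  # [[1, 2, 3, 10, 11], [20]]

# Representatives and outliers for the max_gap=1 grouping.
reps, outliers = summarize(threshold_groups(data, 1))
print(reps, outliers)  # [2, 10.5] [20]
```

Neither `max_gap` value is "correct" on its own; which one is appropriate depends on what the grouping is for.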