26-09-2013, 04:44 PM
Constrained Clustering for Improved Pattern Discovery
Constrained Clustering.ppt (Size: 423 KB / Downloads: 16)
Prior Knowledge
We may have some idea what we want out…
Before Clustering: Prior Knowledge
After Clustering: User Feedback
Examples
Gene Clustering
Collaborative Filtering
Applications
Clustering with Existing Pairwise Prior Knowledge
Standard Clustering Tasks with User Feedback
A Spatial Interpretation
Imposing Constraints
Must-linked pairs should be close.
Cannot-linked pairs should be far.
Propagating Constraints
If A is close to B, and B is must-linked to C, then A should be close to C.
If A is close to B, and B is cannot-linked to C, then A should be far from C.
Constrained CL
Input are data and constraints.
Calculate proximities in feature space.
Must-Links
Must-linked arcs have their distances set to zero.
To propagate must-links, run all-pairs shortest-paths.
Cannot-Links
Cannot-linked arcs have max(M) added to their distances.
The complete-link algorithm implicitly propagates cannot-links.
Active Learning
Again, based on the principle that local similarities are correct.
Algorithm:
Choose some threshold distance d.
Cluster using CL until you are about to merge two clusters whose distance is greater than d.
Ask if the root nodes of the two clusters are must-linked or cannot-linked.
Impose and Propagate constraint accordingly.
Continue to cluster using CL.
Conclusions
Often in Machine Learning tasks, the feature space is locally accurate but globally inaccurate.
We can exploit this by intelligently using a small number of constraints given by prior knowledge.
Results: good!