11-05-2012, 01:09 PM
Decision Trees
lecture11-DecisionTrees.ppt (Size: 455 KB / Downloads: 50)
Classification Decision Trees
Nonmetric data: each pattern is described by a list of binary attributes.
Apple is red and round. Banana is long and yellow.
Difficult to apply nearest neighbor and other techniques.
Decision Tree: based on the game of twenty questions.
Apply a series of tests to the input pattern
Each test asks a question: e.g. “is the pattern yellow?”
The answer is “yes” or “no”.
The answers give the classification, e.g. the pattern is yellow, not round, and long, so it is a banana.
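A minimal sketch (not from the lecture) of this twenty-questions idea: a hand-built chain of yes/no tests on binary attributes. The attribute names and classes are illustrative.

```python
def classify(pattern):
    """Classify a pattern (dict of binary attributes) with hand-built tests."""
    if pattern["yellow"]:          # test 1: "is the pattern yellow?"
        if pattern["long"]:        # test 2: "is the pattern long?"
            return "banana"
        return "lemon"
    if pattern["red"] and pattern["round"]:
        return "apple"
    return "unknown"

print(classify({"yellow": True, "long": True, "red": False, "round": False}))
# prints "banana"
```

Each internal node of a decision tree is one such test; the path of answers leads to a leaf carrying the class label.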
CART: Design Decision Tree
CART is a general framework for designing decision trees.
Basic issues:
(1) Should we restrict ourselves to binary questions?
(2) Which attributes should be tested at each node?
(3) When should a node be declared a leaf?
(4) How can we prune a large tree?
(5) How do we assign a class label to a leaf node?
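For issue (2), a common CART-style answer is to test, at each node, the attribute that most reduces an impurity measure. A sketch, assuming entropy impurity; the data and attribute names are illustrative.

```python
import math

def entropy(labels):
    """Entropy impurity of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def impurity_drop(data, labels, attr):
    """Reduction in impurity from splitting on a binary attribute."""
    yes = [y for x, y in zip(data, labels) if x[attr]]
    no = [y for x, y in zip(data, labels) if not x[attr]]
    n = len(labels)
    remainder = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return entropy(labels) - remainder

# Pick the attribute to test at this node: the one with the largest drop.
data = [{"yellow": True, "long": True}, {"yellow": True, "long": False},
        {"yellow": False, "long": False}, {"yellow": False, "long": False}]
labels = ["banana", "lemon", "apple", "apple"]
best = max(data[0], key=lambda a: impurity_drop(data, labels, a))
print(best)  # prints "yellow"
```

Here "yellow" wins because splitting on it sends both apples to one branch, leaving less impurity than splitting on "long".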
When to Stop
Generalization versus memorization is a key issue in learning.
We don't want a decision tree that simply memorizes the training data. The tree should generalize, i.e. give good classifications on data it has not been trained on.
The decision tree gives a rule for classifying data. Its empirical risk is the error the rule makes on the training data; an empirical risk of zero may just mean the tree has memorized that data.
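The empirical risk can be computed directly; a minimal sketch (names are illustrative), taking it as the fraction of training examples the rule gets wrong.

```python
def empirical_risk(rule, data, labels):
    """Fraction of examples misclassified by the rule."""
    errors = sum(1 for x, y in zip(data, labels) if rule(x) != y)
    return errors / len(labels)

# A toy rule and training set: one of three examples is misclassified.
rule = lambda x: "banana" if x["long"] else "apple"
data = [{"long": True}, {"long": False}, {"long": True}]
labels = ["banana", "apple", "apple"]
print(empirical_risk(rule, data, labels))
```

A tree grown until every leaf is pure drives this number to zero on the training set, which is exactly the memorization we want to avoid.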
Cross Validation
A general principle for testing whether you are generalizing or memorizing: combine learning with validation or cross-validation.
Validation – learn the decision tree on part of the dataset and evaluate performance on the other part.
Cross-validation – split the dataset into several subsets. For each subset, learn on the remaining subsets and evaluate on the held-out one, then average the results.
E.g. learn decision trees with different impurity thresholds beta.
Select the tree, and hence the beta, which has the best validation performance.
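The selection procedure can be sketched as follows. This is not from the lecture: k_fold_splits, fit_tree, and err are illustrative names, and fit_tree is a trivial majority-class stand-in (it ignores beta) just to make the cross-validation logic runnable.

```python
def k_fold_splits(n, k):
    """Split indices 0..n-1 into k folds; yield (train, test) index pairs."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_val_error(fit, error, data, labels, k):
    """Average held-out error of a learner over k folds."""
    total = 0.0
    for train, test in k_fold_splits(len(data), k):
        model = fit([data[i] for i in train], [labels[i] for i in train])
        total += error(model, [data[i] for i in test],
                       [labels[i] for i in test])
    return total / k

# Stand-in for learning a tree pruned at impurity threshold beta:
# here just a majority-class rule (hypothetical, ignores beta).
def fit_tree(X, y, beta):
    majority = max(set(y), key=y.count)
    return lambda x: majority

def err(model, X, y):
    return sum(model(x) != t for x, t in zip(X, y)) / len(y)

data = [0, 1, 2, 3, 4, 5]
labels = ["a", "a", "a", "a", "a", "b"]
best_beta = min([0.1, 0.2, 0.4], key=lambda b: cross_val_error(
    lambda X, y: fit_tree(X, y, b), err, data, labels, k=3))
```

With a real tree learner, each candidate beta gets its own cross-validated error, and the beta with the lowest average held-out error is selected.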