WEIGHTED K NEAREST NEIGHBOR
K_Nearest_Neighbor_Algorithm.pdf
Background
K Nearest Neighbor
A lazy learning algorithm: it defers the decision to generalize beyond the training examples until a new query is encountered.
Whenever we have a new point to classify, we find its K nearest neighbors in the training data.
The distance is calculated using one of the following measures (a short sketch follows the list):
Euclidean Distance
Minkowski Distance
Mahalanobis Distance
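To make the procedure concrete, here is a minimal KNN classifier sketch in Python using Euclidean distance. This is our own illustration, not code from the attached paper; the function names and the toy data are assumptions.

    import math
    from collections import Counter

    def euclidean_distance(a, b):
        # Euclidean distance between two equal-length attribute vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_classify(query, training_data, k=3):
        # training_data is a list of (attribute_vector, class_label) pairs,
        # matching the <x, f(x)> convention used later in this post.
        # Sort by distance to the query and take the K closest examples.
        neighbors = sorted(training_data,
                           key=lambda ex: euclidean_distance(query, ex[0]))[:k]
        # Majority vote among the K nearest neighbors.
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    # Toy usage: two clusters in 2-D.
    train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'),
             ((5.0, 5.0), 'B'), ((4.8, 5.2), 'B')]
    print(knn_classify((1.1, 0.9), train, k=3))  # -> 'A'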
Curse of Dimensionality
Distance is usually computed over all the attributes, implicitly assuming that every attribute affects the distance equally.
Such similarity metrics ignore the relative relevance of the attributes, which distorts the computed distances and hurts classification precision. Misclassification caused by the presence of many irrelevant attributes is often termed the curse of dimensionality.
For example: each instance is described by 20 attributes, of which only 2 are relevant in determining the classification of the target function. In this case, instances that have identical values for the 2 relevant attributes may nevertheless be distant from one another in the 20-dimensional instance space, as the toy computation below illustrates.
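The following toy computation (ours, not from the paper) makes the effect concrete: two instances that agree on both relevant attributes end up far apart once 18 irrelevant attributes are added.

    import math
    import random

    random.seed(0)

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Two instances identical on the 2 relevant attributes...
    relevant = [0.5, 0.7]
    # ...padded with 18 irrelevant attributes holding unrelated random values.
    x1 = relevant + [random.random() for _ in range(18)]
    x2 = relevant + [random.random() for _ in range(18)]

    print(euclidean(x1[:2], x2[:2]))  # 0.0: identical on the relevant attributes
    print(euclidean(x1, x2))          # much larger: irrelevant attributes dominate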
Backward Elimination
For each attribute:
    Delete the attribute
    For each training example xi in the training data set:
        Find the K nearest neighbors in the training data set based on the Euclidean distance
        Predict the class value as the class most represented among the K nearest neighbors
    Calculate the accuracy as
        Accuracy = (# of correctly classified examples / # of training examples) x 100
    If the accuracy has decreased, restore the deleted attribute
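A Python sketch of this backward-elimination wrapper follows. It is a plausible reading of the pseudocode; details the slide leaves open (leave-one-out evaluation over the training set, keeping a deletion on a tie) are our choices.

    import math
    from collections import Counter

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_accuracy(data, features, k=3):
        # Leave-one-out accuracy of KNN on `data`, restricted to the
        # attribute indices in `features`.
        correct = 0
        for i, (x, label) in enumerate(data):
            rest = data[:i] + data[i + 1:]
            proj = lambda v: [v[j] for j in features]
            neighbors = sorted(rest,
                               key=lambda ex: euclidean(proj(x), proj(ex[0])))[:k]
            predicted = Counter(l for _, l in neighbors).most_common(1)[0][0]
            correct += (predicted == label)
        return 100.0 * correct / len(data)

    def backward_elimination(data, n_attributes, k=3):
        # Delete each attribute in turn; restore it if accuracy decreased.
        features = list(range(n_attributes))
        accuracy = knn_accuracy(data, features, k)
        for attr in range(n_attributes):
            trial = [f for f in features if f != attr]
            if not trial:
                continue  # keep at least one attribute
            trial_accuracy = knn_accuracy(data, trial, k)
            if trial_accuracy >= accuracy:  # no decrease: keep the deletion
                features, accuracy = trial, trial_accuracy
        return features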
Attribute Weighted KNN
Read the training data from a file as <x, f(x)> pairs
Read the testing data from a file as <x, f(x)> pairs
Set K to some value
Set the learning rate α
Set N, the number of folds for cross validation
Normalize the attribute values by their standard deviation
Assign a random weight wi to each attribute Ai
Divide the training examples into N sets
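The excerpt stops before specifying how the weights are updated with α across the folds, so the sketch below covers only the setup steps listed above (standard-deviation normalization, random initial weights, fold splitting), plus a weighted Euclidean distance as one natural role for the weights wi. The function names and the commented-out file reader are hypothetical.

    import math
    import random
    import statistics
    from collections import Counter

    def normalize_by_std(data):
        # Divide each attribute by its standard deviation over the data set.
        n_attrs = len(data[0][0])
        stds = [statistics.pstdev([x[j] for x, _ in data]) or 1.0
                for j in range(n_attrs)]
        return [([v / s for v, s in zip(x, stds)], label) for x, label in data]

    def weighted_distance(a, b, weights):
        # Euclidean distance with a per-attribute weight wi on attribute Ai.
        return math.sqrt(sum(w * (x - y) ** 2
                             for w, x, y in zip(weights, a, b)))

    def weighted_knn_classify(query, data, weights, k=3):
        neighbors = sorted(data,
                           key=lambda ex: weighted_distance(query, ex[0], weights))[:k]
        return Counter(l for _, l in neighbors).most_common(1)[0][0]

    # Setup as listed above (read_pairs is a hypothetical <x, f(x)> file reader):
    # K, alpha, N = 3, 0.1, 5
    # train = normalize_by_std(read_pairs("train.txt"))
    # weights = [random.random() for _ in range(len(train[0][0]))]  # random wi
    # folds = [train[i::N] for i in range(N)]  # divide training examples into N sets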
DIET
DIET is an algorithm that uses a simple wrapper approach to heuristically search through a set of weights used for nearest neighbor classification.
DIET sometimes causes features to lose weight, sometimes to gain weight, and sometimes to remain the same.
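The post does not spell out DIET's search loop, but one plausible rendering of such a wrapper (our interpretation, not a transcription of the published DIET algorithm) restricts each weight to a few discrete levels and hill-climbs, keeping a single-weight change only when it improves leave-one-out accuracy. That is exactly how a feature can gain weight, lose weight, or stay the same.

    import math
    from collections import Counter

    def weighted_distance(a, b, weights):
        return math.sqrt(sum(w * (x - y) ** 2
                             for w, x, y in zip(weights, a, b)))

    def loo_accuracy(data, weights, k=3):
        # Leave-one-out KNN accuracy under the given attribute weights.
        correct = 0
        for i, (x, label) in enumerate(data):
            rest = data[:i] + data[i + 1:]
            neighbors = sorted(rest,
                               key=lambda ex: weighted_distance(x, ex[0], weights))[:k]
            predicted = Counter(l for _, l in neighbors).most_common(1)[0][0]
            correct += (predicted == label)
        return correct / len(data)

    def diet_style_search(data, n_attrs, levels=(0.0, 0.5, 1.0), k=3):
        # Hill-climb over discrete per-feature weight levels: try each
        # alternative level for each feature; keep a change only if it
        # improves accuracy, otherwise the weight remains the same.
        weights = [1.0] * n_attrs
        best = loo_accuracy(data, weights, k)
        improved = True
        while improved:
            improved = False
            for j in range(n_attrs):
                for w in levels:
                    if w == weights[j]:
                        continue
                    trial = weights[:j] + [w] + weights[j + 1:]
                    acc = loo_accuracy(data, trial, k)
                    if acc > best:  # feature j gains or loses weight
                        weights, best, improved = trial, acc, True
        return weights, best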