NEAREST NEIGHBOUR MATCHING USING K D-TREES
Abstract
Nearest neighbour matching is a problem that occurs in many fields of study. It
involves finding the point within a multi-dimensional data set that is "closest"
to a target point under an arbitrary distance metric. This problem arises in many
situations such as pattern matching and machine learning.
Several basic techniques for nearest neighbour matching are briefly examined,
along with the space and time complexity trade-offs each technique makes.
The k d-tree is discussed in greater detail, along with techniques for optimizing
query time; these include strategies for building the tree and informed search
algorithms.
The discussion of optimizing the tree centres on "splitting strategies," with
region compression also briefly covered. The splitting strategies discussed are
the simple strategy, splitting on the widest dimension and splitting on the most
deviant dimension.
The informed search algorithms discussed are the depth-first search, the hill-
climbing search, the best-first search and the A* search. These algorithms make
use of various heuristics which are affected by the construction of the tree.
After a rigorous empirical analysis, the depth-first search is shown to be the
most efficient search algorithm. Depending on the distribution of the data set,
either the simple splitting strategy or the most deviant dimension splitting
strategy produces desirable results.
Introduction
One of the fundamental problems in computational geometry, nearest neighbour
matching can be described as follows. Given a metric space, S, and a distance
function, d(p1, p2), describing the distance between two points, p1 and p2, find
the closest point within a set of data points to any given query point. That is,
given a query point, q ∈ S, and a set of data points, pi ∈ S, find the point pi
for which d(q, pi) is minimal.
This problem arises in many situations including pattern matching, machine learn-
ing, information retrieval, data mining and statistics. These applications involve
taking samples and representing them as vectors or Cartesian coordinates. For example, to
identify the speaker in an audio recording, one might perform spectral analysis on
the waveform and compare it to the frequency analyses of the voices of some possible
speakers.
Geometry
In a k-dimensional Euclidean space, Cartesian coordinates can be represented by k-
dimensional vectors, with the ith component of the vector representing the point's
position along the ith axis.
This can be proven to be a vector space, and thus all vector space operations may
be applied to these Cartesian coordinates, including addition, scalar
multiplication, the inner product and norms.
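As a concrete illustration of these operations (not taken from the paper; the function name is hypothetical), the Euclidean distance between two points is the norm of the difference of their coordinate vectors:

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two k-dimensional points: the 2-norm
    of their component-wise difference."""
    diff = [a - b for a, b in zip(p, q)]        # vector subtraction
    return math.sqrt(sum(c * c for c in diff))  # norm of the difference

# Example: two points in 3-dimensional space.
print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))  # prints 5.0
```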
Normal Distribution
While directing a survey of Germany in the 1820s, Carl Friedrich Gauss discovered
a pattern in random errors: they tend to be distributed around a mean in a
bell-shaped curve. This distribution is appropriately named the Gaussian distribution
and is more commonly referred to as the normal distribution.
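For reference, the density of a normal distribution with mean μ and standard deviation σ is the familiar bell-shaped curve (a standard result, stated here for completeness, not derived in the paper):

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```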
Nearest Neighbour Matching
Given a set of points, A, and a target point, target, one is faced with the problem
of finding the closest point to target in A using a given distance metric. The most
intuitive solution to this problem, demonstrated in algorithm 3.1, is to check each
element of A in turn, comparing it to the best known match.
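Algorithm 3.1 itself is not reproduced in this excerpt; the following is a minimal Python sketch of such a linear scan (function and variable names are illustrative, not from the paper):

```python
import math

def nearest_neighbour(A, target, distance):
    """Linear scan: compare each element of A in turn against the best
    known match. Costs O(n) distance computations for n points."""
    best, best_dist = None, float("inf")
    for p in A:
        d = distance(p, target)
        if d < best_dist:           # closer than the best known match
            best, best_dist = p, d
    return best

# Example usage with a Euclidean distance metric.
euclid = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(nearest_neighbour(points, (1.5, 1.5), euclid))  # prints (1.0, 1.0)
```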
Definition
Bentley [Ben75] proposed a solution to the nearest neighbour matching problem with
linear space complexity and sub-linear time complexity called the multidimensional
binary search tree, or “k d-tree” for short.
A k d-tree is closely analogous to the unidimensional binary search tree and shares
some of the same algorithms. At every level, a binary search tree partitions a set of
points around the point stored within the node. Similarly, the k d-tree partitions a
set of vectors with a hyper-plane that is orthogonal to one of the k axes and passes
through the vector stored within the node. The dimension represented by that axis is
known as the splitting dimension.
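As a rough sketch of this structure (class and function names are illustrative, not from Bentley's paper), a node stores a vector and its splitting dimension, and insertion descends the tree exactly as in a binary search tree, comparing one coordinate per level:

```python
class KDNode:
    """One node of a k d-tree: a stored vector, the splitting dimension,
    and the subtrees on either side of the splitting hyper-plane."""
    def __init__(self, point, split_dim, left=None, right=None):
        self.point = point          # vector stored within the node
        self.split_dim = split_dim  # axis the hyper-plane is orthogonal to
        self.left = left            # coordinate in split_dim is smaller
        self.right = right          # coordinate in split_dim is not smaller

def insert(node, point, k, depth=0):
    """Insert a point, comparing one coordinate per level as in a
    binary search tree."""
    if node is None:
        return KDNode(point, depth % k)
    if point[node.split_dim] < node.point[node.split_dim]:
        node.left = insert(node.left, point, k, depth + 1)
    else:
        node.right = insert(node.right, point, k, depth + 1)
    return node
```

Cycling through the k dimensions by depth, as insert does here, corresponds to the simple splitting strategy mentioned in the abstract.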
Constructing Sub-Optimal k d-Trees
Generating a k d-tree that produces optimal results for all queries is an NP-complete
optimization problem [FBF77]. Instead, a sub-optimal tree that is still "good" can be
built in polynomial time. A "good" tree is assumed to be balanced (its height is
logarithmic in the number of points) and to have smaller hyper-rectangles with sides
of approximately even lengths [Ben75].
To accomplish this using the method in this paper, all points in the tree must be
known ahead of time. A k d-tree could perhaps be balanced dynamically, in a manner
similar to AVL or red-black trees, but that problem is outside the scope of this
paper.
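To illustrate how such a tree can be built when all points are known ahead of time, the sketch below (reusing the hypothetical KDNode class from the earlier sketch) recursively splits the points at the median of the dimension with the widest spread, one of the splitting strategies mentioned in the abstract; the paper's exact procedure may differ:

```python
def build_kdtree(points, k):
    """Recursively build a balanced k d-tree: choose the dimension with
    the widest spread, split at its median, and recurse on each half."""
    if not points:
        return None
    # Widest-dimension splitting strategy: largest (max - min) spread.
    split_dim = max(range(k),
                    key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    pts = sorted(points, key=lambda p: p[split_dim])
    mid = len(pts) // 2                  # the median keeps the tree balanced
    return KDNode(pts[mid], split_dim,
                  left=build_kdtree(pts[:mid], k),
                  right=build_kdtree(pts[mid + 1:], k))

# Example: build a tree over five 2-dimensional points.
root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1)], k=2)
print(root.point, root.split_dim)  # prints (5, 4) 0
```

Splitting at the median guarantees the two subtrees differ in size by at most one point, which keeps the height of the tree logarithmic in the number of points.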