0% found this document useful (0 votes)
21 views20 pages

KD Tree Nearest Neighbor Search in Java

I. The document discusses four different algorithms for nearest neighbor search: BallTree, Binary Tree, KDTree, and LinearNNSearch. II. It provides details on the implementation and behavior of BallTrees, Binary Trees, KDTrees, and linear search. III. KDTree range searches and nearest neighbor searches are more efficient than exhaustive linear searches as KDTrees partition the space to prune away large portions where no match could exist.

Uploaded by

cs0814
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views20 pages

KD Tree Nearest Neighbor Search in Java

I. The document discusses four different algorithms for nearest neighbor search: BallTree, Binary Tree, KDTree, and LinearNNSearch. II. It provides details on the implementation and behavior of BallTrees, Binary Trees, KDTrees, and linear search. III. KDTree range searches and nearest neighbor searches are more efficient than exhaustive linear searches as KDTrees partition the space to prune away large portions where no match could exist.

Uploaded by

cs0814
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Algorithms going to analysis

I. BallTree,
II. Binary Tree,
III. KDTree,
IV. LinearNNSearch

Ball Tree

implementing the BallTree/Metric Tree algorithm for nearest neighbour search.

The connection to dataset is only a reference. For the tree structure the indexes are stored in an
array.

See the implementing classes of different construction methods of the trees for details on its
construction.

Binary Tree

In Java, the key points in the recursion are exactly the same as in C or C++. In fact, I created the
Java solutions by just copying the C solutions, and then making the syntactic changes. The
recursion is the same, however the outer structure is slightly different.

In Java, we will have a BinaryTree object that contains a single root pointer. The root pointer
points to an internal Node class that behaves just like the node struct in the C/C++ version. The
Node class is private -- it is used only for internal storage inside the BinaryTree and is not
exposed to clients. With this OOP structure, almost every operation has two methods: a one-line
method on the BinaryTree that starts the computation, and a recursive method that works on the
Node objects. For the lookup() operation, there is a [Link]() method that the client
uses to start a lookup operation. Internal to the BinaryTree class, there is a private recursive
lookup(Node) method that implements the recursion down the Node structure. This second,
private recursive method is basically the same as the recursive C/C++ functions above -- it takes
a Node argument and uses recursion to iterate over the pointer structure.

KD Tree

k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points
in a k-dimensional space. k-d trees are a useful data structure for several applications, such as
searches involving a multidimensional search key (e.g. range searches and nearest neighbor
searches). k-d trees are a special case of binary space partitioning trees.
Linear search

Linear search is one of the basic search techniques that we've now. Although this is not a very
good search technique, one should understand this concept. Let's consider our aim to search for a
key element in an array of elements. We loop through all the array elements and check for
existence of the key element. Since we go element by element, this search is called as Linear
search or sequential search. Search element is called as key element.

Linear search algorithm

BEGIN

DECLARE key, array, i, found

ASSIGN values to array/ACCEPT array values

PRINT "Please enter key element:"

ACCEPT key

ASSIGN i with 1

FOR EACH i in 1 to [Link]

LOOP

IF array[i] = key

THEN

ASSIGN found with true

END IF

END LOOP

IF found = true

THEN

PRINT "Key found"

ELSE

PRINT "Key not found"

END IF

END
Implementation of Kd-tree

1. Differences between KD tree search and exhaustive search

import [Link];

import [Link];

class kdtime {

public static void main(String [] args) {

if ([Link] < 3) {

[Link]("Usage: java kdtime <# points> <# dims> ");

[Link]("<# trials> [seed]");

[Link](1);

int n = [Link](args[0]);

int k = [Link](args[1]);

int t = [Link](args[2]);

// generate N random K-dimensional points in (0,1)

Random r = [Link] > 3 ? // support random seed as

new Random([Link](args[3])) : // optional fourth arg

new Random();
double [][] x = new double[n][k];

for (int i=0; i<n; ++i) {

for (int j=0; j<k; ++j) {

x[i][j] = [Link]();

// build KD-tree with indices as values

KDTree<Integer> kd = new KDTree<Integer>(k);

try {

for (int i=0; i<n; ++i) {

[Link](x[i], i);

catch (Exception e) {

[Link](e);

[Link](0);

// compare search

linear_search(x, t, r);

kd_search(x, t, kd, r);

// compare nearest-neighbor

linear_nearest(x, t, r);
kd_nearest(x, t, kd, r);

// do linear search

static void linear_search(double [][] x, int t, Random r) {

long before = getTimeMillis();

for (int i=0; i<t; ++i) {

// pick a random point

double [] targ = x[(int)([Link]*[Link]())];

search(x, targ);

long millis = getTimeMillis() - before;

[Link](t + " linear searches took " + millis + " msec.");

// do KD-tree search

static void kd_search(double [][] x, int t, KDTree kd, Random r) {

long before = getTimeMillis();

try {

for (int i=0; i<t; ++i) {


// pick a random point

double [] targ = x[(int)([Link]*[Link]())];

[Link](targ);

catch (Exception e) {

[Link](e);

[Link](0);

long millis = getTimeMillis() - before;

[Link](t + " KD-tree searches took " + millis + " msec.");

// do linear nearest neighbor

static void linear_nearest(double [][] x, int t, Random r) {

int k = x[0].length;

long before = getTimeMillis();

for (int i=0; i<t; ++i) {

double [] targ = random_point(r, k);


int n = neighbor(x, targ);

long millis = getTimeMillis() - before;

[Link](t + " linear nearest took " + millis + " msec.");

// do KD-tree nearest neighbor

static void kd_nearest(double [][] x, int t, KDTree kd, Random r) {

int k = x[0].length;

long before = getTimeMillis();

try {

for (int i=0; i<t; ++i) {

double [] targ = random_point(r, k);

Integer nbr = (Integer)[Link](targ);

int n = [Link]();

catch (Exception e) {

[Link](e);

[Link](0);
}

long millis = getTimeMillis() - before;

[Link](t + " KD-tree nearest took " + millis + " msec.");

// linear search for exact match

private static int search(double [][] x, double [] targ) {

for (int i=0; i<[Link]; ++i) {

if (equal(x[i], targ)) return i;

return -1;

// linear search for index of neighbor

private static int neighbor(double [][] x, double [] targ) {

double mindst = Double.POSITIVE_INFINITY;

int minidx = -1;

for (int i=0; i<[Link]; ++i) {

double d = sqrdst(x[i], targ);

if (d < mindst) {

mindst = d;

minidx = i;
}

return minidx;

// square of Euclidean distance between points

private static double sqrdst(double [] p, double [] q) {

double dst = 0;

for (int i=0; i<[Link]; ++i) {

double dif = p[i] - q[i];

dst += dif*dif;

return dst;

// point equality test

private static boolean equal(double [] p, double [] q) {

for (int i=0; i<[Link]; ++i) {

if (p[i] != q[i]) return false;

return true;

private static double [] random_point(Random r, int k) {


double [] x = new double[k];

for (int i=0; i<k; ++i) {

x[i] = [Link]();

return x;

private static long getTimeMillis() {

Date d = new Date();

return [Link]();

2. n-nearest-neighbors method of KDTree class. Creates a KDTree of M keys, and


finds N neighbors of D- dimensional point in center of space (all coords = 0.5), with
N, D, M.

import [Link];

import [Link];

class kdnbrs {

public static void main(String [] args) {

[Link] r = new [Link](0);


if ([Link] < 3) {

[Link]("Usage: java kdnbrs <# points> <# dims>");

[Link](" <# nbrs>");

[Link](1);

int m = [Link](args[0]);

int d = [Link](args[1]);

int n = [Link](args[2]);

double [][] keys = new double [m][d];

double [] targ = new double [d];

for (int k=0; k<d; ++k) {

targ[k] = 0.5;

// make a D-dimensional KD-tree

KDTree<Integer> kd = new KDTree<Integer>(d);

try {

// add M randomly keyed nodes

for (int i=0; i<m; ++i) {


for (int j=0; j<d; ++j) {

keys[i][j] = [Link]();

[Link](keys[i], i);

// get N nearest neighbors and show their keys

long start = time();

List<Integer> nbrs = [Link](targ, n);

[Link]((time() - start) + " msec");

//[Link](0);

for (int j : nbrs) {

for (int k=0; k<d; ++k) {

[Link](keys[j][k] + " ");

[Link]();

catch (Exception e) {

[Link](e);
}

private static long time() {

[Link] cal = new [Link]();

return [Link]();

3. Behavior of KDTree range search

import [Link];

import [Link];

import [Link];

class kdrange {

public static void main(String [] args) {

// check arguments

if ([Link] < 3) {

[Link]("Usage: java kdrange <gridsize> <xradius> " +

"<yradius>");
[Link](1);

int gsize = [Link](args[0]);

int xrad = [Link](args[1]);

int yrad = [Link](args[2]);

// make a KD-tree

KDTree<Integer> kd = new KDTree<Integer>(2);

// plot grid and add nodes

for (int i=0; i<gsize; ++i) {

for (int j=0; j<gsize; ++j) {

int n = i * gsize + j + 1;

if (i == gsize/2 && j==gsize/2) {

[Link]("*\t");

else {

[Link](n + "\t");

double [] key = {i, j};

try {

[Link](key, n);

catch (Exception e) {
[Link](e);

[Link]();

try {

// get objects in range of center

double [] lo = {gsize/2-xrad,gsize/2-yrad};

double [] hi = {gsize/2+xrad,gsize/2+yrad};

List<Integer> o = [Link](lo, hi);

// dump them to stdout

for (int i : o) {

[Link](i);

catch (Exception e) {

[Link](e);

class kddemo {

public static void main(String [] args) {


double [] A = {2, 5};

double [] B = {1, 1};

double [] C = {3, 9};

double [] T = {1, 10};

// make a KD-tree and add some nodes

KDTree<String> kd = new KDTree<String>(2);

try {

[Link](A, new String("nnoad A"));

[Link](B, new String("node B"));

[Link](C, new String("node n"));

catch (Exception e) {

[Link](e);

// look for node B

try {

String n = [Link](B);

[Link](n);

catch (Exception e) {

[Link](e);
}

try {

// find T's nearest neighbor, which should be C

String n = [Link](T);

[Link](n);

// remove C from the tree

[Link](C);

// now T's nearest neighbor should be A

n = [Link](T);

[Link](n);

catch (Exception e) {

[Link](e);

}
4. Behavior of KDTree class

class kddemo {

public static void main(String [] args) {

double [] A = {2, 5};

double [] B = {1, 1};

double [] C = {3, 9};

double [] T = {1, 10};

// make a KD-tree and add some nodes

KDTree<String> kd = new KDTree<String>(2);

try {

[Link](A, new String("nnoad A"));

[Link](B, new String("node B"));

[Link](C, new String("node n"));

catch (Exception e) {

[Link](e);

// look for node B

try {

String n = [Link](B);
[Link](n);

catch (Exception e) {

[Link](e);

try {

// find T's nearest neighbor, which should be C

String n = [Link](T);

[Link](n);

// remove C from the tree

[Link](C);

// now T's nearest neighbor should be A

n = [Link](T);

[Link](n);

catch (Exception e) {

[Link](e);

Common questions

Powered by AI

KD-tree range searches function by recursively traversing the KD-tree to find all points within a given range defined by lower and upper bounds in each dimension. In contrast, nearest neighbor searches aim to find the closest point(s) to a queried point by recursively narrowing down the search space and backtracking when potentially closer points are identified in other branches . Range searches are useful for applications needing bulk data retrieved within specified bounds, whereas nearest neighbor searches are critical for tasks requiring closest proximity, such as clustering or classification tasks . The main differences in use cases revolve around the type of proximity query required by the application.

Linear Search is considered practical in scenarios where simplicity and minimal setup time are priorities, such as when the dataset is small or retrieval speed is not critical. It avoids the overhead of constructing and maintaining complex data structures like KD-trees, thus proving useful for small-scale data or quick, ad-hoc queries . Linear search can also be advantageous in environments where possible data modifications happen frequently since it does not require restructuring as trees might . Additionally, it is immune to the dimensionality issues that might affect KD-trees in high-dimensional data scenarios.

The advantages of using a KD-tree over a linear search for high-dimensional data nearest neighbor searches include significantly improved efficiency and performance for such queries. KD-trees allow for partitioning space which can dramatically reduce the number of comparisons needed to find the nearest neighbor, especially in lower-dimensional spaces . However, the drawbacks include its reduced effectiveness in very high-dimensional spaces where the cost of comparisons becomes similar to linear search methods because of the curse of dimensionality, making the algorithm less efficient as dimensions increase .

Recursive implementations in tree structures, such as Binary or KD-trees, are straightforward to implement because they naturally reflect the tree's self-similar hierarchical nature. However, they may lead to stack overflow errors if the tree height is substantial due to deep recursive calls . Iterative implementations, using structures like stacks or queues, avoid excessive memory usage for deep-tree traversal, offering better control over each operation's execution context. Iterative approaches are preferred in scenarios requiring high-performance systems with limited memory or when handling trees of considerable depth, where recursion's limits might be problematic .

Implementing KD-tree searches benefits from dimensionality reduction techniques by alleviating the curse of dimensionality, which affects the efficiency of searches in high-dimensional spaces. Dimensionality reduction reduces the problem space, allowing the KD-tree to partition space more effectively, thereby improving search times. It helps in maintaining the KD-tree's balance and performance by reducing overhead and promoting more significant partitioning benefits . Techniques like Principal Component Analysis (PCA) or t-SNE can be utilized before constructing a KD-tree to transform and reduce the dimensions without significant loss of information.

As dimensions increase, the performance advantage of KD-tree over linear search for nearest-neighbor queries typically diminishes. While KD-trees are designed to optimize nearest-neighbor searches by efficiently partitioning space, in very high-dimensional spaces, this partitioning becomes less effective due to the curse of dimensionality . As a result, KD-tree searches may approach linear search in terms of complexity, where no significant spatial partitioning is achieved, and many parts of the tree must still be explored. Hence, the benefit decreases as dimensionality increases, and both may require exhaustive search strategies in extreme cases.

Constructing a KD-tree from a set of k-dimensional points involves several essential components and steps: 1) Choosing a dimension to split on, often alternating among dimensions at each level or choosing the dimension with the widest spread; 2) Sorting the points along the chosen dimension and selecting a median point for balanced partitioning; 3) Recursively constructing the tree by assigning left and right child nodes based on whether points fall on one side or another of the median, thus creating the nodes for each subtree . This recursive splitting continues until a leaf node condition is met, optimizing the structure for nearest-neighbor searches and range queries.

In binary tree implementations, cursors (often implemented as pointers or references) play a crucial role in navigating and manipulating the tree structure. They allow traversal of the tree nodes and facilitate operations such as insertion, deletion, and search . One advantage of using cursors is their ability to provide constant-time access to a node's children, which is essential for recursive algorithms managing tree balance or searching tasks. Additionally, cursors enable efficient recyclability of the tree nodes and help maintain the structure's integrity by ensuring correct parent-child relationships during dynamic updates.

In Java, the recursive structure of a Binary Tree involves an object-oriented approach where a BinaryTree object contains a root pointer pointing to an internal Node class. This Node class encapsulates the recursive logic for tree traversal or search operations, hiding it from the client . In contrast, C/C++ typically uses direct recursion with pointers without encapsulating it in objects, exposing and operating directly on node structures . The syntactical changes and encapsulation in Java facilitate better data encapsulation and modularity.

Exceptions can affect the performance of KD-tree operations by interrupting normal control flow, possibly causing repeated failed operations and additional overhead for exception handling code. This adverse effect can lead to degraded performance, especially in cases where exceptions arise frequently, such as invalid key handling during insertions or searches . Strategies to mitigate these effects include efficient exception management through catching anticipated exceptions early, validating inputs before operations, and implementing robust error-handling logic to correct erroneous states or retry the operation safely. Pre-validating dimensionality and boundary conditions before execution can also minimize runtime interruptions.

You might also like