Vptree.js

A JavaScript implementation of the Vantage-Point Tree nearest neighbor search algorithm

View project onGitHub

Vantage-point tree

A vantage-point tree is a special case of metric tree, a space-partitioning data structure used by efficient nearest neighbor search algorithms.

If you are in Mexico, looking for the nearest sushi bar in the giant World Sushi Bar Database, you can ignore those that are known to be less than 1000 km away from Kuala Lumpur. That's what the vp-tree does : group together items that are close to a particular one (here Kuala Lumpur is the vantage point), so that the whole group can easily be skipped in the search phase.

Advantages

To construct the vp-tree, a particular item (the vantage-point) is chosen in the initial dataset. The other items are then divided into two near and far subsets, which are recursively subdivided the same way.
The search algorithm exploits the triangle inequality to exclude large portions of the tree that are too far from the query object.

One of the great things about this method is that it applies to any kind of objects. Not only sushi bars but also DNA sequences, images, faces. You only need a function to measure the similarity between two objects, and this is not limited to the standard euclidean distance. Any metric, i.e. any distance function which satisfies these conditions can be used. Here are a few examples of distances.

Demos

Search cities

Search nearest cities by coordinates

Enter geographic coordinates, and find the k-nearest cities in a list of more than 50 thousands of cities, instantly.

Worldmap

Search nearest cities by hovering the globe

Enter geographic coordinates, and find the k-nearest cities in a list of more than 50 thousands of cities, instantly.

About fuzzy string matching

Some string metrics may not be very good candidates to use with vp-trees. Common ones like the Levenshtein distance or the Damerau-Levenshtein distance only take small integer values. Even if they are metrics per se and can in theory be used to build vp-trees, the resulting trees won't be very efficient. To get the most of a vp-tree you need the distances between items to spread over a wide range of different value.