GeographicLib  1.47
Classes | Public Member Functions | List of all members
GeographicLib::NearestNeighbor< real, position, distance > Class Template Reference

Nearest-neighbor calculations. More...

#include <GeographicLib/NearestNeighbor.hpp>

Public Member Functions

 NearestNeighbor ()
 
 NearestNeighbor (const std::vector< position > &pts, const distance &dist, int bucket=4)
 
void Initialize (const std::vector< position > &pts, const distance &dist, int bucket=4)
 
real Search (const std::vector< position > &pts, const distance &dist, const position &query, std::vector< int > &ind, int k=1, real maxdist=std::numeric_limits< real >::max(), real mindist=-1, bool exhaustive=true, real tol=0) const
 
int NumPoints () const
 
void Save (std::ostream &os, bool bin=true) const
 
void Load (std::istream &is, bool bin=true)
 
void Statistics (int &setupcost, int &numsearches, int &searchcost, int &mincost, int &maxcost, double &mean, double &sd) const
 
void ResetStatistics () const
 

Detailed Description

template<typename real, typename position, class distance>
class GeographicLib::NearestNeighbor< real, position, distance >

Nearest-neighbor calculations.

This class implements nearest-neighbor calculations using the vantage-point tree described by

Given a set of points in some space and a distance function d satisfying the metric conditions:

\[ \begin{align} d(x,y) &\ge 0,\\ d(x,y) &= 0, \ \text{iff $x = y$},\\ d(x,y) &= d(y,x),\\ d(x,z) &\ge d(x,y) + d(y,z), \end{align} \]

the vantage-point (VP) tree provides an efficient way of determining nearest neighbors. Typically the cost of constructing a VP tree of N points is N log N, while the cost of a query is log N. Thus a VP tree should be used in situations where N is large and at least log N queries are to be made. The condition, N is large, means that \( N \gg 2^D \), where D is the dimensionality of the space.

This class is templated so that it can handle arbitrary metric spaces as follows:

Template Parameters
realthe type used for measuring distances; it can be a real or signed integer type.
positionthe type for specifying points.
distancethe type for a function object which takes takes two positions as arguments and returns the distance (of type real).

The real type must support numeric_limits queries (specifically: is_signed, is_integer, max(), digits, and digits10).

Example of use:

// Example of using the GeographicLib::NearestNeighbor class. WARNING: this
// creates a file, vptree.xml or vptree.bin, in the current directory.
#include <iostream>
#include <vector>
#include <cstdlib> // For srand, rand
#include <cmath> // For asin
#include <fstream>
#include <string>
#include <sstream>
#include <algorithm> // For sort
#if !defined(GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION)
#define GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION 0
#endif
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
// If Boost serialization is available, use it.
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#endif
using namespace std;
using namespace GeographicLib;
// A structure to hold a geographic coordinate. Also included is a field for a
// "name". This is unused in this example.
struct pos {
double lat, lon;
string name;
pos(double lat = 0, double lon = 0, const string& name = "")
: lat(lat), lon(lon), name(name) {}
};
pos randompos() {
double r, lat, lon;
r = 2 * (rand() + 0.5) / (RAND_MAX + 1.0) - 1;
lat = asin(r) / Math::degree();
r = 2 * (rand() + 0.5) / (RAND_MAX + 1.0) - 1;
lon = 180 * r;
return pos(lat, lon);
}
// A class to compute the distance between 2 positions.
class DistanceCalculator {
private:
Geodesic _geod;
public:
explicit DistanceCalculator(const Geodesic& geod)
: _geod(geod) {}
double operator() (const pos& a, const pos& b) const {
double s12;
_geod.Inverse(a.lat, a.lon, b.lat, b.lon, s12);
return s12;
}
};
// Pick 10000 points on the ellipsoid and determine which ones are more than
// 350 km from all the others.
// In this example the NearestNeighbor object is saved to an external file and
// read back in. This is unnecessary in this simple application, but is useful
// if many different applications need to query the same dataset.
int main() {
try {
// Define a distance function object
DistanceCalculator distance(Geodesic::WGS84());
srand(0);
vector<pos> pts;
int num = 10000;
// Sample the points
for (int i = 0; i < num; ++i) pts.push_back(randompos());
{
// Illustrate saving and restoring the GeodesicNeighbor
// construct it
GeodesicNeighbor posset(pts, distance);
// and save it
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
ofstream f("vptree.xml");
boost::archive::xml_oarchive oa(f);
oa << BOOST_SERIALIZATION_NVP(posset);
#else
ofstream ofs("vptree.bin", ios::binary);
posset.Save(ofs, true);
#endif
}
// Construct an empty GeodesicNeighbor
GeodesicNeighbor posset;
// restore it from the file
{
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
ifstream f("vptree.xml");
boost::archive::xml_iarchive ia(f);
ia >> BOOST_SERIALIZATION_NVP(posset);
#else
ifstream ifs("vptree.bin", ios::binary);
posset.Load(ifs, true);
#endif
}
// Now use it
vector<int> ind;
int cnt = 0;
double thresh = 325000;
cout << "Points more than " << thresh/1000 << "km from their neighbors\n"
<< "latitude longitude distance\n";
for (int i = 0; i < num; ++i) {
// Call search with distance limits = (0, thresh]. Set exhaustive = false
// so that the search ends as some as a neighbor is found.
posset.Search(pts, distance, pts[i], ind, 1, thresh, 0, false);
if (ind.size() == 0) {
// If no neighbors in (0, thresh], search again with no upper limit and
// with exhaustive = true (the default).
double d = posset.Search(pts, distance, pts[i], ind, 1,
numeric_limits<double>::max(), 0);
cout << pts[i].lat << " " << pts[i].lon << " " << d << "\n";
++cnt;
}
}
int setupcost, numsearches, searchcost, mincost, maxcost;
double mean, sd;
posset.Statistics(setupcost, numsearches, searchcost, mincost, maxcost,
mean, sd);
int totcost = setupcost + searchcost, exhaustivecost = num * (num - 1) / 2;
cout
<< "Number of distance calculations = " << totcost << "\n"
<< "With an exhaustive search = " << exhaustivecost << "\n"
<< "Ratio = " << double(totcost) / exhaustivecost << "\n"
<< "Efficiency improvement = "
<< 100 * (1 - double(totcost) / exhaustivecost) << "%\n";
}
catch (const exception& e) {
cerr << "Caught exception: " << e.what() << "\n";
return 1;
}
}

Definition at line 109 of file NearestNeighbor.hpp.

Constructor & Destructor Documentation

◆ NearestNeighbor() [1/2]

template<typename real , typename position , class distance >
GeographicLib::NearestNeighbor< real, position, distance >::NearestNeighbor ( )
inline

Default constructor for NearestNeighbor.

This is equivalent to specifying an empty set of points.

Definition at line 123 of file NearestNeighbor.hpp.

◆ NearestNeighbor() [2/2]

template<typename real , typename position , class distance >
GeographicLib::NearestNeighbor< real, position, distance >::NearestNeighbor ( const std::vector< position > &  pts,
const distance &  dist,
int  bucket = 4 
)
inline

Constructor for NearestNeighbor.

Parameters
[in]ptsa vector of points to include in the set.
[in]distthe distance function object.
[in]bucketthe size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(real)/sizeof(int)] (default 4).
Exceptions
GeographicErrif the value of bucket is out of bounds or the size of pts is too big for an int.
std::bad_allocif memory for the tree can't be allocated.

The distances computed by dist must satisfy the standard metric conditions. If not, the results are undefined. Neither the data in pts nor the query points should contain NaNs or infinities because such data violates the metric conditions.

pts may contain coincident points (i.e., the distance between them vanishes); these are treated as distinct.

The choice of bucket is a tradeoff between space and efficiency. A larger bucket decreases the size of the NearestNeigbor object which scales as pts.size() / max(1, bucket) and reduces the number of distance calculations to construct the object by log2(bucket) * pts.size(). However each search then requires about bucket additional distance calculations.

Warning
The same arguments pts and dist must be provided to the Search() function.

Definition at line 154 of file NearestNeighbor.hpp.

References GeographicLib::NearestNeighbor< real, position, distance >::Initialize().

Member Function Documentation

◆ Initialize()

template<typename real , typename position , class distance >
void GeographicLib::NearestNeighbor< real, position, distance >::Initialize ( const std::vector< position > &  pts,
const distance &  dist,
int  bucket = 4 
)
inline

Initialize or re-initialize NearestNeighbor.

Parameters
[in]ptsa vector of points to include in the tree.
[in]distthe distance function object.
[in]bucketthe size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(real)/sizeof(int)] (default 4).
Exceptions
GeographicErrif the value of bucket is out of bounds or the size of pts is too big for an int.
std::bad_allocif memory for the tree can't be allocated.

See also the documentation on the constructor.

If an exception is thrown, the state of the NearestNeighbor is unchanged.

Definition at line 175 of file NearestNeighbor.hpp.

Referenced by GeographicLib::NearestNeighbor< real, position, distance >::NearestNeighbor().

◆ Search()

template<typename real , typename position , class distance >
real GeographicLib::NearestNeighbor< real, position, distance >::Search ( const std::vector< position > &  pts,
const distance &  dist,
const position &  query,
std::vector< int > &  ind,
int  k = 1,
real  maxdist = std::numeric_limits<real>::max(),
real  mindist = -1,
bool  exhaustive = true,
real  tol = 0 
) const
inline

Search the NearestNeighbor.

Parameters
[in]ptsthe vector of points used for initialization.
[in]distthe distance function object used for initialization.
[in]querythe query point.
[out]inda vector of indices to the closest points found.
[in]kthe number of points to search for (default = 1).
[in]maxdistonly return points with distances of maxdist or less from query (default is the maximum real).
[in]mindistonly return points with distances of more than mindist from query (default = −1).
[in]exhaustivewhether to do an exhaustive search (default true).
[in]tolthe tolerance on the results (default 0).
Returns
the distance to the closest point found (−1 if no points are found).

The indices returned in ind are sorted by distance from query (closest first).

With exhaustive = true and tol = 0 (their default values), this finds the indices of k closest neighbors to query whose distances to query are in (mindist, maxdist]. If mindist and maxdist have their default values, then these bounds have no effect. If query is one of the points in the tree, then set mindist = 0 to prevent this point (and other coincident points) from being returned.

If exhaustive = false, exit as soon as k results satisfying the distance criteria are found. If less than k results are returned then the search was exhaustive even if exhaustive = false.

If tol is positive, do an approximate search; in this case the results are to be interpreted as follows: if the k'th distance is dk, then all results with distances less than or equal dktol are correct; all others are suspect — there may be other closer results with distances greater or equal to dktol. If less than k results are found, then the search is exact.

mindist should be used to exclude a "small" neighborhood of the query point (relative to the average spacing of the data). If mindist is large, the efficiency of the search deteriorates.

Warning
The arguments pts and dist must be identical to those used to initialize the NearestNeighbor; if not, the behavior of this function is undefined (however, if the size of pts is wrong, this function exits with no results returned).

Definition at line 247 of file NearestNeighbor.hpp.

◆ NumPoints()

template<typename real , typename position , class distance >
int GeographicLib::NearestNeighbor< real, position, distance >::NumPoints ( ) const
inline
Returns
the total number of points in the set.

Definition at line 343 of file NearestNeighbor.hpp.

◆ Save()

template<typename real , typename position , class distance >
void GeographicLib::NearestNeighbor< real, position, distance >::Save ( std::ostream &  os,
bool  bin = true 
) const
inline

Write the object to an I/O stream.

Parameters
[in,out]osthe stream to write to.
[in]binif true (the default) save in binary mode.
Exceptions
std::bad_allocif memory for the string representation of the object can't be allocated.

The counters tracking the statistics of searches are not saved; however the initializtion cost is saved. The format of the binary saves is not portable.

Note
Boost serialization can also be used to save and restore a NearestNeighbor object. This requires that the GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION macro be defined.

Definition at line 362 of file NearestNeighbor.hpp.

◆ Load()

template<typename real , typename position , class distance >
void GeographicLib::NearestNeighbor< real, position, distance >::Load ( std::istream &  is,
bool  bin = true 
)
inline

Read the object from an I/O stream.

Parameters
[in,out]isthe stream to read from
[in]binif true (the default) load in binary mode.
Exceptions
GeographicErrif the state read from is is illegal.

The counters tracking the statistics of searches are reset by this operation. Binary data must have been saved on a machine with the same architecture. If an exception is thrown, the state of the NearestNeighbor is unchanged.

Note
Boost serialization can also be used to save and restore a NearestNeighbor object. This requires that the GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION macro be defined.
Warning
The same arguments pts and dist used for initialization must be provided to the Search() function.

Definition at line 435 of file NearestNeighbor.hpp.

◆ Statistics()

template<typename real , typename position , class distance >
void GeographicLib::NearestNeighbor< real, position, distance >::Statistics ( int &  setupcost,
int &  numsearches,
int &  searchcost,
int &  mincost,
int &  maxcost,
double &  mean,
double &  sd 
) const
inline

The accumulated statistics on the searches so far.

Parameters
[out]setupcostthe cost of initializing the NearestNeighbor.
[out]numsearchesthe number of calls to Search().
[out]searchcostthe total cost of the calls to Search().
[out]mincostthe minimum cost of a Search().
[out]maxcostthe maximum cost of a Search().
[out]meanthe mean cost of a Search().
[out]sdthe standard deviation in the cost of a Search().

Here "cost" measures the number of distance calculations needed. Note that the accumulation of statistics is not thread safe.

Definition at line 527 of file NearestNeighbor.hpp.

◆ ResetStatistics()

template<typename real , typename position , class distance >
void GeographicLib::NearestNeighbor< real, position, distance >::ResetStatistics ( ) const
inline

Reset the counters for the accumulated statistics on the searches so far.

Definition at line 539 of file NearestNeighbor.hpp.


The documentation for this class was generated from the following file: