GeographicLib
1.47
|
Nearest-neighbor calculations. More...
#include <GeographicLib/NearestNeighbor.hpp>
Public Member Functions | |
NearestNeighbor () | |
NearestNeighbor (const std::vector< position > &pts, const distance &dist, int bucket=4) | |
void | Initialize (const std::vector< position > &pts, const distance &dist, int bucket=4) |
real | Search (const std::vector< position > &pts, const distance &dist, const position &query, std::vector< int > &ind, int k=1, real maxdist=std::numeric_limits< real >::max(), real mindist=-1, bool exhaustive=true, real tol=0) const |
int | NumPoints () const |
void | Save (std::ostream &os, bool bin=true) const |
void | Load (std::istream &is, bool bin=true) |
void | Statistics (int &setupcost, int &numsearches, int &searchcost, int &mincost, int &maxcost, double &mean, double &sd) const |
void | ResetStatistics () const |
Nearest-neighbor calculations.
This class implements nearest-neighbor calculations using the vantage-point tree described by
Given a set of points in some space and a distance function d satisfying the metric conditions:
\[ \begin{align} d(x,y) &\ge 0,\\ d(x,y) &= 0, \ \text{iff $x = y$},\\ d(x,y) &= d(y,x),\\ d(x,z) &\ge d(x,y) + d(y,z), \end{align} \]
the vantage-point (VP) tree provides an efficient way of determining nearest neighbors. Typically the cost of constructing a VP tree of N points is N log N, while the cost of a query is log N. Thus a VP tree should be used in situations where N is large and at least log N queries are to be made. The condition, N is large, means that \( N \gg 2^D \), where D is the dimensionality of the space.
This class is templated so that it can handle arbitrary metric spaces as follows:
real | the type used for measuring distances; it can be a real or signed integer type. |
position | the type for specifying points. |
distance | the type for a function object which takes takes two positions as arguments and returns the distance (of type real). |
The real type must support numeric_limits queries (specifically: is_signed, is_integer, max(), digits, and digits10).
Example of use:
Definition at line 109 of file NearestNeighbor.hpp.
|
inline |
Default constructor for NearestNeighbor.
This is equivalent to specifying an empty set of points.
Definition at line 123 of file NearestNeighbor.hpp.
|
inline |
Constructor for NearestNeighbor.
[in] | pts | a vector of points to include in the set. |
[in] | dist | the distance function object. |
[in] | bucket | the size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(real)/sizeof(int)] (default 4). |
GeographicErr | if the value of bucket is out of bounds or the size of pts is too big for an int. |
std::bad_alloc | if memory for the tree can't be allocated. |
The distances computed by dist must satisfy the standard metric conditions. If not, the results are undefined. Neither the data in pts nor the query points should contain NaNs or infinities because such data violates the metric conditions.
pts may contain coincident points (i.e., the distance between them vanishes); these are treated as distinct.
The choice of bucket is a tradeoff between space and efficiency. A larger bucket decreases the size of the NearestNeigbor object which scales as pts.size() / max(1, bucket) and reduces the number of distance calculations to construct the object by log2(bucket) * pts.size(). However each search then requires about bucket additional distance calculations.
Definition at line 154 of file NearestNeighbor.hpp.
References GeographicLib::NearestNeighbor< real, position, distance >::Initialize().
|
inline |
Initialize or re-initialize NearestNeighbor.
[in] | pts | a vector of points to include in the tree. |
[in] | dist | the distance function object. |
[in] | bucket | the size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(real)/sizeof(int)] (default 4). |
GeographicErr | if the value of bucket is out of bounds or the size of pts is too big for an int. |
std::bad_alloc | if memory for the tree can't be allocated. |
See also the documentation on the constructor.
If an exception is thrown, the state of the NearestNeighbor is unchanged.
Definition at line 175 of file NearestNeighbor.hpp.
Referenced by GeographicLib::NearestNeighbor< real, position, distance >::NearestNeighbor().
|
inline |
Search the NearestNeighbor.
[in] | pts | the vector of points used for initialization. |
[in] | dist | the distance function object used for initialization. |
[in] | query | the query point. |
[out] | ind | a vector of indices to the closest points found. |
[in] | k | the number of points to search for (default = 1). |
[in] | maxdist | only return points with distances of maxdist or less from query (default is the maximum real). |
[in] | mindist | only return points with distances of more than mindist from query (default = −1). |
[in] | exhaustive | whether to do an exhaustive search (default true). |
[in] | tol | the tolerance on the results (default 0). |
The indices returned in ind are sorted by distance from query (closest first).
With exhaustive = true and tol = 0 (their default values), this finds the indices of k closest neighbors to query whose distances to query are in (mindist, maxdist]. If mindist and maxdist have their default values, then these bounds have no effect. If query is one of the points in the tree, then set mindist = 0 to prevent this point (and other coincident points) from being returned.
If exhaustive = false, exit as soon as k results satisfying the distance criteria are found. If less than k results are returned then the search was exhaustive even if exhaustive = false.
If tol is positive, do an approximate search; in this case the results are to be interpreted as follows: if the k'th distance is dk, then all results with distances less than or equal dk − tol are correct; all others are suspect — there may be other closer results with distances greater or equal to dk − tol. If less than k results are found, then the search is exact.
mindist should be used to exclude a "small" neighborhood of the query point (relative to the average spacing of the data). If mindist is large, the efficiency of the search deteriorates.
Definition at line 247 of file NearestNeighbor.hpp.
|
inline |
Definition at line 343 of file NearestNeighbor.hpp.
|
inline |
Write the object to an I/O stream.
[in,out] | os | the stream to write to. |
[in] | bin | if true (the default) save in binary mode. |
std::bad_alloc | if memory for the string representation of the object can't be allocated. |
The counters tracking the statistics of searches are not saved; however the initializtion cost is saved. The format of the binary saves is not portable.
Definition at line 362 of file NearestNeighbor.hpp.
|
inline |
Read the object from an I/O stream.
[in,out] | is | the stream to read from |
[in] | bin | if true (the default) load in binary mode. |
GeographicErr | if the state read from is is illegal. |
The counters tracking the statistics of searches are reset by this operation. Binary data must have been saved on a machine with the same architecture. If an exception is thrown, the state of the NearestNeighbor is unchanged.
Definition at line 435 of file NearestNeighbor.hpp.
|
inline |
The accumulated statistics on the searches so far.
[out] | setupcost | the cost of initializing the NearestNeighbor. |
[out] | numsearches | the number of calls to Search(). |
[out] | searchcost | the total cost of the calls to Search(). |
[out] | mincost | the minimum cost of a Search(). |
[out] | maxcost | the maximum cost of a Search(). |
[out] | mean | the mean cost of a Search(). |
[out] | sd | the standard deviation in the cost of a Search(). |
Here "cost" measures the number of distance calculations needed. Note that the accumulation of statistics is not thread safe.
Definition at line 527 of file NearestNeighbor.hpp.
|
inline |
Reset the counters for the accumulated statistics on the searches so far.
Definition at line 539 of file NearestNeighbor.hpp.