We consider the Approximate Nearest Line Search (NLS) problem. Given a set L of N lines in the high-dimensional Euclidean space R^d, the goal is to build a data structure that, given a query point q ∈ R^d, reports a line ℓ ∈ L whose distance to the query is within a (1 + ε) factor of the distance from q to its closest line. The problem is a natural generalization of the well-studied Approximate Nearest Neighbor problem for point sets (ANN), and is a natural first step towards understanding how to build efficient nearest-neighbor data structures for objects that are more complex than points.

Our main result is a data structure that, for any fixed ε > 0, reports the approximate nearest line in time (d + log N + 1/ε)^O(1) using N^O(1/ε^2) space. This is the first high-dimensional data structure for this problem with poly-logarithmic query time and polynomial space. In contrast, the best previous data structure for this problem, due to Magen [16], required quasi-polynomial space. Up to polynomials, the bounds achieved by our data structure match the performance of the best algorithm for the approximate nearest neighbor problem for point sets.
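To make the NLS problem concrete, the following is a minimal brute-force baseline, not the paper's data structure: it computes the exact point-to-line distance by orthogonal projection and scans all N lines in O(Nd) time. The helper names (`point_line_distance`, `nearest_line`) and the (point, unit-direction) line representation are illustrative assumptions.

```python
import numpy as np

def point_line_distance(q, a, u):
    """Distance from point q to the line {a + t*u : t in R}.

    a is any point on the line; u is a unit direction vector.
    The distance is the norm of the component of (q - a)
    orthogonal to u.
    """
    v = q - a
    return float(np.linalg.norm(v - np.dot(v, u) * u))

def nearest_line(q, lines):
    """Brute-force exact NLS: scan all lines, return (index, distance).

    `lines` is a list of (a, u) pairs. This is the O(N * d) baseline
    that a sublinear-query data structure aims to beat.
    """
    dists = [point_line_distance(q, a, u) for (a, u) in lines]
    i = int(np.argmin(dists))
    return i, dists[i]
```

For example, for the x-axis in R^3 and query q = (0, 3, 4), the perpendicular distance is 5; a (1 + ε)-approximate answer may be any line whose distance is at most 5(1 + ε).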
Introduction

The Nearest Neighbor problem is a fundamental geometric problem of major importance in several areas such as databases, information retrieval, pattern recognition, and computer vision. The problem is defined as follows: given a collection of N points, build a data structure which, given any query point q, reports the data point that is closest to the query. A particularly interesting and well-studied instance is where the data points live in a d-dimensional space under some (e.g., Euclidean) distance function. There are several efficient algorithms known for the case when the dimension d is low (e.g., up to 10 or 20); see [18] for an overview. However, despite decades of intensive effort, the current solutions suffer from either space or query time that is exponential in d. In fact, for large enough d, in theory or in practice, they often provide little improvement over a linear-time algorithm that compares a query to each point in the database. Fortunately, faster algorithms can be obtained by resorting to approximation (e.g., [6,13,12,15,10,14,9,8,17,1,2,4]; see also the surveys [19] and [11]). In this formulation, the algorithm is allowed to return a point whose distance from the query is at most (1 + ε) times the distance from the query to its nearest point. The current results for ANN in Euclidean space answer queries in time (d + log N + 1/ε)^O(1) using N^O(1/ε^2) space [15,12]. Other algorithms, with slower query times but lower space bounds, are available as well.

The approximate nearest neighbor problem generalizes naturally to the case where the database objects are more complex than simple points. Perhaps the simplest generalization is where the data items are represented not by points but by lines or higher-dimensional flats (affine subspaces). Lines and low-dimensional flats are used to model data under linear ...

* This research was supported in part by the Simons Foundation and the NSF grant CCF 1065125.
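The linear-scan baseline and the (1 + ε)-approximation criterion described above can be sketched as follows; this is a plain illustration of the problem statement, with hypothetical helper names, not any of the cited data structures.

```python
import numpy as np

def linear_scan_nn(points, q):
    """Exact nearest neighbor by comparing q to every data point.

    points: (N, d) array; q: (d,) array. Runs in O(N * d) time,
    which is the baseline that ANN data structures improve upon.
    """
    dists = np.linalg.norm(points - q, axis=1)
    return int(np.argmin(dists))

def is_eps_approximate(points, q, idx, eps):
    """Check the ANN guarantee: the returned point's distance is
    at most (1 + eps) times the true nearest distance."""
    dists = np.linalg.norm(points - q, axis=1)
    return bool(dists[idx] <= (1.0 + eps) * dists.min())
```

Any index passing `is_eps_approximate` is an acceptable answer under the approximate formulation, which is what allows data structures to use far less time and space than the exact linear scan.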