We present a randomized algorithm for the approximate nearest neighbor problem in $d$-dimensional Euclidean space. Given $N$ points $\{x_j\}$ in $\mathbb{R}^d$, the algorithm attempts to find $k$ nearest neighbors for each of the $x_j$, where $k$ is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to […], with $T$ the number of iterations performed. The memory requirements of the procedure are of the order $N \cdot (d + k)$. A byproduct of the scheme is a data structure that permits a rapid search for the $k$ nearest neighbors among $\{x_j\}$ for an arbitrary point $x \in \mathbb{R}^d$. The cost of each such query is proportional to […], and the memory requirements for the requisite data structure are of the order […]. The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of $\{x_j\}$, and illustrate its performance via several numerical examples.
A Randomized Approximate Nearest Neighbors Algorithm

Peter W. Jones†, Andrei Osipov‡, Vladimir Rokhlin⋆

Research Report YALEU/DCS/RR-1434
Yale University
September 14, 2010

† This author's research was supported in part by the DMS grant #0602635 and the ONR grants #N000140910108, #N000140910340.
‡ This author's research was supported in part by the AFOSR grant #FA9550-09-1-02-41.
⋆ This author's research was supported in part by the ONR grant #N00014-10-1-0570 and the AFOSR grant #FA9550-09-1-02-41.

Approved for public release: distribution is unlimited.

Keywords: Approximate nearest neighbors, randomized algorithms, fast random rotations
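The three ingredients named in the abstract (random rotations, a divide-and-conquer split, and a local graph search) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a dense random orthogonal matrix obtained from a QR factorization in place of the fast random rotations mentioned in the keywords, a median split along cycling coordinates, and a stopping size of fewer than 2(k + 1) points per box. All function names (`random_rotation`, `boxes`, `merge`, `approx_knn`) are hypothetical.

```python
import numpy as np

def random_rotation(d, rng):
    # Random orthogonal matrix via QR of a Gaussian matrix (assumption:
    # a dense O(d^2) stand-in for the paper's fast random rotations).
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def boxes(y, idx, k, axis=0):
    # Divide and conquer: split at the median of one coordinate, cycling
    # through coordinates, until a box holds fewer than 2 * (k + 1) points.
    # Every resulting box then contains at least k + 1 points.
    if len(idx) < 2 * (k + 1):
        return [idx]
    order = idx[np.argsort(y[idx, axis % y.shape[1]])]
    mid = len(order) // 2
    return (boxes(y, order[:mid], k, axis + 1)
            + boxes(y, order[mid:], k, axis + 1))

def merge(x, i, current, candidates, k):
    # Keep the k closest distinct candidates to point i (excluding i itself),
    # sorted by increasing distance.
    cand = np.unique(np.concatenate([current, candidates]))
    cand = cand[cand != i]
    dist = np.linalg.norm(x[cand] - x[i], axis=1)
    return cand[np.argsort(dist)[:k]]

def approx_knn(x, k, iterations=5, seed=0):
    n, d = x.shape
    if n <= k:
        raise ValueError("need more than k points")
    rng = np.random.default_rng(seed)
    nbrs = [np.empty(0, dtype=np.int64) for _ in range(n)]
    for _ in range(iterations):
        # Rotate, split into boxes, and treat box-mates as candidates.
        y = x @ random_rotation(d, rng)
        for box in boxes(y, np.arange(n), k):
            for i in box:
                nbrs[i] = merge(x, i, nbrs[i], box, k)
    # Local graph search: also try the neighbors of each current neighbor.
    old = [nb.copy() for nb in nbrs]
    for i in range(n):
        cand = np.concatenate([old[j] for j in old[i]])
        nbrs[i] = merge(x, i, old[i], cand, k)
    return np.stack(nbrs)
```

Calling `approx_knn(x, k)` on an array `x` of shape `(N, d)` returns an `(N, k)` array of neighbor indices. Raising `iterations` (the role played by T in the abstract) trades running time for accuracy, since each fresh rotation gives true neighbors separated by an earlier split another chance to share a box.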