We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x_j} in R^d, the algorithm attempts to find k nearest neighbors for each of the x_j, where k is a user-specified integer parameter. The algorithm is iterative, and its running time requirements are proportional to T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k²·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A by-product of the scheme is a data structure permitting a rapid search for the k nearest neighbors among {x_j} for an arbitrary point x ∈ R^d. The cost of each such query is proportional to T·(d·(log d) + log(N/k)·k·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of {x_j} and illustrate its performance via several numerical examples.

data mining | dimensionality reduction | fast random rotations

In this paper, we describe an algorithm for finding approximate nearest neighbors (ANNs) in d-dimensional Euclidean space for each of N user-specified points {x_j}. For each point x_j, the scheme produces a list of k "suspects" that have a high probability of being the k closest points (nearest neighbors) in the Euclidean metric. Those of the suspects that are not among the "true" nearest neighbors are close to being so. We present several measures of performance (in terms of statistics of the k chosen suspected nearest neighbors) for different types of randomly generated datasets consisting of N points in R^d. Unlike other ANN algorithms that have been recently proposed (see, e.g., ref. 1), the method of this paper does not use locality-sensitive hashing. Instead, we use a simple randomized divide-and-conquer approach. The basic algorithm is iterated several times and then followed by a local graph search.

The performance of any fast ANN algorithm must deteriorate as the dimension d increases. Although the running time of our algorithm grows only as d·log d, the statistics of the selected approximate nearest neighbors deteriorate as the dimension d increases. We provide bounds for this deterioration (both analytically and empirically), which occurs reasonably slowly as d increases. Although the actual estimates are fairly complicated, it is reasonable to say that in 20 dimensions the scheme performs extremely well, and the performance does not seriously deteriorate until d is approximately 60. At d = 100, the degradation of the statistics displayed by the algorithm is quite noticeable.

An outline of our algorithm is as follows (a brief sketch of the first two steps appears after the outline):

1. Choose a random rotation acting on R^d and rotate the N given points.
2. Take the first coordinate and divide the dataset into two boxes, where the boxes are divided by finding the median in the first coordinate...
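To make steps 1 and 2 concrete, the following is a minimal Python sketch, not the paper's implementation: it draws a random orthogonal matrix via a QR factorization, rotates the points, and splits them into two boxes at the median of the first coordinate. The function names are illustrative, and the dense d × d rotation used here costs d² per point; the fast random rotations referred to in the abstract achieve the d·(log d) cost.

```python
# Illustrative sketch of steps 1-2: random rotation, then a median split on
# the first coordinate. Uses a dense random orthogonal matrix for simplicity;
# the algorithm described above uses fast random rotations with d*log(d) cost.

import numpy as np

def random_rotation(d, rng):
    """Return a d x d random orthogonal matrix."""
    gaussian = rng.standard_normal((d, d))
    q, r = np.linalg.qr(gaussian)
    # Fix column signs so the matrix is uniformly distributed over the
    # orthogonal group.
    return q * np.sign(np.diag(r))

def rotate_and_split(points, rng):
    """Rotate the N x d array `points`, then split the rotated points into
    two boxes at the median of the first coordinate."""
    n, d = points.shape
    rotated = points @ random_rotation(d, rng).T
    order = np.argsort(rotated[:, 0])
    half = n // 2
    left_box, right_box = order[:half], order[half:]  # indices into `points`
    return rotated, left_box, right_box

# Example usage on random data: N = 1000 points in d = 20 dimensions.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 20))
rotated, left, right = rotate_and_split(x, rng)
```

In the full scheme this split is applied recursively to each box, and the whole procedure is repeated for T independent random rotations before the local graph search.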