We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the ℓp norm, based on p-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the ℓ2 norm. It also yields the first known provably efficient approximate NN algorithm for the case p < 1. We also show that the algorithm finds the exact near neighbor in O(log n) time for data satisfying a certain "bounded growth" condition. Unlike earlier schemes, our LSH scheme works directly on points in Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors, and the scheme is simple and easy to implement. Our experiments (on synthetic data sets) show that our data structure is up to 40 times faster than the kd-tree.
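To make the p-stable construction concrete: for the ℓ2 case, each hash function projects a point onto a random Gaussian direction (the Gaussian is 2-stable), shifts it, and quantizes into buckets of width w. The sketch below is illustrative only; the helper name `make_hash`, the dimension, and the bucket width are our choices, not the paper's notation.

```python
import random
import math

def make_hash(d, w, seed=None):
    """One p-stable LSH function for the l2 case:
    h(v) = floor((a.v + b) / w), where a ~ N(0,1)^d (2-stable)
    and b ~ Uniform[0, w). Nearby points fall into the same
    bucket with higher probability than distant points."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(d)]  # random Gaussian direction
    b = rng.uniform(0, w)                    # random offset
    def h(v):
        proj = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((proj + b) / w)
    return h
```

In a full index one would concatenate several such functions per table and keep several tables; here a single function already shows the collision-probability gap between near and far points.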
The goal of this article is twofold. In the first part, we survey a family of nearest neighbor algorithms that are based on the concept of locality-sensitive hashing. Many of these algorithms have already been successfully applied in a variety of practical scenarios. In the second part of this article, we describe a recently discovered hashing-based algorithm for the case where the objects are points in the d-dimensional Euclidean space. As it turns out, the performance of this algorithm is provably near-optimal in the class of locality-sensitive hashing algorithms.
We consider the problem of maintaining aggregates and statistics over data streams with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: given a stream of bits, maintain a count of the number of 1's among the last N elements seen from the stream. We show that, using O((1/ε) log² N) bits of memory, we can estimate the number of 1's to within a factor of 1 + ε. We also give a matching lower bound of Ω((1/ε) log² N) memory bits for any deterministic or randomized algorithm. We extend our scheme to maintain the sum of the last N positive integers and provide matching upper and lower bounds for this more general problem as well. We also show how to efficiently compute the Lp norms (p ∈ [1, 2]) of vectors in the sliding window model using our techniques. Using our algorithm, one can adapt many other techniques to work in the sliding window model, with a multiplicative overhead of O((1/ε) log N) in memory and a 1 + ε factor loss in accuracy. These include maintaining approximate histograms, hash tables, and statistics or aggregates such as sums and averages.
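The counting scheme can be illustrated with an exponential-histogram-style sketch: the window is summarized by buckets whose sizes are powers of two, with at most two buckets per size, and the estimate charges half of the oldest (partially expired) bucket. This is a simplified sketch in the spirit of the paper's technique, with the class name and bookkeeping details our own.

```python
class SlidingWindowCounter:
    """Approximate count of 1s among the last N stream bits,
    using buckets of power-of-two sizes (at most ~two per size).
    Stores (timestamp of most recent 1 in bucket, bucket size),
    newest first."""

    def __init__(self, N):
        self.N = N
        self.t = 0
        self.buckets = []

    def add(self, bit):
        self.t += 1
        # Drop buckets whose most recent 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.t - self.N:
            self.buckets.pop()
        if bit == 1:
            self.buckets.insert(0, (self.t, 1))
            # Whenever three buckets share a size, merge the two oldest.
            i = 0
            while i + 2 < len(self.buckets):
                a, b, c = self.buckets[i:i + 3]
                if a[1] == b[1] == c[1]:
                    # Merged bucket keeps the newer timestamp of the pair.
                    self.buckets[i + 1:i + 3] = [(b[0], b[1] * 2)]
                else:
                    i += 1

    def count(self):
        """Sum all buckets, counting only half of the oldest one,
        since it may straddle the window boundary."""
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] // 2
```

The error comes entirely from the oldest bucket, whose size is at most a constant fraction of the true count; keeping more buckets per size tightens the factor toward 1 + ε.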
There are two main algorithmic approaches to sparse signal recovery: geometric and combinatorial. The geometric approach starts with a geometric constraint on the measurement matrix Φ and then uses linear programming to decode information about x from Φx. The combinatorial approach constructs Φ and a combinatorial decoding algorithm to match. We present a unified approach to these two classes of sparse signal recovery algorithms. The unifying elements are the adjacency matrices of high-quality unbalanced expanders. We generalize the notion of the Restricted Isometry Property (RIP), crucial to compressed sensing results for signal recovery, from the Euclidean norm to the ℓp norm for p ≈ 1, and then show that unbalanced expanders are essentially equivalent to RIP-p matrices. From known deterministic constructions for such matrices, we obtain new deterministic measurement matrix constructions and algorithms for signal recovery which, compared to previous deterministic algorithms, are superior in either the number of measurements or the noise tolerance.
We consider the problem of computing the k-sparse approximation to the discrete Fourier transform of an n-dimensional signal. We show:
• an O(k log n)-time randomized algorithm for the case where the input signal has at most k non-zero Fourier coefficients, and
• an O(k log n log(n/k))-time randomized algorithm for general input signals.
Both algorithms achieve o(n log n) time, and thus improve over the Fast Fourier Transform, for any k = o(n). They are the first known algorithms that satisfy this property. Also, if one assumes that the Fast Fourier Transform is optimal, the algorithm for the exactly k-sparse case is optimal for any k = n^Ω(1). We complement our algorithmic results by showing that any algorithm for computing the sparse Fourier transform of a general signal must use at least Ω(k log(n/k)/log log n) signal samples, even if it is allowed to perform adaptive sampling.
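A basic primitive behind sparse-Fourier-style algorithms is worth seeing in isolation: if a signal is exactly 1-sparse in frequency, i.e. x[j] = c·e^{2πi f j/n}, then two time samples suffice, since the phase of x[1]/x[0] reveals f. The sketch below shows only this 1-sparse building block, not the paper's full algorithm; the function name and interface are our own.

```python
import cmath

def recover_one_sparse(x0, x1, n):
    """Given samples x[0] and x[1] of x[j] = c * e^{2*pi*i*f*j/n}
    (a signal with a single non-zero Fourier coefficient c at
    integer frequency f), recover (f, c).

    x1/x0 = e^{2*pi*i*f/n}, so the phase of the ratio encodes f."""
    ratio = x1 / x0
    phase = cmath.phase(ratio) % (2 * cmath.pi)   # map into [0, 2*pi)
    f = round(n * phase / (2 * cmath.pi)) % n     # nearest integer frequency
    return f, x0                                   # x[0] = c
```

Full sparse FFT algorithms reduce a general signal to many such (nearly) 1-sparse subproblems by randomly permuting and filtering the spectrum, which is where the log factors in the running times arise.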