The problem of maximizing a concave function f(x) over the unit simplex Δ can be solved approximately by a simple greedy algorithm. For a given k, the algorithm finds a point x^(k) on a k-dimensional face of Δ such that f(x^(k)) ≥ f(x*) − O(1/k). Here f(x*) is the maximum value of f over Δ, and the constant factor depends on f. This algorithm and its analysis were known before, in connection with problems of statistics and machine learning such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ε-coresets for the minimum enclosing ball problem was shown by means of a simple greedy algorithm. Similar greedy algorithms, which are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
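As a concrete illustration (not taken from the paper), the greedy step can be sketched as a plain Frank-Wolfe iteration: at each step, move toward the simplex vertex that maximizes the linearized objective. The objective, step size, and function names below are illustrative assumptions; after k steps the iterate has at most k + 1 nonzero coordinates, i.e. it lies on a k-dimensional face of Δ.

```python
import numpy as np

def frank_wolfe_simplex(grad_f, n, k):
    """Greedy (Frank-Wolfe) maximization of a concave f over the unit simplex.

    Each step solves the linearized problem max_{s in simplex} <grad_f(x), s>,
    whose optimum is a vertex e_i; the standard step size 2/(t+2) gives the
    O(1/k) suboptimality bound discussed in the text.
    """
    x = np.zeros(n)
    x[0] = 1.0                      # start at a vertex of the simplex
    for t in range(1, k + 1):
        g = grad_f(x)
        i = int(np.argmax(g))       # best vertex for the linearized objective
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x         # convex combination stays in the simplex
        x[i] += gamma
    return x

# Toy example: maximize f(x) = -||x - c||^2 with c in the simplex, so x* = c.
c = np.array([0.5, 0.3, 0.2])
x = frank_wolfe_simplex(lambda x: -2 * (x - c), 3, 2000)
```

The returned iterate remains a probability vector throughout, and its distance to the optimum shrinks at the O(1/k) rate in function value.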
Introduction

This paper gives several new demonstrations of the usefulness of random sampling techniques in computational geometry. One new algorithm creates a search structure for arrangements of hyperplanes by sampling the hyperplanes and using information from the resulting arrangement to divide and conquer. This algorithm requires randomized O(s^(d+ε)) preprocessing time to build a search structure for an arrangement of s hyperplanes in d dimensions. The structure has a query time that is worst-case O(log s). (The bound holds for any fixed ε > 0, with the constant factors dependent on d and ε.) Using point-plane duality, the algorithm may be used for answering halfspace range queries. Another algorithm finds random samples of simplices to determine the separation distance of two polytopes. The algorithm uses randomized O(n^⌊d/2⌋) time, where n is the total number of vertices of the two polytopes. This matches previous results [DK85] for the case d = 3 and extends them. Another algorithm samples points in the plane to determine their order-k Voronoi diagram, and requires randomized O(s^(1+ε)k) time for s points. This sharpens the bound O(sk^2 log s) for Lee's algorithm [Lee82], and O(s^2 log s + k(s − k) log^2 s) for Chazelle and Edelsbrunner's algorithm [CE85]. Finally, random sampling is used to show that any set of s points in E^3 has O(sk^2 log^9 s/(log log s)^6) distinct j-sets with j ≤ k. This sharpens with respect to k the previous bound O(sk^5) [CP85]. The proof of the bound given here is an instance of a "probabilistic method" [ES74].
© 1986 ACM 0-89791-193-8/86/0500/0414 $00.75

The problems and results

The use of random sampling to divide and conquer is quite old: the partitioning step of quicksort may be viewed as an example. This paper describes several new applications of this technique.

Searching arrangements. Given a set of hyperplanes S with |S| = s, their arrangement A_S is the division of space into polyhedral regions that is implied by S. Such polyhedral regions are termed cells. All of the points in a cell P are on the same side of each hyperplane in S. That is, for every h ∈ S, any two points in P are on the same side of h or on h. Using point-hyperplane duality, an algorithm for determining the cell containing a given query point immediately yields an algorithm for halfspace range queries. The algorithm given here has a much faster query time than several previously known [Wil82, YY85, Col85], but requires more (that is, non-linear) storage. However, its preprocessing time and storage compare quite well with those of previous algorithms for range queries having O(log s) query times [DL76, CY83]. These algorithms...
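The quicksort analogy above can be made concrete with a minimal sketch (illustrative, not from the paper): a single random sample, the pivot, splits the input into subproblems that are balanced in expectation, which is the same principle the paper applies to geometric divide and conquer.

```python
import random

def quicksort(a):
    """Quicksort with a randomly sampled pivot.

    The random pivot is a sample of size 1; it partitions the input into
    subproblems of expected balanced size, giving O(n log n) expected time.
    """
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```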
We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥_2 = (1 ± ε)∥Ax∥_2 simultaneously for all x ∈ R^d. Here, m is bounded by a polynomial in rε^(−1), and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices. Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓ_p regression. More specifically, let b be an n × 1 vector, ε > 0 a small enough value, and integers k, p ⩾ 1. Our results include the following.

— Regression: The regression problem is to find a d × 1 vector x′ for which ∥Ax′ − b∥_p ⩽ (1 + ε) min_x ∥Ax − b∥_p. For the Euclidean case p = 2, we obtain an algorithm running in O(nnz(A)) + Õ(d^3 ε^(−2)) time, and another in O(nnz(A) log(1/ε)) + Õ(d^3 log(1/ε)) time. (Here, Õ(f) = f · log^O(1)(f).) For p ∈ [1, ∞), more generally, we obtain an algorithm running in O(nnz(A) log n) + O(rε^(−1))^C time, for a fixed C.

— Low-rank approximation: We give an algorithm to obtain a rank-k matrix Â_k such that ∥A − Â_k∥_F ≤ (1 + ε)∥A − A_k∥_F, where A_k is the best rank-k approximation to A. (That is, A_k is the output of principal components analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk^2 ε^(−4) + k^3 ε^(−5)) time.

— Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r^3) time.
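A hedged sketch of the construction described above, in the CountSketch style consistent with the abstract: S has a single ±1 entry per column, placed in a hashed row, so applying it costs one update per nonzero of A. The function name, dimensions, and the single-vector check below are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def sparse_embed(A, m, rng):
    """Apply a sparse embedding matrix S (one ±1 per column) to A.

    Row i of A is hashed to row h(i) of the sketch with a random sign s_i,
    so computing SA takes time proportional to nnz(A).
    """
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)            # hash h: [n] -> [m]
    signs = rng.choice([-1.0, 1.0], size=n)      # random signs
    SA = np.zeros((m, A.shape[1]))
    for i in range(n):
        SA[rows[i]] += signs[i] * A[i]           # one update per row of A
    return SA

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))               # tall, skinny A
SA = sparse_embed(A, 200, rng)                   # m chosen loosely for the demo
x = rng.standard_normal(5)
ratio = np.linalg.norm(SA @ x) / np.linalg.norm(A @ x)
```

For a single fixed x the ratio concentrates near 1; the paper's subspace-embedding guarantee is the stronger statement that this holds simultaneously for all x with m only polynomial in rε^(−1).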