We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥_2 = (1 ± ε)∥Ax∥_2 simultaneously for all x ∈ R^d. Here, m is bounded by a polynomial in rε^{-1}, and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices. Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓ_p regression. More specifically, let b be an n × 1 vector, ε > 0 a small enough value, and let k, p ≥ 1 be integers. Our results include the following.
— Regression: The regression problem is to find a d × 1 vector x′ for which ∥Ax′ − b∥_p ≤ (1 + ε) min_x ∥Ax − b∥_p. For the Euclidean case p = 2, we obtain an algorithm running in O(nnz(A)) + Õ(d^3 ε^{-2}) time, and another in O(nnz(A) log(1/ε)) + Õ(d^3 log(1/ε)) time. (Here, Õ(f) = f · log^{O(1)}(f).) More generally, for p ∈ [1, ∞), we obtain an algorithm running in O(nnz(A) log n) + O(rε^{-1})^C time, for a fixed constant C.
— Low-rank approximation: We give an algorithm to obtain a rank-k matrix Â_k such that ∥A − Â_k∥_F ≤ (1 + ε)∥A − A_k∥_F, where A_k is the best rank-k approximation to A. (That is, A_k is the output of principal components analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk^2 ε^{-4} + k^3 ε^{-5}) time.
— Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r^3) time.
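The sparse embedding construction lends itself to a very short implementation: each column of S has a single ±1 entry in a uniformly random row, so SA can be accumulated in one pass over the nonzeros of A. The sketch-and-solve least-squares example below is a minimal illustration of this idea; the sketch size m and all constants are illustrative choices, not the ones prescribed by the analysis above.

```python
# Minimal sketch of a sparse (CountSketch-style) embedding applied to least squares.
# The target dimension m below is an illustrative choice, not the paper's bound.
import numpy as np
import scipy.sparse as sp

def sparse_embedding(n, m, rng):
    """Return an m x n sparse embedding matrix: one random +/-1 per column."""
    rows = rng.integers(0, m, size=n)        # h(j): the row holding column j's nonzero
    signs = rng.choice([-1.0, 1.0], size=n)  # sigma(j): random sign
    return sp.csr_matrix((signs, (rows, np.arange(n))), shape=(m, n))

def sketched_least_squares(A, b, m, rng):
    """Approximate argmin_x ||Ax - b||_2 by solving the sketched problem."""
    n = A.shape[0]
    S = sparse_embedding(n, m, rng)
    SA, Sb = S @ A, S @ b                    # computable in one pass over nonzeros
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

rng = np.random.default_rng(0)
n, d = 100_000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
x_sketch = sketched_least_squares(A, b, m=2_000, rng=rng)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
# Ratio of residuals; close to 1 when the sketch preserves the column span of [A b].
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```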
We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1, ..., n}, our algorithm computes a (1 ± ε)-approximation using an optimal O(ε^{-2} + log n) bits of space with 2/3 success probability, where 0 < ε < 1 is given. This probability can be amplified by independent repetition. Furthermore, our algorithm processes each stream update in O(1) worst-case time, and can report an estimate at any point midstream in O(1) worst-case time, thus settling both the space and time complexities simultaneously. We also give an algorithm to estimate the Hamming norm of a stream, a generalization of the number of distinct elements, which is useful in data cleaning, packet tracing, and database auditing. Our algorithm uses nearly optimal space, and has optimal O(1) update and reporting times.
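The optimal algorithm referred to above is intricate; as a simpler point of comparison, the following k-minimum-values (KMV) estimator, a classical hashing-based distinct-count sketch that is not the algorithm of this abstract, illustrates the general idea of estimating cardinality from a small summary.

```python
# Classical KMV distinct-count estimator (illustration only, not the optimal algorithm
# from the abstract): hash each item to [0, 1), keep the k smallest hash values, and
# estimate the number of distinct elements as (k - 1) / v_k, where v_k is the largest kept value.
import heapq
import random

class KMVSketch:
    def __init__(self, k, seed=0):
        self.k = k
        self.salt = random.Random(seed).getrandbits(64)
        self.heap = []          # max-heap (stored negated) of the k smallest hashes
        self.seen = set()       # hash values currently stored, to skip duplicates

    def _hash01(self, item):
        h = hash((self.salt, item)) & ((1 << 61) - 1)
        return h / float(1 << 61)

    def update(self, item):
        v = self._hash01(item)
        if v in self.seen:
            return
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -v)
            self.seen.add(v)
        elif v < -self.heap[0]:
            evicted = -heapq.heappushpop(self.heap, -v)
            self.seen.discard(evicted)
            self.seen.add(v)

    def estimate(self):
        if len(self.heap) < self.k:
            return float(len(self.heap))     # fewer than k distinct hashes seen: exact
        return (self.k - 1) / (-self.heap[0])

sketch = KMVSketch(k=1024)
for i in range(1_000_000):
    sketch.update(i % 50_000)                # stream with 50,000 distinct elements
print(sketch.estimate())
```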
We give near-optimal space bounds in the streaming model for linear algebra problems that include estimation of matrix products, linear regression, low-rank approximation, and approximation of matrix rank. In the streaming model, sketches of input matrices are maintained under updates of matrix entries; we prove results for turnstile updates, given in an arbitrary order. We give the first lower bounds known for the space needed by the sketches, for a given estimation error ε. We sharpen prior upper bounds, with respect to combinations of space, failure probability, and number of passes. The sketch we use for matrix A is simply S^T A, where S is a sign matrix. Our results include the following upper and lower bounds on the bits of space needed for 1-pass algorithms. Here A is an n × d matrix, B is an n × d′ matrix, and c := d + d′. These results are given for fixed failure probability; for failure probability δ > 0, the upper bounds require a factor of log(1/δ) more space. We assume the inputs have integer entries specified by O(log(nc)) or O(log(nd)) bits.
1. (Matrix Product) Output a matrix C with ∥A^T B − C∥_F ≤ ε∥A∥_F ∥B∥_F. We show that Θ(cε^{-2} log(nc)) space is needed.
2. (Linear Regression) For an n × 1 column vector b, output a vector x so that ∥Ax − b∥_2 ≤ (1 + ε) min_{x′} ∥Ax′ − b∥_2. We show that Θ(d^2 ε^{-1} log(nd)) space is needed.
3. (Rank-k Approximation) Find a matrix Ã_k of rank no more than k, so that ∥A − Ã_k∥_F ≤ (1 + ε)∥A − A_k∥_F, where A_k is the best rank-k approximation to A. Our lower bound is Ω(kε^{-1}(n + d) log(nd)) space, and we give a one-pass algorithm matching this when A is given row-wise or column-wise. For general updates, we give a one-pass algorithm needing O(kε^{-2}(n + dε^{-2}) log(nd)) space.
We also give upper and lower bounds for algorithms using multiple passes, and a sketching analog of the CUR decomposition.
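As a toy illustration of the sign-matrix sketch S^T A (an assumed simplification that ignores the paper's space accounting and derandomization), the snippet below maintains sketches of A and B under turnstile entry updates and estimates A^T B as (S^T A)^T (S^T B)/m, which is unbiased because E[S S^T] = m·I.

```python
# Illustrative sign-matrix sketch for matrix-product estimation under turnstile updates.
# S is stored explicitly for clarity; a streaming algorithm would generate its entries by hashing.
import numpy as np

class SignSketch:
    """Maintain the m x d sketch S^T A under turnstile updates (i, j, delta) to A."""
    def __init__(self, S, d):
        self.S = S                                 # n x m sign matrix, shared by all sketches
        self.sketch = np.zeros((S.shape[1], d))    # S^T A

    def update(self, i, j, delta):                 # A[i, j] += delta
        self.sketch[:, j] += delta * self.S[i, :]

def estimate_product(skA, skB):
    """Estimate A^T B as (S^T A)^T (S^T B) / m; unbiased since E[S S^T] = m I."""
    m = skA.S.shape[1]
    return skA.sketch.T @ skB.sketch / m

rng = np.random.default_rng(1)
n, d, dp, m = 5_000, 10, 8, 400
S = rng.choice([-1.0, 1.0], size=(n, m))
A, B = rng.standard_normal((n, d)), rng.standard_normal((n, dp))
skA, skB = SignSketch(S, d), SignSketch(S, dp)
for (i, j), v in np.ndenumerate(A):
    skA.update(i, j, v)
for (i, j), v in np.ndenumerate(B):
    skB.update(i, j, v)
err = np.linalg.norm(estimate_product(skA, skB) - A.T @ B, 'fro')
# Relative error decays like 1/sqrt(m).
print(err / (np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')))
```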
We give a 1-pass Õ(m^{1−2/k})-space algorithm for computing the k-th frequency moment of a data stream for any real k > 2. Together with the lower bounds of [1, 2, 4], this resolves the main problem left open by Alon et al. in 1996 [1]. Our algorithm also works for streams with deletions and thus gives an Õ(m^{1−2/p})-space algorithm for the L_p difference problem for any p > 2. This essentially matches the known Ω(m^{1−2/p−o(1)}) lower bound of [12, 2]. Finally, the update time of our algorithm is Õ(1).
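For context, the quantity being approximated is the k-th frequency moment F_k = Σ_i f_i^k, where f_i is the number of occurrences of item i in the stream. The snippet below implements the classical single-sample AMS estimator for F_k (not the algorithm of this abstract), which is unbiased but needs many independent copies to control its variance when k > 2.

```python
# Classical AMS estimator for F_k (illustration only): sample a uniformly random
# stream position via reservoir sampling, count occurrences r of the sampled item
# from that position onward, and output m * (r^k - (r-1)^k).
import random
from collections import Counter

def ams_estimate_fk(stream, k, rng):
    """Unbiased single-sample estimate of F_k."""
    candidate, r = None, 0
    for t, item in enumerate(stream, start=1):
        if rng.random() < 1.0 / t:        # keep position t with probability 1/t
            candidate, r = item, 0
        if item == candidate:
            r += 1                        # occurrences from the sampled position on
    m = t
    return m * (r ** k - (r - 1) ** k)

rng = random.Random(0)
stream = [rng.randrange(100) for _ in range(50_000)]
k = 3
estimates = [ams_estimate_fk(stream, k, random.Random(s)) for s in range(200)]
exact = sum(f ** k for f in Counter(stream).values())
print(sum(estimates) / len(estimates), exact)
```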
We design a new distribution over poly(rε^{-1}) × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥_2 = (1 ± ε)∥Ax∥_2 simultaneously for all x ∈ R^d. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of non-zero entries of A. This improves over all previous subspace embeddings, which required at least Ω(nd log d) time to achieve this property. We call our matrices S sparse embedding matrices. Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓ_p-regression:
• to output an x′ for which ∥Ax′ − b∥_2 ≤ (1 + ε) min_x ∥Ax − b∥_2, for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A)) + O(d^3 ε^{-2}) time, and another in O(nnz(A) log(1/ε)) + Õ(d^3 log(1/ε)) time. (Here Õ(f) = f · log^{O(1)}(f).)
• to obtain a decomposition of an n × n matrix A into a product of an n × k matrix L, a k × k diagonal matrix D, and an n × k matrix W, for which ∥A − LDW^T∥_F ≤ (1 + ε)∥A − A_k∥_F, where A_k is the best rank-k approximation, our algorithm runs in O(nnz(A)) + Õ(nk^2 ε^{-4} + k^3 ε^{-5}) time.
• to output an approximation to all leverage scores of an n × d input matrix A simultaneously, with constant relative error, our algorithms run in O(nnz(A) log n) + Õ(r^3) time.
• to output an x′ for which ∥Ax′ − b∥_p ≤ (1 + ε) min_x ∥Ax − b∥_p, for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A) log n) + poly(rε^{-1}) time, for any constant 1 ≤ p < ∞.
We optimize the polynomial factors in the above stated running times, and show various tradeoffs. Finally, we provide preliminary experimental results which suggest that our algorithms are of interest in practice.
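For the leverage-score result, a standard sketch-based recipe (shown below as an assumed illustration, not necessarily the exact variant analyzed here) is to form SA with a sparse embedding, take its QR factorization SA = QR, and approximate the leverage scores by the squared row norms of A R^{-1} G for a small Johnson-Lindenstrauss matrix G.

```python
# Illustrative sketch-based leverage-score approximation; sketch sizes m and t are
# arbitrary demo choices, not the bounds from the abstract.
import numpy as np
import scipy.sparse as sp

def sparse_embedding(n, m, rng):
    """m x n CountSketch-style embedding: one random +/-1 per column."""
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    return sp.csr_matrix((signs, (rows, np.arange(n))), shape=(m, n))

def approx_leverage_scores(A, m, t, rng):
    """Approximate leverage scores as squared row norms of A R^{-1} G."""
    n, d = A.shape
    S = sparse_embedding(n, m, rng)
    _, R = np.linalg.qr(S @ A)                    # SA = QR, R is d x d
    G = rng.standard_normal((d, t)) / np.sqrt(t)  # Johnson-Lindenstrauss matrix
    Z = A @ np.linalg.solve(R, G)                 # n x t product
    return np.einsum('ij,ij->i', Z, Z)

rng = np.random.default_rng(2)
n, d = 20_000, 15
A = rng.standard_normal((n, d))
approx = approx_leverage_scores(A, m=2_000, t=64, rng=rng)
U, _, _ = np.linalg.svd(A, full_matrices=False)   # exact scores: row norms of U squared
exact = np.einsum('ij,ij->i', U, U)
print(np.max(np.abs(approx - exact) / exact))     # worst-case relative error
```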
We settle the 1-pass space complexity of (1 ± ε)-approximating the L_p norm, for real p with 1 ≤ p ≤ 2, of a length-n vector updated in a length-m stream with updates to its coordinates. We assume the updates are integers in the range [−M, M]. In particular, we show the space required is Θ(ε^{-2} log(mM) + log log n) bits. Our result also holds for 0 < p < 1; although L_p is not a norm in this case, it remains a well-defined function. Our upper bound improves upon previous algorithms of [Indyk, JACM '06] and [Li, SODA '08]. This improvement comes from showing an improved derandomization of the L_p sketch of Indyk by using k-wise independence for small k, as opposed to using the heavy hammer of a generic pseudorandom generator against space-bounded computation such as Nisan's PRG. Our lower bound improves upon previous work of [Alon-Matias-Szegedy, JCSS '99] and [Woodruff, SODA '04], and is based on showing a direct sum property for the 1-way communication of the gap-Hamming problem.
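The upper bound above derandomizes Indyk's p-stable sketch; the fully random version of that sketch for p = 1 is short enough to state directly. The code below is an illustration with i.i.d. Cauchy entries, not the limited-independence construction of this paper: sketch y = Sx and report the median of |y_i|, which concentrates around ∥x∥_1 because each y_i is distributed as ∥x∥_1 times a standard Cauchy variable, whose absolute value has median 1.

```python
# Illustrative L1 sketch via 1-stable (Cauchy) projections, with full randomness.
import numpy as np

class L1Sketch:
    def __init__(self, n, m, rng):
        self.S = rng.standard_cauchy((m, n))   # p-stable projection for p = 1
        self.y = np.zeros(m)                   # sketch y = S x

    def update(self, i, delta):                # coordinate i of x changes by delta
        self.y += delta * self.S[:, i]

    def estimate(self):
        return np.median(np.abs(self.y))       # median(|Cauchy|) = 1, so this ~ ||x||_1

rng = np.random.default_rng(3)
n, m = 10_000, 400
x = np.zeros(n)
sketch = L1Sketch(n, m, rng)
for _ in range(50_000):                        # random turnstile updates
    i, delta = rng.integers(n), rng.integers(-5, 6)
    x[i] += delta
    sketch.update(i, delta)
print(sketch.estimate(), np.abs(x).sum())
```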
We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the m machines receives n data points from a d-dimensional Gaussian distribution with unknown mean θ, which is promised to be k-sparse. The machines communicate by message passing and aim to estimate the mean θ. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least Ω(min{n, d}m), where n is the number of observations that each machine receives and d is the ambient dimension. These lower bounds improve upon [Sha14, SD15] by allowing a multi-round, iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, as a generalization of usual data processing inequalities, which may be of independent interest and useful for other problems.
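As a baseline for the dense setting, the toy simulation below runs the naive simultaneous protocol in which every machine sends its (unquantized) local sample mean and the server averages them; it is only meant to make the setup concrete and does not model the bit-level communication accounting, sparsity exploitation, or the optimal protocols of this work.

```python
# Naive simultaneous protocol for distributed Gaussian mean estimation (baseline only).
import numpy as np

rng = np.random.default_rng(4)
m, n, d, k = 20, 50, 200, 5                    # machines, samples per machine, dimension, sparsity

theta = np.zeros(d)                            # unknown k-sparse mean
theta[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)

# Each machine draws n points from N(theta, I_d) and sends its local mean.
local_means = [theta + rng.standard_normal((n, d)).mean(axis=0) for _ in range(m)]
estimate = np.mean(local_means, axis=0)        # server averages the m messages

print(np.linalg.norm(estimate - theta))        # error on the order of sqrt(d / (m * n))
```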