Subspace embeddings for the L₁-norm with applications

Sohler, Christian; Woodruff, David P.

doi:10.1145/1993636.1993736

Cited by 67 publications

(126 citation statements)

References 48 publications

Supporting

Mentioning

125

Contrasting


“…
name | running time | s | κ Φ
CT [63] | O(mn^2 log n) | O(n log n) | O(n log n)
FCT [19] | O(mn log n) | O(n log n) | O(n^4 log^4 n)
SPCT [66] | nnz(A) | O(n^5 log^5 n) | O(n^3 log^3 n)
Reciprocal Exponential [64] | nnz(A) | O(n log n) | O(n^2 log^2 n)
Sampling (FCT) [19,77] | O(mn log n) | O(n^{13/2} log^{9/2} n log(1/ε)/ε^2) | 1 + ε
Sampling (SPCT) [66,19,77] | O(nnz(A) · log n) | O(n^{15/2} log^{11/2} n log(1/ε)/ε^2) | 1 + ε
Sampling (RET) [64,77] | O(nnz(A) · log n) | O(n^{9/2} log^{5/2} n log(1/ε)/ε^2) | 1 + ε
Table 6: Summary of data-oblivious and data-aware ℓ_1 embeddings. Above, s denotes the embedding dimension.…”

Section: Remark (mentioning)

confidence: 99%

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

2016


In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. With cheap storage, instead of storing only currently-relevant data, it is common to store as much data as possible, hoping that its value can be extracted later. In this way, exabytes (10^18 bytes) of data are being created on a daily basis. Extracting value from these data, however, requires scalable implementations of advanced analytical algorithms beyond simple data processing, e.g., statistical regression methods, linear algebra, and optimization algorithms. Many traditional methods are designed to minimize floating-point operations, which is the dominant cost of in-memory computation on a single machine. In parallel and distributed environments, however, load balancing and communication, including disk and network I/O, can easily dominate computation. These factors greatly increase the complexity of algorithm design and challenge traditional ways of thinking about the design of parallel and distributed algorithms.
Here, we review recent work on developing and implementing randomized matrix algorithms in large-scale parallel and distributed environments. Randomized algorithms for matrix problems have received a great deal of attention in recent years, thus far typically either in theory or in machine learning applications or with implementations on a single machine. Our main focus is on the underlying theory and practical implementation of random projection and random sampling algorithms for very large very overdetermined (i.e., overconstrained) ℓ_1 and ℓ_2 regression problems. Randomization can be used in one of two related ways: either to construct sub-sampled problems that can be solved, exactly or approximately, with traditional numerical methods; or to construct preconditioned versions of the original full problem that are easier to solve with traditional iterative algorithms.
Theoretical results demonstrate that in near input-sparsity time and with only a few passes through the data one can obtain very strong relative-error approximate solutions, with high probability. Empirical results highlight the importance of various trade-offs (e.g., between the time to construct an embedding and the conditioning quality of the embedding, between the relative importance of computation versus communication, etc.) and demonstrate that ℓ_1 and ℓ_2 regression problems can be solved to low, medium, or high precision in existing distributed systems on up to terabyte-sized data.


“…Recently, there are a lot of progress for the ℓ_p regression for the case of n ≫ d Cohen and Peng [2015], Woodruff and Zhang [2013], Meng and Mahoney [2013], Clarkson and , Clarkson et al [2016], Sohler and Woodruff [2011], Dasgupta et al [2009]. These results show various ways to find a matrix A′ with fewer rows such that ‖Ax‖_p ≈ ‖A′x‖_p for all vectors x ∈ R^d.…”

Section: Introduction (mentioning)

confidence: 99%

An homotopy method for ℓ_p regression provably beyond self-concordance and in input-sparsity time

Bubeck, Cohen, Lee, et al.

2018

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing


We consider the problem of linear regression where the ℓ_2^n norm loss (i.e., the usual least squares loss) is replaced by the ℓ_p^n norm. We show how to solve such problems in […] calls to a (sparse) linear system solver. This improves the state of the art for any p ∉ {1, 2, +∞}. Furthermore we also propose a randomized algorithm solving such problems in input sparsity time, i.e., O_p((Z + poly(d)) log^{O(1)}(1/ε)) where Z is the size of the input and d is the number of variables. Such a result was only known for p = 2. Finally we prove that these results lie outside the scope of the Nesterov-Nemirovski's theory of interior point methods by showing that any symmetric self-concordant barrier on the ℓ_p^n unit ball has self-concordance parameter Ω(n).


“…A first step was done by Woodruff and Sohler [93] who designed the first subspace embedding for ℓ_1 via Cauchy random variables. The method is in principle generalizable to using p-stable distributions and was improved in [30,77].…”

Section: Lemma 11 (Distributional Johnson-Lindenstrauss Lemma) There (mentioning)

confidence: 99%

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

Munteanu, Schwiegelshohn

2017

Künstl Intell
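The citation statement above mentions that the ℓ_1 subspace embedding is built from Cauchy random variables. The sketch below is only an illustration of the 1-stability property behind that idea, not the construction from the cited paper: for a fixed vector x, every entry of (SA)x is Cauchy-distributed with scale ‖Ax‖_1, so the median of the absolute entries estimates ‖Ax‖_1. The function names (`cauchy_sketch`, `estimate_l1`), the dense unscaled matrix S, and the sizes m, d, r are assumptions chosen for the example; a guarantee that holds for all x at once needs the analysis and parameter choices of the paper itself.

```python
import math
import random
import statistics


def cauchy_sketch(A, r, rng):
    """Return S*A, where S is an r x m matrix of i.i.d. standard Cauchy entries.

    Hypothetical helper: dense and unscaled, for illustration only.
    """
    m, d = len(A), len(A[0])
    SA = []
    for _ in range(r):
        # tan(pi * (U - 1/2)) is standard Cauchy when U is uniform on (0, 1).
        s_row = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(m)]
        SA.append([sum(s_row[i] * A[i][j] for i in range(m)) for j in range(d)])
    return SA


def matvec(M, x):
    """Plain matrix-vector product on lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def l1_norm(v):
    return sum(abs(t) for t in v)


def estimate_l1(SA, x):
    """Median-of-absolute-values estimate of ||A x||_1 from the sketch S*A.

    Each entry of (S A) x is Cauchy with scale ||A x||_1 (1-stability),
    and the median of |standard Cauchy| equals 1.
    """
    return statistics.median(abs(t) for t in matvec(SA, x))


rng = random.Random(0)           # fixed seed so the run is repeatable
m, d, r = 200, 3, 400            # assumed sizes: rows, columns, sketch rows
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
SA = cauchy_sketch(A, r, rng)

x = [1.0, -2.0, 0.5]
true_norm = l1_norm(matvec(A, x))
ratio = estimate_l1(SA, x) / true_norm
print(f"sketch rows: {len(SA)}, estimate/true: {ratio:.3f}")
```

The sketch SA is computed once and reused for any x, which is the data-oblivious aspect: S is drawn without looking at A. The median estimator is used here because Cauchy variables have no finite mean, so averaging the absolute entries would not concentrate.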
