2012 IEEE 12th International Conference on Data Mining
DOI: 10.1109/icdm.2012.120

Distributed Matrix Completion

Cited by 77 publications (94 citation statements). References 10 publications.
“…In some applications, such as the ones above, MIPS is applied to the factor matrices obtained from some matrix factorization algorithm. Fast and scalable matrix factorization algorithms have been extensively studied in the literature [Makari et al 2015;Niu et al 2011;Teflioudi et al 2012] and the factorization itself is usually not a bottleneck (see Sec. 8.1 for some examples).…”
Section: Notation
confidence: 99%
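
As context for the quote above, the sketch below shows what Top-k-MIPS over factor matrices amounts to in the simplest brute-force case: score every item by its inner product with one user's factor vector and keep the k largest. The array names U, M and the helper topk_mips are illustrative assumptions, not taken from the cited papers.

```python
# Brute-force Top-k MIPS over factor matrices (illustrative sketch).
import numpy as np

def topk_mips(U, M, user, k=10):
    """Return the k items with the largest inner product for one user row."""
    scores = M @ U[user]                    # inner products u_i^T m_j for all items j
    top = np.argpartition(-scores, k)[:k]   # unordered top-k candidates
    return top[np.argsort(-scores[top])]    # sorted by descending score

# Example: random factors, recommend 5 items for user 0
rng = np.random.default_rng(0)
U, M = rng.normal(size=(100, 16)), rng.normal(size=(5000, 16))
print(topk_mips(U, M, user=0, k=5))
```

Dedicated MIPS indexes replace the exhaustive scan above with pruning or approximation, which is why the factorization itself is usually not the bottleneck.
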
“…For Netflix, we performed a plain matrix factorization with DSGD++ using L2 regularization with regularization parameter λ = 50, as in [Teflioudi et al 2012]. For KDD, we used the factorization of Koenigstein et al [2011], which incorporates the music taxonomy, temporal effects, as well as user and item biases; this dataset has been used in previous studies of the Top-k-MIPS problem.…”
confidence: 99%
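
For concreteness, the "plain matrix factorization with L2 regularization" mentioned in this quote typically minimizes an objective of the following form; this is a standard formulation stated here as an assumption, not copied from the cited paper.

```latex
\min_{U,M} \; \sum_{(i,j) \in \Omega} \bigl( a_{ij} - u_i^\top m_j \bigr)^2
  + \lambda \Bigl( \sum_i \lVert u_i \rVert^2 + \sum_j \lVert m_j \rVert^2 \Bigr),
\qquad \lambda = 50,
```

where Ω denotes the set of observed (user, item) interactions.
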
“…al presented a parallel matrix factorization using Stochastic Gradient Descent [73], where each iteration works only on a subset of the data and leverages an intelligent partitioning that avoids conflicting updates. While their approach converges faster than ALS, they switched their implementation to MPI [167] due to the overhead incurred by Hadoop. Recht et al proposed a biased sampling approach to avoid conflicting updates during parallel training [145] and even proved convergence under a minor amount of update conflicts [146].…”
Section: Related Work
confidence: 99%
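
The "intelligent partitioning" described in this quote schedules disjoint blocks of the rating matrix so that concurrent workers never update the same user or item factors. Below is a minimal single-machine sketch of that stratified-block idea, assuming NumPy and a list of (user, item, rating) triples; the block layout and function names are illustrative, not the authors' implementation.

```python
# Sketch of stratified (block-partitioned) SGD for matrix factorization.
# Blocks within one stratum share no users or items, so they could be
# updated in parallel without conflicting writes.
import numpy as np

def strata(n_users, n_items, p):
    """Yield p strata; each stratum is a set of p blocks sharing no users or items."""
    u_bins = np.array_split(np.arange(n_users), p)
    i_bins = np.array_split(np.arange(n_items), p)
    for shift in range(p):
        yield [(u_bins[b], i_bins[(b + shift) % p]) for b in range(p)]

def sgd_on_block(U, M, ratings, users, items, lr=0.01, lam=0.05):
    """Plain SGD updates restricted to the ratings that fall inside one block."""
    uset, iset = set(users.tolist()), set(items.tolist())
    for i, j, a in ratings:
        if i in uset and j in iset:
            ui, mj = U[i].copy(), M[j].copy()
            err = a - ui @ mj
            U[i] += lr * (err * mj - lam * ui)
            M[j] += lr * (err * ui - lam * mj)

def dsgd_epoch(U, M, ratings, p=4):
    """One epoch: process every stratum; the p blocks inside a stratum
    touch disjoint rows of U and M, so they are safe to run in parallel."""
    for stratum in strata(U.shape[0], M.shape[0], p):
        for users, items in stratum:   # conflict-free; parallelizable
            sgd_on_block(U, M, ratings, users, items)
```
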
“…Unfortunately, SGD is inherently sequential because it updates the model parameters after each processed interaction. Techniques for parallel SGD have been proposed, yet they are either hard to implement, exhibit slow convergence, or require shared memory [11,16,20].…”
Section: Parallelization
confidence: 99%
“…A prediction for the strength of the relation between a user and an item (e.g., the preference of a user towards a movie) is given by the dot product u_i^⊤ m_j of the vectors for user i and item j in the low-dimensional feature space. A popular technique to compute such a factorization is Stochastic Gradient Descent (SGD) [11,13,20], which randomly loops through all observed interactions a_ij, computes the error of the prediction u_i^⊤ m_j for each interaction and modifies the model parameters in the opposite direction of the gradient. Another technique is Alternating Least Squares (ALS) [12,23], which repeatedly keeps one of the unknown matrices (either U or M) fixed, so that the other one can be optimally re-computed.…”
Section: Collaborative Filtering
confidence: 99%
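
To make the ALS alternative from this quote concrete, here is a minimal single-machine sketch of one ALS half-step (re-computing U with M fixed). The matrix names follow the notation of the quote, but the code itself is only an assumption about a typical implementation, not the cited one.

```python
# One ALS half-step: with item factors M fixed, each user row u_i has a
# closed-form ridge-regression solution over that user's observed ratings.
import numpy as np

def als_update_users(U, M, ratings_by_user, lam=0.05):
    """ratings_by_user[i] is a list of (item j, rating a_ij) pairs for user i."""
    r = M.shape[1]
    for i, rated in ratings_by_user.items():
        items = np.array([j for j, _ in rated])
        a_i   = np.array([a for _, a in rated])
        M_i   = M[items]                       # factors of the items this user rated
        A = M_i.T @ M_i + lam * np.eye(r)      # regularized normal equations
        U[i] = np.linalg.solve(A, M_i.T @ a_i)
    return U
```

The symmetric step re-computes M with U fixed; alternating the two steps never increases the regularized squared error, since each half-step solves its subproblem exactly.
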