Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
DOI: 10.1145/2783258.2783373

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

Abstract: Despite having various attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid overfitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, can not only match the prediction accuracy of standard MCMC methods like Gibbs sampling […]
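The abstract describes inference based on Distributed Stochastic Gradient Langevin Dynamics (DSGLD). As a rough illustration of the underlying stochastic gradient Langevin dynamics update for Bayesian matrix factorization, here is a minimal single-machine sketch in Python; it omits the paper's distributed block scheduling, and all names, priors, and hyperparameters (tau, lam, eps, the toy data) are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 80, 5          # toy sizes: users, items, latent rank
tau, lam = 2.0, 0.1           # observation / prior precisions (assumed values)

# Toy observed ratings: (user, item, rating) triples from a random ground truth.
U_true, V_true = rng.normal(size=(N, K)), rng.normal(size=(D, K))
rows, cols = rng.integers(0, N, 2000), rng.integers(0, D, 2000)
obs = [(i, j, U_true[i] @ V_true[j] + rng.normal(scale=1.0 / np.sqrt(tau)))
       for i, j in zip(rows, cols)]

U = rng.normal(scale=0.1, size=(N, K))   # user factors being sampled
V = rng.normal(scale=0.1, size=(D, K))   # item factors being sampled
eps, batch_size = 1e-3, 256              # step size and minibatch size

for step in range(2000):
    batch_idx = rng.integers(0, len(obs), batch_size)
    grad_U, grad_V = -lam * U, -lam * V          # gradient of the log-prior
    scale = len(obs) / batch_size                # rescale minibatch likelihood
    for t in batch_idx:
        i, j, r = obs[t]
        err = r - U[i] @ V[j]
        grad_U[i] += scale * tau * err * V[j]    # d log p(r | U, V) / d U[i]
        grad_V[j] += scale * tau * err * U[i]    # d log p(r | U, V) / d V[j]
    # Langevin step: half the step size times the gradient, plus Gaussian noise.
    U += 0.5 * eps * grad_U + rng.normal(scale=np.sqrt(eps), size=U.shape)
    V += 0.5 * eps * grad_V + rng.normal(scale=np.sqrt(eps), size=V.shape)
```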

Cited by 47 publications (48 citation statements); references 12 publications. Citation types: 0 supporting, 48 mentioning, 0 contrasting. Citing publications range from 2016 to 2023.

Citation statements:
“…per iter. BMF + DSGLD Squared O((N + D)K(√U + 1)T) (Ahn et al., 2015) orthogonal NMF + PSGLD Squared O(DKT) (Şimşekli et al., 2017) orthogonal Distributed BPMF -- Load balance O((N + D)K(U − 1)T) (Vander Aa et al., 2017) Proposed method -- Flexible O((N + D)(K + K²)√U)…”
Section: Models Tune Learning (mentioning)
confidence: 99%
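For readability, the per-iteration cost quoted above for the BMF + DSGLD entry (Ahn et al., 2015) can be typeset as below; the symbol readings are assumptions based on common notation for this setting, not definitions taken from the cited table.

```latex
% Per-iteration cost quoted for BMF + DSGLD; symbol readings are assumptions:
%   N, D : the two dimensions of the rating matrix
%   K    : latent rank
%   U    : number of workers
%   T    : inner update count per iteration
\[
  \mathcal{O}\!\bigl((N + D)\,K\,(\sqrt{U} + 1)\,T\bigr)
\]
```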
“…In this paper, we present an algorithm and software to enable parallelization of CoGAPS for the analysis of large single-cell datasets. This parallelization was done by combining existing methods for Gibbs sampling (Ahn et al., 2015; Li et al.) with a new infrastructure for the updating steps in CoGAPS. Prior to the implementation of an asynchronous updating scheme, CoGAPS was applied to large data sets by using a distributed version of the algorithm, GWCoGAPS, that performed analysis across random sets of genes (Stein-O'Brien et al., 2017) or random sets of cells.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, the computational cost of implementing these approaches may be prohibitive for large single-cell datasets. Many NMF methods can be run in parallel, and thereby leverage the increasing availability of suitable hardware to scale for analysis of large single-cell datasets (Ahn et al., 2015; Li et al.).…”
Section: Introduction (mentioning)
confidence: 99%
“…It is an optimization method that attempts to find the values of the model coefficients (the parameter or weight vector) that minimize the loss function when they cannot be calculated analytically. SGD has proven to achieve state-of-the-art performance on a variety of machine learning tasks [3,6]. With its small memory footprint, robustness against noise, and fast learning rates, SGD is indeed a good candidate for training data-intensive models.…”
Section: Stochastic Gradient Descent (mentioning)
confidence: 99%
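The statement above describes SGD informally. The following minimal sketch shows the idea on a toy logistic-regression problem, where the weight vector is nudged against a minibatch gradient of the loss at each step; the model, data, and hyperparameters are illustrative assumptions, not taken from the cited works [3,6].

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))                    # toy features
w_true = rng.normal(size=10)
y = (X @ w_true + 0.1 * rng.normal(size=1000) > 0).astype(float)  # toy labels

def minibatch_grad(w, Xb, yb):
    """Gradient of the mean logistic loss on one minibatch."""
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))            # predicted probabilities
    return Xb.T @ (p - yb) / len(yb)

w = np.zeros(10)                                   # weight vector to learn
lr, batch_size = 0.1, 32                           # step size, minibatch size

for step in range(2000):
    idx = rng.integers(0, len(y), batch_size)      # draw a random minibatch
    w -= lr * minibatch_grad(w, X[idx], y[idx])    # step against the gradient
```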