Proceedings of the Fifth ACM International Conference on Web Search and Data Mining 2012
DOI: 10.1145/2124295.2124312

Scalable inference in latent variable models

Cited by 192 publications (226 citation statements)
References 15 publications

“…Such a setup could be based either on the mcparallel and mccollect functions in the parallel package, which unfortunately are not available on Windows, or on using the Rmpi package directly (Yu 2002) and avoiding parallel entirely. Synchronization in Steps 2 and 4 is required to obtain a consistent BN and thus precludes the use of partial-update techniques such as that described in Ahmed, Aly, Gonzalez, Narayanamurthy, and Smola (2012).…”
Section: Discussion (mentioning)
confidence: 99%
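
For intuition only, here is a minimal Python sketch of the synchronisation point this excerpt contrasts with partial updates; it is not the R setup (parallel/Rmpi) the authors describe, and the worker computation is a made-up placeholder. Every worker must reach the barrier before the shared state is touched, whereas a partial-update scheme in the style of Ahmed et al. (2012) would let each worker push its contribution as soon as it is ready.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)   # the synchronisation point ("Steps 2 and 4" analogue)
shared_state = {"updates": []}             # consistent global state, assembled only after the barrier
lock = threading.Lock()

def worker(worker_id, local_data):
    local_result = sum(local_data)         # placeholder for a worker's local computation
    barrier.wait()                         # every worker must arrive here before any of them proceeds
    with lock:
        shared_state["updates"].append((worker_id, local_result))

threads = [threading.Thread(target=worker, args=(i, range(i, i + 10)))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_state)                        # all four partial results, added only after the barrier
```
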
“…This is relevant since latent variable models and their inference algorithms store and exchange parameters that are associated with vertices rather than edges [1]. Network topology: in many graph-based applications the cost of communication (and to some extent also computation) dwarfs the cost of storing data.…”
Section: Challenges (mentioning)
confidence: 99%
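
As a rough illustration of why vertex-associated parameters drive communication cost, here is a toy Python sketch (the graph, the partition, and the parameter layout are invented for this example, not taken from the cited paper): the state that has to cross the network is the per-vertex parameters of vertices sitting on cut edges.

```python
# Toy graph: parameters live on vertices, so what machines exchange is the
# parameter vectors of vertices on edges that cross the partition.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
partition = {"a": 0, "b": 0, "c": 1, "d": 1}   # vertex -> machine
params = {v: [0.0, 0.0] for v in partition}    # per-vertex latent parameters

boundary = set()
for u, v in edges:
    if partition[u] != partition[v]:           # the edge crosses machine boundaries
        boundary.update((u, v))

to_send = {v: params[v] for v in boundary}     # vertex state that must travel over the network
print(sorted(to_send))                         # ['a', 'b', 'c', 'd'] for this tiny cycle
```
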
“…This staleness has two sources: (i) a simple delay due to the asynchronous updates to the local models [3] because a worker computes new gradients without receiving updates from all the other workers; and (ii) a distributed aggregated delay [2] because a worker only completes a mini-batch after it has received all p gradient partitions, requiring multiple synchronisation rounds.…”
Section: Bounding Staleness (mentioning)
confidence: 99%
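
A minimal sketch of one way to bound this staleness, in the stale-synchronous style; the names MAX_STALENESS, clocks, and may_advance are invented for illustration and are not from the cited systems. The rule caps how far a fast worker may run ahead of the slowest one, which limits both the asynchronous-update delay and the aggregated delay described above.

```python
MAX_STALENESS = 2
clocks = {w: 0 for w in range(4)}   # last completed iteration per worker

def may_advance(worker_id):
    """A worker may start its next iteration only if it stays within
    MAX_STALENESS iterations of the slowest worker."""
    return clocks[worker_id] - min(clocks.values()) < MAX_STALENESS

# Example: worker 0 tries to race ahead while the others are idle.
for _ in range(5):
    if may_advance(0):
        clocks[0] += 1
print(clocks)   # {0: 2, 1: 0, 2: 0, 3: 0} -> worker 0 is held back after 2 steps
```
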
“…A common architecture for DNN systems takes advantage of data parallelism [3,28]: a set of worker nodes train model replicas on partitions of the input data in parallel; the model replicas are kept synchronised by a set of parameter servers, each of which maintains a global partition of the trained model. Periodically, workers upload their latest updates to the parameter servers, which aggregate them and return an updated global model.…”
Section: Introduction (mentioning)
confidence: 99%
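
To make the described loop concrete, here is a minimal single-process Python sketch under assumed names (pull_model, push_gradients, and worker_step are hypothetical, and the toy squared-loss gradient stands in for real backpropagation): each parameter server owns one partition of the model, workers pull the current model, compute a gradient on their data shard, and push per-partition updates back.

```python
import numpy as np

NUM_SERVERS, DIM, LR = 2, 6, 0.1
# Each "parameter server" owns a contiguous slice of the parameter vector.
servers = [np.zeros(DIM // NUM_SERVERS) for _ in range(NUM_SERVERS)]

def pull_model():
    # A worker fetches the full global model by concatenating all partitions.
    return np.concatenate(servers)

def push_gradients(grad):
    # The gradient is split into per-server partitions; each server applies its slice.
    for s, g in enumerate(np.split(grad, NUM_SERVERS)):
        servers[s] -= LR * g

def worker_step(data_shard):
    w = pull_model()                          # pull the latest global model
    grad = 2 * (w - data_shard.mean(axis=0))  # toy gradient of a squared loss
    push_gradients(grad)                      # push updates to the servers

shards = [np.random.randn(32, DIM) for _ in range(3)]   # one shard per worker
for shard in shards:                                     # sequential stand-in for parallel workers
    worker_step(shard)
print(pull_model())
```
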