2017 · Preprint
DOI: 10.48550/arxiv.1712.05889

Ray: A Distributed Framework for Emerging AI Applications

Abstract: The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray, a distributed system to address them. Ray implements a unified interface that can express both task-parallel and actor-based computations, supported by a single dynamic execution engine. To meet the performance requirem…
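As a rough illustration of the unified interface the abstract describes, the sketch below pairs a task-parallel remote function with a stateful actor. The @ray.remote, .remote(), and ray.get primitives are Ray's actual Python API; the preprocess and Counter names are hypothetical.

```python
import ray

ray.init()

# Task-parallel: a stateless function becomes a remote task.
@ray.remote
def preprocess(x):
    return x * 2

# Actor-based: a class becomes a stateful remote actor.
@ray.remote
class Counter:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

futures = [preprocess.remote(i) for i in range(4)]  # tasks run in parallel
counter = Counter.remote()
for f in futures:
    counter.add.remote(ray.get(f))
print(ray.get(counter.add.remote(0)))  # -> 12 (0 + 2 + 4 + 6)
```

Both styles are scheduled by the same execution engine, which is the design point the abstract emphasizes.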

Cited by 54 publications (58 citation statements)
References 23 publications
“…Line counts include lines used for logging and debugging functionality. We implemented Tune using the Ray (Moritz et al (2017)) framework, which as noted earlier provides the actor abstraction used to run trials in Tune. In contrast to popular distributed frameworks such as Spark (Zaharia et al (2012)), or MPI (Gabriel et al (2004)), Ray offers a more flexible programming model.…”
Section: Methods
confidence: 99%
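The actor-per-trial pattern this excerpt describes can be sketched as follows; Trial and its methods are hypothetical stand-ins for illustration, not Tune's actual classes.

```python
import ray

ray.init()

@ray.remote
class Trial:
    """Hypothetical stand-in for a Tune trial: actor state persists across calls."""
    def __init__(self, config):
        self.config = config
        self.iteration = 0

    def step(self):
        self.iteration += 1
        # A real trial would run one training iteration here.
        return {"iteration": self.iteration, "lr": self.config["lr"]}

trials = [Trial.remote({"lr": lr}) for lr in (0.1, 0.01, 0.001)]
results = ray.get([t.step.remote() for t in trials])  # all trials step in parallel
```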
“…To meet these requirements, we propose the Tune user-facing and scheduling APIs (Section 4) and implement it on the Ray distributed computing framework (Moritz et al (2017)). The Ray framework provides the underlying distributed execution and resource management.…”
Section: Requirements for API Generality
confidence: 99%
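A minimal sketch of what Tune's user-facing API looks like when run on Ray, assuming the tune.run/tune.report function API of earlier Ray releases; the trainable and its search space are illustrative only.

```python
from ray import tune

def trainable(config):
    # Hypothetical objective; a real trainable would train and evaluate a model.
    tune.report(score=1.0 / config["lr"])

analysis = tune.run(
    trainable,
    config={"lr": tune.grid_search([0.001, 0.01, 0.1])},
    resources_per_trial={"cpu": 1},  # Ray handles placement and resource accounting
)
print(analysis.get_best_config(metric="score", mode="max"))
```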
“…To handle task distribution, pyscreener relies on the ray library [13] for distributed computation. For multithreaded docking software, pyscreener allows a user to specify how many CPU cores to run each individual docking simulation over, running as many docking simulations in parallel as possible for a given number of total CPU cores in the ray cluster.…”
Section: Implementation and Performance
confidence: 99%
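The per-simulation core reservation the excerpt describes maps onto Ray's num_cpus resource request, as in the sketch below; dock() is a hypothetical stub, not pyscreener's API.

```python
import ray

ray.init()

# Reserving 4 cores per task means Ray schedules at most
# total_cluster_cpus // 4 docking simulations concurrently.
@ray.remote(num_cpus=4)
def dock(ligand):
    # Hypothetical stub; pyscreener would invoke the multithreaded docking
    # binary here with a thread count matching the reservation.
    return {"ligand": ligand, "score": -7.2}

scores = ray.get([dock.remote(l) for l in ["ZINC00001", "ZINC00002"]])
```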
“…Auto-scaling and Fault Tolerance: Efforts to add fault tolerance to ScaLAPACK have so far been demonstrated to incur significant performance overhead [11]. For almost all BSP and dataflow systems [30,24,29], recomputation is required to restore stateful workers or datasets that have not been checkpointed. MadLINQ [34] also uses dependency tracking to minimize recomputation for its pipelined execution.…”
Section: Related Work
confidence: 99%