A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning

Ben-Nun, Tal; Besta, Maciej; Huber, Simon; Ziogas, Alexandros Nikolaos; Peter, Daniel; Hoefler, Torsten

doi:10.48550/arxiv.1901.10183

Cited by 9 publications

(11 citation statements)

References 0 publications


“…An important direction of future work is also considering more realistic large-scale distributed workloads (e.g., using traces), such as different Remote Direct Memory Access based applications [12,15,32,56], deep learning training and inference [5,6], communication-intense linear algebra kernels [46], or irregular processing [16][17][18][58].…”

Section: Discussion and Takeaway (mentioning)

confidence: 99%


Towards Million-Server Network Simulations on Just a Laptop

Besta, Schneider, Girolamo et al. 2021

Preprint

Self Cite


The growing size of data center and HPC networks poses unprecedented requirements on the scalability of simulation infrastructure. The ability to simulate such large-scale interconnects on a simple PC would facilitate research efforts. Unfortunately, as we first show in this work, existing shared-memory packet-level simulators do not scale to the sizes of the largest networks considered today. We then illustrate a feasibility analysis and a set of enhancements that enable a simple packet-level htsim simulator to scale to unprecedented simulation sizes on a single PC. Our code is available online and can be used to design novel schemes in the coming era of omnipresent data centers and HPC clusters.


“…4 Inter-router cables + server links. 5 The average over the flow size distribution, excluding retransmissions.…”

Section: Memory (mentioning)

confidence: 99%

Towards Million-Server Network Simulations on Just a Laptop

Besta, Schneider, Girolamo et al. 2021

Preprint

Self Cite


“…The complexity and diversity of machine learning frameworks, available hardware systems, evaluation techniques, suitable metrics for quantification, and the limited availability of appropriate scientific datasets make this a challenging endeavour. Early initiatives on this front include MLPerf [59], AI Benchmark Suites from BenchCouncil [60], CORAL-2 [61], and Deep500 [62,63].…”

Section: Introduction (mentioning)

confidence: 99%

“…• Finally, the Deep500 effort is predominantly focused on techniques for reliably reporting performance of deep learning applications using metrics such as scalability, throughput, communication volume and time-to-solution [62,63]. This is more focused on methodology (and a corresponding framework) for quantifying and reporting deep learning performance than on any specific application.…”

Section: Introduction (mentioning)

confidence: 99%

Machine learning and big scientific data

Hey, Butler, Jackson et al. 2020

Phil. Trans. R. Soc. A.


This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory site at Harwell near Oxford. Such 'Big Scientific Data' comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility, and the UK's Central Laser Facility. Increasingly, scientists are now needing to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and also to help find new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now also used deep learning technology to develop their AlphaFold tool to make predictions for protein folding. Remarkably, they have been able to achieve some spectacular results for this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the Rutherford Appleton Laboratory, we focus on challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing some realistic machine learning benchmarks using Big Scientific Data coming from a number of different scientific domains. We conclude with some initial examples of our 'SciML' benchmark suite and of the research challenges these benchmarks will enable.

Rosetta protein folding program at the University of Washington in Seattle [8], commented that 'DeepMind's scientists built on two algorithm strategies pioneered by others. First, by comparing vast troves of genomic data on other proteins, AlphaFold was able to better decipher which pairs of amino acids were most likely to wind up close to one another in folded proteins. Second, related comparisons also helped them gauge the most probable distance between neighboring pairs of amino acids and the angles at which they bound to their neighbors. Both approaches do better with the more data they evaluate, which makes them more apt to benefit from machine learning computer algorithms, such as AlphaFold, that solve problems by crunching large data sets' [9]. The predictions of the AlphaFold system were remarkably good and better on average than the other 97 competitors. However, there is still hope for scientists. After the competition David Baker remarked that 'Deep Mind made much better fold level predictions than everybody, including us, using DL on co-evolution data. For problems where there are not many homologous sequences, and for protein structure refinement, I would expect their approach to work less well, as it doesn't have any physical chemistry (they used Rosetta to build their final models from predicted distances)' [10]. In this paper, we make some initial explorations into the application of such Deep Learning approaches ap...


“…Large graphs are a basis of many problems in machine learning, medicine, social network analysis, computational sciences, and others [15,25,106]. The growing graph sizes, reaching one trillion edges in 2015 (the Facebook social graph [48]) and 12 trillion edges in 2018 (the Sogou webgraph [101]), require unprecedented amounts of compute power, storage, and energy.…”

Section: Introduction (mentioning)

confidence: 99%

Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and Analytics

Besta, Weber, Gianinazzi et al. 2019

Preprint

Self Cite


We propose Slim Graph: the first programming model and framework for practical lossy graph compression that facilitates high-performance approximate graph processing, storage, and analytics. Slim Graph enables the developer to express numerous compression schemes using small and programmable compression kernels that can access and modify local parts of input graphs. Such kernels are executed in parallel by the underlying engine, isolating developers from complexities of parallel programming. Our kernels implement novel graph compression schemes that preserve numerous graph properties, for example connected components, minimum spanning trees, or graph spectra. Finally, Slim Graph uses statistical divergences and other metrics to analyze the accuracy of lossy graph compression. We illustrate both theoretically and empirically that Slim Graph accelerates numerous graph algorithms, reduces storage used by graph datasets, and ensures high accuracy of results. Slim Graph may become the common ground for developing, executing, and analyzing emerging lossy graph compression schemes.

