2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)
DOI: 10.1109/mlhpc54614.2021.00009
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Cited by 15 publications (7 citation statements)
References 15 publications
“…Data parallelism techniques are widely utilized for training on large datasets [9,13,21,26]. When applying data parallelism on HPC clusters, the training typically includes three stages: (1) I/O: loading the data from a remote parallel file system (i.e., GPFS, Lustre) to host memory; (2) Computation: performing forward and backward phases to calculate the local gradient on each device; (3) Communication: synchronizing averaged gradients across multiple devices to update model weights.…”
Section: Distributed Training With Data Parallelism
confidence: 99%
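The three stages named in the statement above (I/O, computation, communication) can be sketched in plain Python. This is a minimal illustration, not a real framework (PyTorch DDP, Horovod, etc.); the helper names `load_shard`, `local_gradient`, and `allreduce_average` are illustrative, and the "devices" are simulated in one process.

```python
def load_shard(device_id, dataset, num_devices):
    """Stage 1 (I/O): each device loads its shard of the dataset.

    In practice this reads from a remote parallel file system
    (e.g. GPFS, Lustre) into host memory.
    """
    return dataset[device_id::num_devices]

def local_gradient(shard, weight):
    """Stage 2 (Computation): forward/backward on the local shard.

    Here: gradient of mean squared error for the toy model y = w * x.
    """
    n = len(shard)
    return sum(2 * (weight * x - y) * x for x, y in shard) / n

def allreduce_average(grads):
    """Stage 3 (Communication): average local gradients across devices
    (what an all-reduce collective computes) to keep weights in sync."""
    return sum(grads) / len(grads)

def train_step(dataset, weight, num_devices, lr=0.01):
    """One data-parallel step: shard, compute local grads, all-reduce, update."""
    shards = [load_shard(d, dataset, num_devices) for d in range(num_devices)]
    grads = [local_gradient(s, weight) for s in shards]
    return weight - lr * allreduce_average(grads)
```

For example, fitting `y = 2x` from pairs `[(x, 2*x), ...]` with four simulated devices drives `weight` toward 2.0 over repeated `train_step` calls, since every device applies the same averaged gradient.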
“…We conduct the benchmarking on ThetaGPU [2] (each node has eight A100 GPUs) at the Argonne Leadership Computing Facility, following the guidelines from MLPerf HPC [13].…”
Section: Research Motivation
confidence: 99%
“…General ML Workload Benchmarks. MLPerf Inference (Reddi et al, 2020) is a set of industry-standard, single-kernel ML benchmarks that span the ML landscape, from high performance computers (Farrell et al, 2021) to tiny embedded systems (Banbury et al, 2021). It also provides a rich set of inference scenarios based on realistic use cases from industry: single-stream (single inference), multi-stream (repeated inference with a time interval), server (random inference requests modeled via a Poisson distribution), and offline (batch processing).…”
Section: Detailed Related Work Comparison
confidence: 99%
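The four load scenarios named in the quotation above can be sketched as query arrival-time generators. This is a hedged illustration of the scenario semantics, not MLPerf's official LoadGen implementation; the function names and the `qps` / `interval` parameters are assumptions for the sketch.

```python
import random

def single_stream(n):
    """Single-stream: one query at a time, the next issued only after the
    previous completes. Represented here as consecutive issue slots."""
    return [float(i) for i in range(n)]

def multi_stream(n, interval):
    """Multi-stream: repeated queries at a fixed time interval."""
    return [i * interval for i in range(n)]

def server(n, qps, seed=0):
    """Server: random arrivals modeled as a Poisson process, i.e.
    exponentially distributed inter-arrival times at rate `qps`."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(qps)
        times.append(t)
    return times

def offline(n):
    """Offline: all queries available at time zero for batch processing."""
    return [0.0] * n
```

The design difference is only in *when* queries arrive: single-stream and offline stress latency and throughput respectively, while the Poisson server scenario exercises tail latency under bursty, randomly spaced load.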
“…ML and XR Benchmarks. Compared to the above-mentioned benchmarks (Dee, 2016; EEM, 2020; VRM, 2020; Banbury et al, 2021; Farrell et al, 2021; Gao et al, 2019; Huzaifa et al, 2021; Ignatov et al, 2018; Luo et al, 2018; Reddi et al, 2020), XRBENCH covers all requirements of ML-based XR workloads. Specifically, XRBENCH provides diverse cascon-MMMT scenarios with real-time requirements and complex dependencies, which the majority of ML benchmarks are missing.…”
Section: Detailed Related Work Comparison
confidence: 99%