Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2017
DOI: 10.1145/3126908.3126931
Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels

Cited by 46 publications (22 citation statements)
References 43 publications
“…Furthermore, the work of [76] performs several experiments on KNL with different applications, through which Roofline performance models are drawn for different configurations of KNL. The performance of the hybrid memory system of KNL is investigated in [77], which provides an analytic model for performance tuning. A Roofline model specifically for benchmarking the performance of a well-optimized OpenMP implementation of the tall-skinny matrix multiplication kernel for a molecular dynamics application code is proposed in [67], which essentially leverages the thread-level parallelism on KNL.…”
Section: State-of-the-art Shared-memory Optimizations
confidence: 99%
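The Roofline models mentioned above bound a kernel's attainable throughput by the lower of the compute peak and the memory bandwidth times arithmetic intensity. A minimal sketch of that bound, with purely illustrative peak numbers (not measured KNL figures) for an assumed fast on-package memory versus DDR:

```python
# Roofline bound: attainable GFLOP/s = min(compute peak, bandwidth * AI),
# where AI (arithmetic intensity) is FLOPs per byte moved from memory.

def roofline(peak_gflops: float, peak_bw_gbs: float, ai: float) -> float:
    """Attainable GFLOP/s for a kernel with arithmetic intensity `ai`."""
    return min(peak_gflops, peak_bw_gbs * ai)

# Hypothetical dual-memory configuration (assumed values, for illustration):
PEAK_GFLOPS = 3000.0   # assumed compute peak
HBM_BW = 400.0         # assumed on-package memory bandwidth, GB/s
DDR_BW = 90.0          # assumed DDR bandwidth, GB/s

for ai in (0.25, 1.0, 4.0, 16.0):
    print(f"AI={ai:5.2f}: HBM {roofline(PEAK_GFLOPS, HBM_BW, ai):7.1f} GFLOP/s, "
          f"DDR {roofline(PEAK_GFLOPS, DDR_BW, ai):7.1f} GFLOP/s")
```

At low arithmetic intensity both memories are bandwidth-bound and the on-package memory's advantage is proportional to its bandwidth ratio; at high intensity both configurations hit the same compute roof, which is why such models are drawn per memory configuration.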
“…To summarize, we argue that GPUs are the promising platform for the ALS workload when taking both performance and power consumption into account. In the future, we will further investigate the performance gap between platforms and push the factorizing performance to the hardware limit (in particular on newer Intel Xeon Phi processors with on-package high-bandwidth memory [35,36], newer GPUs at the warp level [37,38], CTA level [39] and cache level [40], and other emergent accelerators such as Matrix-2000 [41]).…”
Section: Applying Optimizations
confidence: 99%
“…Data processing for high memory bandwidth: X-Stream accelerates graph processing with sequential access [55]. Recent work optimized quicksort [11], hash joins [14], scientific workloads [40,50], and machine learning [70] for KNL's HBM, but not streaming analytics. Beyond KNL, Mondrian [18] uses hardware support for analytics on high memory bandwidth in near-memory processing.…”
Section: Related Work
confidence: 99%