2016
DOI: 10.1007/978-3-319-46079-6_24

Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor

Cited by 57 publications (35 citation statements)
References 13 publications
“…Their model is recently extended to explore KNL [46], which includes constructing several performance models for certain combinations of KNL clustering and memory modes. Furthermore, the work of [76] performs several experimentations on KNL with different applications, through which Roofline performance models are drawn for different configurations of KNL. The performance of the hybrid memory system of KNL is investigated in [77], which provides an analytic model for performance tuning.…”
Section: State-of-the-art Shared-memory Optimizations
mentioning
confidence: 99%
“…Unlike the CARM that includes the complete memory hierarchy in a single plot, the ORM mainly considers the memory transfers between the last level cache and the DRAM, thus it provides fundamentally different perspective and insights when characterizing and optimizing applications [18]. Recently, the ORM was also instantiated on the KNL [19], without modifying the original model. The arithmetic intensity (AI) described in ORM is not to be confused with CARM AI because of the difference in the way how the memory traffic is observed.…”
Section: Related Work
mentioning
confidence: 99%
“…The bandwidth measured also differs from the one measured in this paper, the latter being explicitly load bandwidth. In [19], the authors present several ORM-based optimization case studies, and compare the performance improvements between Haswell processor and KNL, with data in DDR4 memory or MCDRAM, and finally KNL with data in MCDRAM memory. However, the authors do not show how the model can help choosing between memories when working sets do not fit in the fastest one nor they provide a comparison with the cache mode.…”
Section: Related Work
mentioning
confidence: 99%
“…PICSAR is an open-source Particle-In-Cell FORTRAN+Python library designed to provide high-performance subroutines optimized for many-integrated core architectures [40], [41] that can be interfaced with WARP.…”
Section: Case Study 5 - Warp-picsar
mentioning
confidence: 99%
“…Cartesian based PIC codes have a low flop/byte ratio that leads non-optimized algorithms to be highly memory-bound [41]. Large field and particle arrays cannot fit in cache in most simulations.…”
Section: Case Study 5 - Warp-picsar
mentioning
confidence: 99%