2015 IEEE International Conference on Computer and Communications (ICCC) 2015
DOI: 10.1109/compcomm.2015.7387546

Kepler GPU vs. Xeon Phi: Performance case study with a high-order CFD application

Citations: cited by 5 publications (5 citation statements)
References: 12 publications
“…Loop optimizations: unrolling (9,17,23,24,29,50,84,90); collapsing (4,6,7,13,20,21,44,54); splitting (22,28). Blocking (tiling): in cache (14,15,18,20,21,22,27,39,44,52,54,69); in registers (68,69). Compile-time optimizations: using pre-computed values (35,52); specifying array and loop bounds at compile time (6,54). Compute-related optimizations: reusing intermediate variables (22,35); using conflict-detection instruction of AVX-512 (52,85); performing redundant computation to avoid data-communication or atomic operations (52,82). Array transpose (6,79)…”
Section: Table 3, Optimization Strategies (mentioning)
confidence: 99%
“…Overall, Phi does not provide comparable performance to CPU as a stand-alone shared memory processor. Thread-affinity strategy: balanced (4,14,20,21,23,26,36); scatter (37); compact (92); no single winner (13,84,94). Memory mode: cache (60,62); flat (55,90,96); hybrid (none); no single winner (10,11,12,52,54,57,97). Interconnect clustering mode: all-to-all (none); quadrant (11,62); sub-NUMA (10,55,57,96,97); no single winner (52,96)…”
Section: Gaining Insights Into Phi Architecture (mentioning)
confidence: 99%
“…Because each accelerator has its advantages and disadvantages for certain classes of problems [22,3,17], selecting the best option for a given application is key when searching for maximum performance. To provide some guidelines for such selection, this article presents a comparative analysis between two different HPC architectures (Intel Xeon Phi KNL vs. NVIDIA Pascal).…”
Section: Introduction (mentioning)
confidence: 99%
“…As a case study, Ref. [8] compared the performance of high-order weighted essentially non-oscillatory scheme CFD application on both K20c GPU and Xeon Phi 31SP MIC, and the result showed that when vector processing units are fully utilized the MIC can achieve equivalent performance to that of GPUs. Ref.…”
Section: Introduction (mentioning)
confidence: 99%