2015
DOI: 10.1016/j.compeleceng.2015.03.010

Manycore challenge in particle-in-cell simulation: How to exploit 1 TFlops peak performance for simulation codes with irregular computation

Cited by 12 publications (7 citation statements)
References 20 publications
“…While the computational characteristics of PIC codes are fairly well understood, much of the communities optimization and development efforts to date have been focused around platform and hardware specific optimization [31], [32]. As heterogeneity within leading supercomputers increases [33], it is no longer viable to periodically optimize for a single target platform, making portability more important than ever.…”
Section: Related Work (mentioning)
confidence: 99%
“…On the other hand, high performance computing (HPC) optimizations intend to improve the algorithm efficiency and scalability in order to reach higher resolutions and therefore better accuracy. This includes sorting techniques [5][6][7], vectorization [8,9], load balancing [7,10], advanced programming models [11], etc. These improvements are done at the cost of an increased complexity which sometimes prevent their combination.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although several successful porting of applications to Xeon Phi coprocessors have been reported recently [19][20][21], it is not certain if a significant portion of peak performance is achievable for a wide class of applications, and does efficient porting of an existing application require code tuning in scope of OpenMP or rather massive rewriting similar to porting to GPUs. A brief analysis shows that although a straightforward porting can be done very quickly even for a large application, it will be efficient only if the application was properly optimized for CPUs and has a large degree of parallelism on thread-level and SIMD-level.…”
Section: Introduction (mentioning)
confidence: 99%