Performance Analysis and Optimization of Parallel Scientific Applications on CMP Cluster Systems (2008)
DOI: 10.1109/icpp-w.2008.21

Cited by 16 publications (11 citation statements); references 8 publications.
“…The goal of processor binding is to reduce the conflicts of chip resources on the CMP system. In our previous work [14], we found that processor binding resulted in up to 7.16% performance improvements for MPI scientific applications. Here, we use the command pbind to implement a batch process to bind the threads to different physical processors in order to reduce the resource contentions and system overhead from the dynamic scheduler on Pangu.…”
Section: Figures 4 and 5 Show the Function-Level Performance of Our O…
confidence: 98%
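
As a rough illustration of the batch-binding step described in the excerpt above, the sketch below binds a set of process IDs to distinct physical processors with the Solaris pbind command. The PID list, the processor IDs, and the round-robin assignment are illustrative assumptions, not the authors' actual script.

    # Hypothetical sketch: batch-bind process IDs to distinct physical
    # processors via the Solaris pbind command (pbind -b <cpu> <pid>).
    # The PIDs and processor IDs below are placeholders.
    import subprocess

    def bind_processes(pids, processor_ids):
        """Bind each PID to one processor ID, round-robin over the CPUs."""
        for i, pid in enumerate(pids):
            cpu = processor_ids[i % len(processor_ids)]
            subprocess.run(["pbind", "-b", str(cpu), str(pid)], check=True)

    if __name__ == "__main__":
        # e.g. four MPI ranks (placeholder PIDs) bound to processors 0-3
        bind_processes([12001, 12002, 12003, 12004], [0, 1, 2, 3])
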
“…It is expected that the best number of threads per node is dependent upon the application characteristics and the system architectures. In this paper, we investigate how a hybrid application is sensitive to different memory access patterns, and quantify the performance gap resulting from using different number of threads per node for application execution on a large scale multithreaded BlueGene/Q supercomputer [1] at Argonne National Laboratory using five different hybrid MPI/OpenMP scientific applications (two NAS Parallel benchmarks Multi-Zone SP-MZ and BT-MZ [4], an earthquake simulation PEQdyna [20], an aerospace application PMLB [19] and a 3D particle-in-cell application GTC [2]).…”
Section: Introduction
confidence: 99%
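
A minimal sketch of the per-node thread-count sweep the excerpt describes, assuming the OpenMP thread count is controlled through OMP_NUM_THREADS and the job is launched with a standard mpiexec; the executable name, node count, cores per node, and launcher flags are placeholders that vary by system and are not taken from the cited study.

    # Hypothetical sketch: run a hybrid MPI/OpenMP executable with several
    # threads-per-rank settings while keeping every core of each node busy.
    import os
    import subprocess

    def run_sweep(exe="./bt-mz.x", nodes=128, cores_per_node=16):
        for threads in (1, 2, 4, 8, 16):
            ranks = nodes * (cores_per_node // threads)   # fill each node
            env = dict(os.environ, OMP_NUM_THREADS=str(threads))
            print(f"launching {ranks} ranks with {threads} threads each")
            subprocess.run(["mpiexec", "-n", str(ranks), exe], env=env, check=True)

    if __name__ == "__main__":
        run_sweep()
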
“…In contrast to the conventional methods in fluid dynamics, which are based on the discretization of macroscopic differential equations, the LBM has the ability to deal efficiently with complex geometries and topologies [25]. For our experiments, we use the parallel multiblock implementation (extended to 3D problems) of the LBM developed by Yu et al [26].…”
Section: Parallel Multiblock Lattice Boltzmann (PMLB)
confidence: 99%