2012
DOI: 10.1007/s00450-012-0211-7
The design of ultra scalable MPI collective communication on the K computer

Cited by 22 publications (13 citation statements)
References 6 publications
“…While fast and efficient implementations of all-to-all reductions exist for today's HPC systems [1], they are quite fragile in the sense that a single failure leads to a wrong result on many nodes. Furthermore, it is commonly expected that the future may bring a slight shift from traditional parallel HPC systems towards less tightly coupled systems which need to be more distributed in nature due to the extreme scale and complexity required to go to Exascale and beyond.…”
Section: Introduction
confidence: 99%
“…Finally, N_s segment groups are arranged so that reduction communications over v, μ and s are performed through a cross section of the 3D network. The high-performance and highly scalable MPI_Alltoall on K [13] fully utilizes the bisection bandwidth of the 3D torus network, but is available only when the communicating processes are arranged in a 3D box shape. Thus, the local 3D box-shaped mapping of rank_xy activates the optimized routine and minimizes the cost of the data transpose.…”
Section: Segmented Mapping On 3D Torus Network
confidence: 99%
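The "3D box shape" condition in the excerpt above amounts to a bijection between linear ranks and coordinates inside a contiguous box. A minimal sketch, with hypothetical function names not taken from the paper, of such a mapping:

```python
# Illustrative sketch (hypothetical names, not the paper's API):
# laying ranks out as a contiguous 3D box, the arrangement under
# which the optimized MPI_Alltoall on K is said to be available.

def rank_to_coord(rank, dims):
    """Map a linear rank to (x, y, z) inside a box of size dims = (X, Y, Z)."""
    X, Y, Z = dims
    return (rank % X, (rank // X) % Y, rank // (X * Y))

def coord_to_rank(coord, dims):
    """Inverse mapping: (x, y, z) back to the linear rank."""
    X, Y, Z = dims
    x, y, z = coord
    return x + y * X + z * X * Y

dims = (4, 3, 2)  # a 4x3x2 box holding 24 processes
coords = [rank_to_coord(r, dims) for r in range(24)]
assert len(set(coords)) == 24  # bijective: each process gets a unique slot
assert all(coord_to_rank(c, dims) == r for r, c in enumerate(coords))
```

When the participating processes do not form such a box, this mapping breaks down, which is why the excerpt says the optimized routine is only activated by the box-shaped layout.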
“…In the past, various algorithms and optimized implementations for specific network topologies and hardware platforms have been developed for the set of collective MPI communication operations. 26,27 However, all these optimizations target MPI_Alltoall and MPI_Alltoallv, which use parameters whose sizes depend on the total number of parallel processes.…”
Section: Related Work
confidence: 99%
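The scaling issue the excerpt above points at is concrete: MPI_Alltoallv takes per-destination count and displacement arrays, each of length equal to the number of processes, so the argument memory per process grows linearly with the job size. A small sketch of how those arrays are typically built (the helper name is ours, not an MPI routine):

```python
# Minimal sketch of why MPI_Alltoallv's arguments grow with job size:
# each process supplies count and displacement arrays of length P,
# so per-process argument memory is O(P).

def alltoallv_args(counts):
    """Derive displacements from per-destination counts (exclusive prefix sum)."""
    displs = [0] * len(counts)
    for i in range(1, len(counts)):
        displs[i] = displs[i - 1] + counts[i - 1]
    return counts, displs

P = 6                                   # number of processes in this toy job
sendcounts = [i + 1 for i in range(P)]  # e.g. send i+1 items to rank i
counts, displs = alltoallv_args(sendcounts)
print(displs)  # [0, 1, 3, 6, 10, 15]
assert len(counts) == len(displs) == P  # both arrays scale with process count
```

At exascale process counts, building and holding these O(P) arrays on every process becomes a cost in its own right, which is the concern the citing paper raises.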