2012 IEEE 26th International Parallel and Distributed Processing Symposium 2012
DOI: 10.1109/ipdps.2012.106

Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers

Abstract: Scientists across a wide range of domains increasingly rely on computer simulation for their investigations. Such simulations often spend a majority of their run-times solving large systems of linear equations that require vast amounts of computational power and memory. It is hence critical to design solvers in a highly efficient and scalable manner. Hypre is a high performance, scalable software library that offers several optimized linear solver routines and pre-conditioners. In this paper, we study…

Citations: cited by 25 publications (3 citation statements)
References: 21 publications (28 reference statements)
“…The link rate is upgraded to 25 Gbps from 14 Gbps of the TianHe-2A supercomputer system. Collective offload [5] accelerates collective operations, effectively improves the throughput of a single chip. HFI-E provides the software-hardware interface for accessing the high-performance network, implementing the proprietary Mini Packet/Remote Direct Memory Access (MP/RDMA) communication and collective offload mechanism.…”
Section: Proprietary Interconnect (mentioning)
confidence: 99%
“…In addition, all PAMI collective calls are non-blocking. We plan to explore MPI 3.0 non-blocking (Hoefler et al, 2007; Kandalla et al, 2012, 2013) collective implementation that takes advantage of the non-blocking APIs in PAMI.…”
Section: Summary and Future Work (mentioning)
confidence: 99%
“…The mechanisms are generic enough to implement both blocking and nonblocking semantics, root-based (Reduce) and non-root based reductions (Allreduce), unlike this research [8]. Compared to [5], the concepts and mechanisms are portable across different architectures.…”
Section: Related Work (mentioning)
confidence: 99%