2022
DOI: 10.1016/j.jpdc.2022.03.012

Model-based selection of optimal MPI broadcast algorithms for multi-core clusters

Cited by 9 publications (5 citation statements)
References 37 publications
“…However, by having a local pointer to the beginning of this shared memory area, any process on the same node can independently access the broadcast data. Since the size of the broadcast message remains consistent with that of the pure MPI broadcast as shown in figure 3, performing the across-node broadcast operation across all the roots becomes straightforward in this scenario [13,24]. Here we just start the passive target synchronization epoch for all processes and assign the data to all processes locations as shown in figure 4, corresponding to the window win.…”
Section: Rma Broadcast (mentioning)
confidence: 99%
“…At the first round in figure 6 process 0 put the buffer b in the memory of processes 1 and 2, at the second round process 1 put the buffer in the memory of processes 3 and 4, for the third round process 2 puts the message in the buffer of 5 and 6. Finally, at the fourth round, process 3 puts the message in the memory of process 7 [13]. The height of the binary tree is equal to T_Total = T log2(p) at each round the maximum number is 2^i for i is the round number.…”
Section: Binary Tree Algorithm (mentioning)
confidence: 99%
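
The put order described in the statement above can be illustrated with a minimal sketch. It assumes that the children of process r are processes 2r+1 and 2r+2, which is inferred from the quoted eight-process example and is not confirmed by the citing paper. The function names are hypothetical, and the code only simulates the data movement in plain Python; it makes no MPI calls.

```python
def binary_tree_schedule(p):
    """Return (parent, children) pairs for a binary-tree broadcast over p ranks.

    Assumption (inferred from the quoted example, not from the paper):
    the children of rank r are ranks 2r+1 and 2r+2, when they exist.
    """
    schedule = []
    for parent in range(p):
        children = [c for c in (2 * parent + 1, 2 * parent + 2) if c < p]
        if children:
            schedule.append((parent, children))
    return schedule


def simulate_broadcast(p, message):
    """Simulate the puts: each parent copies its buffer to its children."""
    buffers = [None] * p
    buffers[0] = message  # rank 0 is the root and already holds the data
    for parent, children in binary_tree_schedule(p):
        # A parent always has a lower rank than its children, so by the
        # time it is visited it has already received the message.
        assert buffers[parent] is not None
        for child in children:
            buffers[child] = buffers[parent]
    return buffers


if __name__ == "__main__":
    for step, (parent, children) in enumerate(binary_tree_schedule(8), start=1):
        print(f"step {step}: process {parent} puts to {children}")
```

For eight processes this yields four steps, with process 0 writing to 1 and 2, process 1 to 3 and 4, process 2 to 5 and 6, and process 3 to 7, which matches the order given in the quoted statement.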
“…In addition, the complexity of peer-to-peer communication becomes O(n); however, when utilizing the self-generation concept, the data to be transmitted in the forward step is 0, and this is replaced by broadcast set communication in the synchronization step. In terms of the broadcast process, the complexity can be reduced to O(log n) using tree algorithms [26,27]. Consequently, due to the optimized design of the prediction time and the reduced communication time, the proposed PPRN method can accelerate the pipeline with much less overhead than CPPipe.…”
Section: Reduced Network Communications (mentioning)
confidence: 99%