1999
DOI: 10.1007/pl00008265

Efficient Algorithms for Block-Cyclic Redistribution of Arrays

Abstract: The block-cyclic data distribution is commonly used to organize array elements over the processors of a coarse-grained distributed memory parallel computer. In many scientific applications, the data layout must be reorganized at run-time in order to enhance locality and reduce remote memory access overheads. In this paper we present a general framework for developing array redistribution algorithms. Using this framework, we have developed efficient algorithms that redistribute an array from one block-cyclic la…
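
To make the layout in the abstract concrete, here is a minimal Python sketch of a one-dimensional block-cyclic mapping. It is an illustration only, not the paper's framework: the function names `owner`, `local_index`, and `transfer_counts` are hypothetical, and the layout is assumed to be the usual cyclic(block) scheme in which consecutive blocks of `block` elements are dealt to processors in round-robin order.

```python
def owner(i: int, block: int, procs: int) -> int:
    """Processor that holds global index i under an assumed cyclic(block) layout."""
    return (i // block) % procs


def local_index(i: int, block: int, procs: int) -> int:
    """Position of global index i inside its owner's local array."""
    full_rounds = i // (block * procs)  # complete round-robin passes before i
    return full_rounds * block + i % block


def transfer_counts(n: int, x: int, y: int, procs: int) -> list[list[int]]:
    """counts[src][dst] = elements moving src -> dst when an n-element
    array is redistributed from cyclic(x) to cyclic(y) on the same processors."""
    counts = [[0] * procs for _ in range(procs)]
    for i in range(n):
        counts[owner(i, x, procs)][owner(i, y, procs)] += 1
    return counts


if __name__ == "__main__":
    # 12 elements, cyclic(1) -> cyclic(2) on 2 processors.
    for row in transfer_counts(12, 1, 2, 2):
        print(row)
```

The diagonal of the resulting matrix counts elements that stay on their processor. The off-diagonal entries are the remote traffic that a redistribution algorithm has to schedule.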

Cited by 50 publications (17 citation statements)
References 14 publications
“…Examples are the processor mapping techniques [4,10,12] for minimizing data transmission overheads, the multiphase redistribution strategy [11] for reducing message startup cost, the communication scheduling approaches [2,7,13,20] for avoiding node contention and the strip mining approach [18] for overlapping communication and computational overheads.…”
Section: Related Work (mentioning)
confidence: 99%
“…The generalized BCC uses a bipartite matching approach for data redistribution. Lim et al. [8] developed a redistribution framework that could redistribute a one-dimensional array from one block-cyclic scheme to another on the same processor set using a generalized circulant matrix formalism. Their algorithm applies row and column transformations on the communication schedule matrix to generate a conflict-free schedule.…”
Section: Related Work (mentioning)
confidence: 99%
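
The notion of a conflict-free schedule in the statement above can be sketched in a few lines of Python. This is not the circulant matrix algorithm of Lim et al.; it substitutes a plain cyclic-shift schedule, in which source `p` sends to `(p + s) mod P` at step `s`, so that no processor sends or receives more than one message per step. The names `send_matrix`, `shift_schedule`, and `is_conflict_free` are hypothetical, and the layouts are assumed to be one-dimensional cyclic(x) and cyclic(y) on the same processor set.

```python
def send_matrix(n: int, x: int, y: int, procs: int) -> list[list[int]]:
    """sends[src][dst] = number of elements src must deliver to dst
    when moving from an assumed cyclic(x) layout to cyclic(y)."""
    sends = [[0] * procs for _ in range(procs)]
    for i in range(n):
        sends[(i // x) % procs][(i // y) % procs] += 1
    return sends


def shift_schedule(sends: list[list[int]]) -> list[list[tuple[int, int]]]:
    """Group the non-empty messages into steps using cyclic shifts.
    Step s pairs each source p with destination (p + s) % procs, so every
    step is a partial permutation. Step 0 holds local copies only."""
    procs = len(sends)
    steps = []
    for s in range(procs):
        step = [(p, (p + s) % procs) for p in range(procs)
                if sends[p][(p + s) % procs] > 0]
        if step:
            steps.append(step)
    return steps


def is_conflict_free(steps: list[list[tuple[int, int]]]) -> bool:
    """True if no processor sends twice or receives twice within any step."""
    for step in steps:
        sources = [src for src, _ in step]
        targets = [dst for _, dst in step]
        if len(set(sources)) != len(sources) or len(set(targets)) != len(targets):
            return False
    return True


if __name__ == "__main__":
    schedule = shift_schedule(send_matrix(24, 1, 2, 4))
    for number, step in enumerate(schedule):
        print(number, step)
    print("conflict-free:", is_conflict_free(schedule))
```

A shift schedule always avoids node contention, but it may use more steps than necessary when the send matrix is sparse. Reducing the step count is the kind of problem the matrix transformations quoted above are aimed at.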
“…This approach incurs minimum data transmission cost but communication start-up cost can be high. To minimize the start-up cost, indirect schedules [16]-[18] can be used. In this approach, data blocks are sent to their destination through intermediate "relay" nodes.…”
Section: A Three- (mentioning)
confidence: 99%