1995
DOI: 10.1109/71.342126
CCL: a portable and tunable collective communication library for scalable parallel computers

Cited by 68 publications (41 citation statements)
References 22 publications
“…The better scalability may be due to various reasons, including larger memory and more efficient all-to-all communication subroutines available on the SP2. Interested readers may refer to [14] for more information on all-to-all communications. The emphasis here is that when an algorithm is not ideally scalable, its scalability does vary with machine parameters.…”
Section: Results (mentioning)
confidence: 99%
“…The listed communication cost of the PPT algorithm is based on a square 2-D torus with p processors (i.e., 2-D mesh, wraparound, square) [13]. If a hypercube topology or a multistage Omega network is assumed, the communication cost would be log(p)·r + 12(p − 1)·b and log(p)·r + 8(p − 1)·n₁·b for single systems and systems with multiple right sides, respectively [12,14].…”
Section: Fig. 2, An Alternative Range Comparison Algorithm (mentioning)
confidence: 99%
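The cost expressions quoted above can be sketched numerically. This is a minimal illustration assuming the usual latency/bandwidth reading of such models: r is the per-message startup cost, b the per-element transfer cost, p the processor count, and n1 the number of right-hand sides; these interpretations are inferred from context, not taken from the cited papers.

```python
import math

def cost_single_system(p, r, b):
    """Assumed hypercube/Omega cost for a single system: log(p)*r + 12*(p-1)*b."""
    return math.log2(p) * r + 12 * (p - 1) * b

def cost_multiple_rhs(p, r, b, n1):
    """Assumed cost with n1 right-hand sides: log(p)*r + 8*(p-1)*n1*b."""
    return math.log2(p) * r + 8 * (p - 1) * n1 * b

# Example: 16 processors, startup 10, per-element cost 0.5, 4 right sides.
print(cost_single_system(16, 10.0, 0.5))     # 40 + 90 = 130.0
print(cost_multiple_rhs(16, 10.0, 0.5, 4))   # 40 + 240 = 280.0
```

The log(p) term counts startup latencies along the hypercube dimensions, while the (p − 1) terms count per-element traffic, so startup dominates for small messages and bandwidth for large ones.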
“…Note that when f = 6, a parent node receives 5 messages, so the reception costs accumulate to exactly balance the message latency, 5 · r = L. As computation cost increases, the best degree decreases. It is interesting to consider the range [1,2]. Values of f smaller than 2 do not produce meaningful f -nomial trees.…”
Section: Modeling f-Nomial Trees (mentioning)
confidence: 99%
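The balance condition in the quote above can be illustrated with a toy model: in an f-nomial tree a parent receives f − 1 messages per round, so reception cost (f − 1)·r balances the overlapped message latency L when f = L/r + 1. This is a simplified sketch of that trade-off, not the cited paper's exact cost function.

```python
def balanced_degree(L, r):
    """Degree f at which accumulated reception cost (f-1)*r equals latency L.

    Assumes r divides L evenly, as in the quoted example (5 * r = L).
    """
    return int(L / r) + 1

# Five receptions of cost 10 exactly balance a latency of 50, giving f = 6.
print(balanced_degree(50, 10))  # 6
```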
“…Reduction collectives entail both communication (data transfer) and processing (data reduction operations), and therefore efficient implementations must consider the characteristics of the network, the processor, and the interactions between them. Over the years, many researchers have dedicated significant effort to derive optimal and scalable algorithms [1,2,3,4,5,8]. However, with respect to the underlying system characteristics, all of this work commonly assumed reduction processing must be performed by the host CPU.…”
Section: Introduction (mentioning)
confidence: 99%
“…Early work on collective communication implements the reduction operation as an inverse broadcast and does not try to optimize the protocols based on different buffer sizes [1]. Other work already handles allreduce as a combination of basic routines; e.g., [2] already proposed the combine-to-all (allreduce) as a combination of distributed combine (reduce-scatter) and collect (allgather).…”
Section: Introduction And Related Work (mentioning)
confidence: 99%
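The decomposition mentioned in the last statement, allreduce as a reduce-scatter followed by an allgather, can be sketched over plain Python lists. This simulates p processes in one address space purely for illustration; real libraries (e.g. MPI's `MPI_Allreduce`) implement the same idea with actual message exchanges.

```python
def allreduce_sum(vectors):
    """Sum-allreduce via reduce-scatter + allgather over p equal-length vectors.

    Each inner list plays the role of one process's input buffer.
    """
    p = len(vectors)
    n = len(vectors[0])
    chunk = n // p  # assume n divisible by p for simplicity

    # Reduce-scatter: process i ends up owning the fully reduced i-th chunk.
    owned = []
    for i in range(p):
        lo, hi = i * chunk, (i + 1) * chunk
        owned.append([sum(v[j] for v in vectors) for j in range(lo, hi)])

    # Allgather: every process collects all reduced chunks.
    full = [x for part in owned for x in part]
    return [full[:] for _ in range(p)]

vecs = [[1, 2, 3, 4], [10, 20, 30, 40]]
print(allreduce_sum(vecs))  # [[11, 22, 33, 44], [11, 22, 33, 44]]
```

Splitting the reduction across processes this way keeps every link busy, which is why the two-phase decomposition scales better than reducing everything at a root and broadcasting back.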