2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) 2010
DOI: 10.1109/icde.2010.5447865

Scalable distributed-memory external sorting

Abstract: We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to naive implementations of multiway merging and all other approaches known to us, the algorithm works with just two passes over the data even for the largest conceivable inputs. A second algorithm reduces communication overhead and uses more conventional specifications of the…
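
For orientation, the multiway merging paradigm named in the abstract can be sketched on a single machine as two passes: form sorted runs, then merge all runs at once. This is a minimal illustration, not the paper's distributed algorithm. In-memory lists stand in for on-disk runs, and the names `form_runs`, `multiway_merge`, `two_pass_sort`, and the `run_size` parameter are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List


def form_runs(items: Iterable[int], run_size: int) -> List[List[int]]:
    """Pass 1: cut the input into chunks of at most run_size and sort each.

    Assumption: run_size models how many items fit in internal memory.
    """
    runs: List[List[int]] = []
    chunk: List[int] = []
    for x in items:
        chunk.append(x)
        if len(chunk) == run_size:
            runs.append(sorted(chunk))
            chunk = []
    if chunk:
        # Final, possibly shorter, run.
        runs.append(sorted(chunk))
    return runs


def multiway_merge(runs: List[List[int]]) -> Iterator[int]:
    """Pass 2: merge all sorted runs in one sweep using a min-heap."""
    return heapq.merge(*runs)


def two_pass_sort(items: Iterable[int], run_size: int) -> List[int]:
    """Run formation followed by a single multiway merge."""
    return list(multiway_merge(form_runs(items, run_size)))
```

Each item is read and written once per pass, which is why a merge that handles all runs simultaneously needs only two passes; a real external sort would stream runs from disk in blocks rather than hold them as lists.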

Cited by 20 publications (9 citation statements)
References 23 publications (59 reference statements)
“…In more recent work, Rahn, Sanders, and Singler [20] describe CANONICALMERGESORT, an <stxxl>-based distributed-memory implementation of the parallel multiway merging approach described by Varman et al. [21]. CANONICALMERGESORT achieves perfect load-balancing after partitioning the data, but it does not stripe the final output across the nodes of the cluster.…”
Section: Related Work (mentioning)
confidence: 97%
“…The sorting example is described in more detail in [2]. Here we highlight some aspects of this work for each of the main activities of algorithm engineering:…”
Section: Parallel External Sorting (mentioning)
confidence: 99%
“…The models also indicate that properly implemented versions of mergesort and quicksort are reasonably cache efficient but that samplesort and multiway mergesort are more efficient, and in fact optimal. Correspondingly all the fastest disk sorts indeed use some variant of samplesort or multiway mergesort, as the theory predicts [19].…”
Section: Introduction (mentioning)
confidence: 95%