Scalable distributed-memory external sorting

Rahn, Mirko; Sanders, Peter; Singler, Johannes

doi:10.1109/icde.2010.5447865

Cited by 20 publications

(9 citation statements)

References 23 publications

(59 reference statements)

“…In more recent work, Rahn, Sanders, and Singler [20] describe CANONICALMERGESORT, an <stxxl>-based distributed-memory implementation of the parallel multiway merging approach described by Varman et al. [21]. CANONICALMERGESORT achieves perfect load-balancing after partitioning the data, but it does not stripe the final output across the nodes of the cluster.…”

Section: Related Work (mentioning)

confidence: 97%

Out-of-core distribution sort in the FG programming environment

Natarajan

Cormen

Strange

2010

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW)

We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk I/O and interprocessor communication by overlapping such high-latency operations with other operations. It does so by constructing and executing a coarse-grained software pipeline on each node of the cluster, where each stage of the pipeline runs in its own thread. The sorting program distributes data among the nodes to create sorted runs, and then it merges sorted runs on each node. When distributing data, the rates at which a node sends and receives data will differ. When merging sorted runs, each node will consume data from each of its sorted runs at varying rates. Under these conditions, a single pipeline running on each node is unwieldy to program and not necessarily efficient. We describe how we have extended FG to support multiple pipelines on each node in two forms. When a node might send and receive data at different rates during interprocessor communication, we use disjoint pipelines on each node: one pipeline to send and one pipeline to receive. When a node consumes and produces data from different streams on the node, we use multiple pipelines that intersect at a particular stage. Experimental results show that by using multiple pipelines, an out-of-core, distribution-based sorting program outperforms an out-of-core sorting program based on columnsort, taking approximately 75%-85% of the time, despite the advantages that the columnsort-based program holds.
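The two phases named in this abstract (distribute data among nodes to create sorted runs, then merge the runs on each node) can be illustrated with a small in-memory sketch. This is not the FG implementation and uses no pipelines, threads, or disks; the function name `distribution_sort`, its parameters, and the sample-based splitter choice are all assumptions made for illustration.

```python
import heapq
import random
from bisect import bisect_right


def distribution_sort(data, num_nodes=4, run_size=8, seed=0):
    """Toy in-memory model of a distribution-based sort (hypothetical helper)."""
    if not data:
        return []
    rng = random.Random(seed)
    # Assumption: splitters come from a small random sample of the keys.
    sample = sorted(rng.sample(data, min(len(data), 4 * num_nodes)))
    step = max(1, len(sample) // num_nodes)
    splitters = sample[step::step][:num_nodes - 1]

    # Distribution phase: route every key to the node that owns its key range.
    nodes = [[] for _ in range(num_nodes)]
    for key in data:
        nodes[bisect_right(splitters, key)].append(key)

    output = []
    for received in nodes:
        # Each node cuts what it received into sorted runs of bounded size ...
        runs = [sorted(received[i:i + run_size])
                for i in range(0, len(received), run_size)]
        # ... and then merges its own runs with a k-way merge.
        output.extend(heapq.merge(*runs))
    # Node i holds only keys below node i+1's range, so concatenation is sorted.
    return output
```

Calling `distribution_sort(keys)` on a list of comparable keys returns them in sorted order; in a real out-of-core setting each run would live on disk and the merge would stream blocks rather than hold lists in memory.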


“…The sorting example is described in more detail in [2]. Here we highlight some aspects of this work for each of the main activities of algorithm engineering:…”

Section: Parallel External Sorting (mentioning)

confidence: 99%

Algorithm engineering for scalable parallel external sorting

Sanders

2010

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Self Cite


The talk describes algorithm engineering (AE) as a methodology for algorithmic research where design, analysis, implementation and experimental evaluation of algorithms form a feedback cycle driving the development of efficient algorithms. Additional important components of the methodology include realistic models, algorithm libraries, and collections of realistic benchmark instances. We use one main example throughout this paper: sorting huge data sets using many multi-core processors and disks. The described system is the current record holder for the GraySort and MinuteSort sorting benchmarks.

Algorithms and data structures are at the heart of every computer application and thus of critical importance for permanently growing areas of engineering, economy, science, and daily life. The subject of algorithmics is the systematic development of efficient algorithms and therefore has pivotal influence on the effective development of reliable and resource-conserving technology. We only mention search engines, bioinformatics, computer graphics, image processing, geographic information systems, cryptography, or planning in production, logistics and transportation as example areas where algorithms play a key role.

How is algorithmic innovation transferred to applications?

[Figure 1. Algorithm engineering as a cycle of design, analysis, implementation, and experimental evaluation driven by falsifiable hypotheses.]

Traditionally, algorithmics used the methodology of algorithm theory which stems from mathematics: algorithms are designed using simple models of problem and machine. Main results are provable performance guarantees for all possible inputs. This approach often leads to elegant, timeless solutions that can be adapted to many applications. The hard performance guarantees lead to reliably high efficiency even for types of inputs that were unknown at implementation time. From the point of view of algorithm theory, taking up and implementing an algorithmic idea is part of application development. Unfortunately, it can be universally observed that this mode of transferring results is a slow process. With growing requirements for innovative algorithms, this causes widening gaps between theory and practice: Realistic hardware with its parallelism, memory hierarchies etc. diverges from traditional machine models.


“…The models also indicate that properly implemented versions of mergesort and quicksort are reasonably cache efficient but that samplesort and multiway mergesort are more efficient, and in fact optimal. Correspondingly all the fastest disk sorts indeed use some variant of samplesort or multiway mergesort, as the theory predicts [19].…”

Section: Introduction (mentioning)

confidence: 95%

Cache and I/O efficient functional algorithms

Blelloch

Harper

2013

Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages


The widely studied I/O and ideal-cache models were developed to account for the large difference in costs to access memory at different levels of the memory hierarchy. Both models are based on a two-level memory hierarchy with a fixed-size primary memory (cache) of size M and an unbounded secondary memory organized in blocks of size B. The cost measure is based purely on the number of block transfers between the primary and secondary memory. All other operations are free. Many algorithms have been analyzed in these models and indeed these models predict the relative performance of algorithms much more accurately than the standard RAM model. The models, however, require specifying algorithms at a very low level, requiring the user to carefully lay out their data in arrays in memory and manage their own memory allocation.

In this paper we present a cost model for analyzing the memory efficiency of algorithms expressed in a simple functional language. We show how some algorithms written in standard forms using just lists and trees (no arrays) and requiring no explicit memory layout or memory management are efficient in the model. We then describe an implementation of the language and show provable bounds for mapping the cost in our model to the cost in the ideal-cache model. These bounds imply that purely functional programs based on lists and trees with no special attention to any details of memory layout can be asymptotically as efficient as the carefully designed imperative I/O efficient algorithms. For example we describe an $O\left(\frac{n}{B} \log_{M/B} \frac{n}{B}\right)$ cost sorting algorithm, which is optimal in the ideal cache and I/O models.
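To get a feel for the sorting bound quoted in this abstract, the following sketch evaluates (n/B) · log_{M/B}(n/B) for concrete sizes. It is only an illustration of the formula, ignoring constant factors; the function name `sort_io_bound` and the choice to count at least one pass over the data are assumptions, not part of the cited paper.

```python
import math


def sort_io_bound(n, M, B):
    """Estimate block transfers for sorting n items, up to constant factors.

    n: number of items, M: primary memory size, B: block size
    (all measured in items). Hypothetical helper for illustration only.
    """
    if M <= B:
        raise ValueError("primary memory must hold more than one block")
    blocks = n / B    # blocks touched by one scan of the input
    fan_out = M / B   # blocks that fit in primary memory at once
    # Assumption: count at least one pass, even when the input fits in memory.
    passes = max(1.0, math.log(blocks, fan_out))
    return blocks * passes
```

For example, with n = 2^30 items, M = 2^20 and B = 2^10, the input spans 2^20 blocks and log_{M/B}(n/B) = 2, so the estimate is about 2 · 2^20 block transfers, i.e. roughly two passes over the data.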
