2015
DOI: 10.1145/2656337
On a Model of Virtual Address Translation

Abstract: Modern computers are not Random Access Machines (RAMs). They have a memory hierarchy, multiple cores, and virtual memory. We address the computational cost of address translation in virtual memory. The starting point for our work on virtual memory is the observation that the analysis of some simple algorithms (random scan of an array, binary search, heapsort) in either the RAM model or the External Memory (EM) model does not correctly predict growth rates of actual running times. We propose the Virtu…
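The abstract's observation can be illustrated with a minimal micro-benchmark sketch: a random scan touches addresses in an order that defeats caches and TLBs, so its wall-clock time can grow faster than the RAM model's prediction, even though both scans do the same amount of arithmetic. This is an illustrative sketch, not the paper's experimental setup, and all names in it are ours.

```python
# Minimal sketch: sequential vs. random scan of the same array.
# In compiled code the random scan is typically several times slower on
# large arrays; in pure Python interpreter overhead mutes the gap, so
# treat the timings as qualitative only.
import random
import time

def sequential_scan(a):
    s = 0
    for x in a:        # touches addresses in order: cache/TLB friendly
        s += x
    return s

def random_scan(a, order):
    s = 0
    for i in order:    # touches addresses in a shuffled order
        s += a[i]
    return s

n = 1 << 20
a = list(range(n))
order = list(range(n))
random.shuffle(order)

t0 = time.perf_counter()
s1 = sequential_scan(a)
t1 = time.perf_counter()
s2 = random_scan(a, order)
t2 = time.perf_counter()

assert s1 == s2 == n * (n - 1) // 2   # both scans visit every element once
print(f"sequential: {t1 - t0:.3f}s, random: {t2 - t1:.3f}s")
```

Both scans perform n additions, so the RAM model assigns them identical cost; any measured difference comes from the memory system, which is exactly the gap the paper's model addresses.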

Cited by 4 publications (4 citation statements)
References 7 publications (9 reference statements)
“…Extrapolating this to 100 GiB gives 84,494 seconds, about 100 times more than the algorithm from [11]. Note that divsufsort is much slower when extrapolating from traditionally small inputs because we need a 64-bit version, due to NUMA effects, and because for really large inputs a logarithmic term in virtual address translation becomes noticeable [16]. There also exists a parallel variant of divsufsort, in which one of its two sorting steps is parallelized.…”
Section: Methods
confidence: 99%
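The point of the citation above is that a naive linear extrapolation from small inputs underestimates large-input running times once a logarithmic address-translation term starts to matter. A minimal sketch of the two extrapolations, with purely illustrative numbers (not the cited measurements):

```python
# Sketch: linear extrapolation vs. extrapolation with an extra log factor.
# The function names and the sample numbers below are our own illustration.
import math

def extrapolate_linear(t_small, n_small, n_large):
    # Naive assumption: time scales linearly with input size.
    return t_small * (n_large / n_small)

def extrapolate_nlogn(t_small, n_small, n_large):
    # Assumption: an additional logarithmic factor (e.g., from virtual
    # address translation) multiplies the linear cost.
    return t_small * (n_large * math.log2(n_large)) / (n_small * math.log2(n_small))

t_small = 10.0          # hypothetical: 10 s measured at 1 GiB
n_small = 2 ** 30       # 1 GiB
n_large = 100 * 2 ** 30 # 100 GiB

print(extrapolate_linear(t_small, n_small, n_large))  # 1000.0
print(extrapolate_nlogn(t_small, n_small, n_large))   # noticeably larger
```

The log-factor extrapolation exceeds the linear one by the ratio log2(n_large)/log2(n_small), which is why measurements taken only at small sizes can mislead.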
“…Additionally, we use an adaptive number of buckets on the last two levels of the recursion, so that the expected size of the final buckets remains reasonable. For example, instead of performing two 256-way partitioning steps to get 2^16 buckets of 2 elements each, we might perform two 64-way partitioning steps to get 2^12 buckets of about 32 elements each. Furthermore, on the last level we perform the base-case sorting immediately after a bucket has been completely filled in the cleanup phase, before processing the other buckets.…”
Section: Implementation Details
confidence: 99%
“…Consequently, a huge amount of research on sorting has been done. In particular, algorithm engineering has studied how to make sorting practically fast in the presence of complex features of modern hardware like multi-core (e.g., [4,28,29,30]), instruction parallelism (e.g., [27]), branch prediction (e.g., [9,17,18,27]), caches (e.g., [4,6,10,27]), or virtual memory (e.g., [16,24]). In contrast, the sorting algorithms used in the standard libraries of programming languages like Java or C++ still use variants of quicksort, an algorithm that is more than 50 years old.…”
Section: Introduction
confidence: 99%
“…Consequently, a huge amount of research on sorting has been done. In particular, algorithm engineering has studied how to make sorting practically fast in the presence of complex features of modern hardware like multi-core (e.g., [9,11,28,35,37,46,51,57,61,63,65,66,70]), instruction parallelism (e.g., [17,37,61,64]), branch prediction (e.g., [9,21,43,64,67,76]), caches (e.g., [11,14,26,46,64]), or virtual memory (e.g., [42,62,71]). In contrast, the sorting algorithms used in the standard libraries of programming languages like Java or C++ still use variants of quicksort, an algorithm that is more than 50 years old [36].…”
Section: Introduction
confidence: 99%