AQsort: Scalable Multi-Array In-Place Sorting with OpenMP

Langr, Daniel; Tvrdík, Pavel; Šimeček, Ivan

doi:10.12694/scpe.v17i4.1207

Cited by 4 publications

(18 citation statements)

References 25 publications

“…Step 2 (Sorting of Morton codes): The parallelization of this step is straightforward by using any parallel in-place sort (e.g., sort method from std::algorithm [20] or AQsort [21] parallel add to every nonzero element its Morton code; 3: parallel sort nonzero elements on its Morton code; 4: len = N/th; 5: start of parallel block; 6: tid = get_tid_of_current_thread(); 7: if tid = 0 then start ← tid · len; 15: for j ← 1, c_max do diff ← XOR(new, old); 20: old ← new; 21: k ← round_up(Highest1(diff)/2); 22: for j ← 1, k do …”

Section: Parallelization 1) Sw Technologies

mentioning

confidence: 99%

Efficient parallel evaluation of block properties of sparse matrices

Šimeček

Langr

2016

Annals of Computer Science and Information Systems

Self Cite

Abstract: Many storage formats for sparse matrices have been developed. The majority of these formats can be parametrized, so the algorithm for finding optimal parameters is crucial. For overall efficiency, it is important to reduce the execution time of this preprocessing. In this paper, we propose a new algorithm for the determination of the number of nonzero blocks of the given size in a sparse matrix. The proposed algorithm requires a relatively small amount of auxiliary memory. Our approach is based on the Morton reordering and bitwise manipulations. We also present a parallel (multithreaded) version and evaluate its performance and space complexity.
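The citation statement and abstract above refer to computing a Morton code for every nonzero element and then sorting by it. The following C++ sketch illustrates that idea sequentially; it is not the cited paper's algorithm. The `Nonzero` record and the names `spread_bits`, `morton_code`, `sort_by_morton`, and `count_nonzero_blocks` are hypothetical, and `std::sort` stands in for whatever parallel in-place sort would be used in practice. The block count relies on the fact that two elements lie in the same 2^k x 2^k block exactly when their Morton codes agree after dropping the lowest 2k bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one nonzero element of a sparse matrix.
struct Nonzero {
    std::uint32_t row;
    std::uint32_t col;
    std::uint64_t morton;  // filled in by sort_by_morton()
};

// Spread the 32 bits of x so that they occupy the even bit positions
// of a 64-bit word (bit i of x moves to bit 2*i).
inline std::uint64_t spread_bits(std::uint32_t x) {
    std::uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFULL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFULL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    v = (v | (v << 2))  & 0x3333333333333333ULL;
    v = (v | (v << 1))  & 0x5555555555555555ULL;
    return v;
}

// Morton (Z-order) code: row bits on odd positions, column bits on even ones.
inline std::uint64_t morton_code(std::uint32_t row, std::uint32_t col) {
    return (spread_bits(row) << 1) | spread_bits(col);
}

// Attach a Morton code to every nonzero, then sort by that code.
// std::sort is a sequential stand-in for a parallel in-place sort.
inline void sort_by_morton(std::vector<Nonzero>& nz) {
    for (auto& e : nz) e.morton = morton_code(e.row, e.col);
    std::sort(nz.begin(), nz.end(),
              [](const Nonzero& a, const Nonzero& b) { return a.morton < b.morton; });
}

// Count the 2^k x 2^k blocks that contain at least one nonzero.
// Assumes nz is already sorted by Morton code.
inline std::size_t count_nonzero_blocks(const std::vector<Nonzero>& nz, unsigned k) {
    assert(k < 32);
    std::size_t blocks = 0;
    for (std::size_t i = 0; i < nz.size(); ++i) {
        // A new block starts whenever the code prefix above bit 2k changes.
        if (i == 0 || (nz[i].morton >> (2 * k)) != (nz[i - 1].morton >> (2 * k))) {
            ++blocks;
        }
    }
    return blocks;
}
```

Calling `sort_by_morton(nz)` once and then `count_nonzero_blocks(nz, k)` for several values of `k` reuses the same sorted order for every block size.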

“…AQsort is our previous parallel quicksort implementation built upon OpenMP. Its main feature is that, contrary to the other implementations, it is capable of working with a user-provided function for swapping elements. This slightly reduces optimization options, but, effectively, allows sorting multiple datasets (such as arrays) at once.…”

mentioning

confidence: 99%

CPP11sort: A parallel quicksort based on C++ threading

Langr

Schovánková

2021

Concurrency and Computation

Self Cite

A new efficient implementation of the multithreaded quicksort algorithm called CPP11sort is presented. This implementation is built exclusively upon the threading primitives of the C++ programming language itself. The performance of CPP11sort is evaluated and compared with its mainstream competitors provided by GNU, Intel, and Microsoft. It is shown that out of the considered implementations, CPP11sort mostly yields the shortest sorting times and is the only one that is portable to any conforming C++ implementation without a need of external libraries or nonstandard compiler extensions. The experimental evaluation with various input data distributions resulted in parallel speedup between 16.1 and 44.2 on a 56-core server and between 6.8 and 14.5 on a 10-core workstation with enabled hyperthreading.
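The citation statement above describes sorting through a user-provided swap function, which lets one sort several arrays at once. The C++ sketch below illustrates that design with a plain sequential quicksort that only ever sees indices. It is not AQsort's actual interface or implementation: `index_quicksort`, `sort_coo`, and the three-array example are hypothetical names and data chosen for illustration, and no OpenMP parallelism is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential in-place quicksort over the index range [lo, hi].
// less(i, j) compares the elements at positions i and j;
// swap(i, j) exchanges them. The sorter never touches the data itself,
// so the caller decides how many arrays move together.
template <typename Less, typename Swap>
void index_quicksort(std::ptrdiff_t lo, std::ptrdiff_t hi, Less less, Swap swap) {
    while (lo < hi) {
        // Move the middle element to the end and use it as the pivot.
        swap(lo + (hi - lo) / 2, hi);
        std::ptrdiff_t store = lo;
        for (std::ptrdiff_t i = lo; i < hi; ++i) {
            if (less(i, hi)) {
                if (i != store) swap(i, store);
                ++store;
            }
        }
        if (store != hi) swap(store, hi);  // pivot reaches its final position
        // Recurse into the smaller part and loop on the larger one.
        if (store - lo < hi - store) {
            index_quicksort(lo, store - 1, less, swap);
            lo = store + 1;
        } else {
            index_quicksort(store + 1, hi, less, swap);
            hi = store - 1;
        }
    }
}

// Hypothetical use: keep three parallel arrays consistent while sorting
// by the pair (rows[i], cols[i]).
inline void sort_coo(std::vector<int>& rows, std::vector<int>& cols,
                     std::vector<double>& vals) {
    assert(rows.size() == cols.size() && cols.size() == vals.size());
    auto less_at = [&](std::ptrdiff_t i, std::ptrdiff_t j) {
        return rows[i] != rows[j] ? rows[i] < rows[j] : cols[i] < cols[j];
    };
    auto swap_at = [&](std::ptrdiff_t i, std::ptrdiff_t j) {
        std::swap(rows[i], rows[j]);
        std::swap(cols[i], cols[j]);
        std::swap(vals[i], vals[j]);
    };
    index_quicksort(0, static_cast<std::ptrdiff_t>(rows.size()) - 1, less_at, swap_at);
}
```

Because every exchange goes through the callback, no temporary array of tuples or permutation vector is needed, which is what makes the approach in-place. The price, as the quoted statement notes, is that the sorter cannot apply optimizations that require direct access to the elements.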

“…To some degree this is affected by the overhead imposed by the high-level library used in the programming effort. We can still draw however some reliable conclusions and reason about the performance of these implementations using the MBSP model, thus making MBSP useful and usable. Integer sorting on multicores and GPUs can be realized by traditional distribution-specific algorithms such as radix-sort [3,12,25,28], or variants of it that use fewer rounds of the baseline count-sort implementation provided additional information about key values is available [6,39]. Other approaches include algorithms that use specialized hardware or software features of a particular multicore architecture [4,6,22,25]. Comparison-based algorithms have also been used with some obvious tweaks: use of deterministic regular sampling sorting [34] that utilizes serial radix-sort for local sorting [8,9,10] or use other methods for local sorting [38,3,5,6,22].…”

mentioning

confidence: 99%

“…Integer sorting on multicores and GPUs can be realized by traditional distribution-specific algorithms such as radix-sort [3,12,25,28], or variants of it that use fewer rounds of the baseline count-sort implementation provided additional information about key values is available [6,39].…”

mentioning

confidence: 99%

“…Other approaches include algorithms that use specialized hardware or software features of a particular multicore architecture [4,6,22,25]. Comparison-based algorithms have also been used with some obvious tweaks: use of deterministic regular sampling sorting [34] that utilizes serial radix-sort for local sorting [8,9,10] or use other methods for local sorting [38,3,5,6,22].…”

mentioning

confidence: 99%

A Study of Integer Sorting on Multicores

Gerbessiotis

2018

Parallel Process. Lett.

Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sampling and random sampling parallel sorting, and network-based algorithms such as Batcher's bitonic sorting algorithm. In this work we present an experimental study of integer sorting on multicore processors. We have implemented serial and parallel radix-sort for various radixes, deterministic regular oversampling and random oversampling parallel sorting, and also some previously little explored or unexplored variants of bitonic-sort and odd-even transposition sort. The study uses multithreading and multiprocessing parallel programming libraries with the C language implementations working under Open MPI, MulticoreBSP, and BSPlib utilizing the same source code. A secondary objective is to attempt to model the performance of these algorithm implementations under the MBSP (Multi-memory BSP) model. We first provide some general high-level observations on the performance of these implementations. If we can conclude anything, it is that accurate prediction of performance by taking into consideration architecture-dependent features such as the structure and characteristics of multiple memory hierarchies is difficult and more often than not untenable. To some degree this is affected by the overhead imposed by the high-level library used in the programming effort. We can still draw, however, some reliable conclusions and reason about the performance of these implementations using the MBSP model, thus making MBSP useful and usable.

Integer sorting on multicores and GPUs can be realized by traditional distribution-specific algorithms such as radix-sort [3,12,25,28], or variants of it that use fewer rounds of the baseline count-sort implementation provided additional information about key values is available [6,39]. Other approaches include algorithms that use specialized hardware or software features of a particular multicore architecture [4,6,22,25]. Comparison-based algorithms have also been used with some obvious tweaks: use of deterministic regular sampling sorting [34] that utilizes serial radix-sort for local sorting [8,9,10] or use other methods for local sorting [38,3,5,6,22]. Network-based algorithms such as Batcher's [1] bitonic sorting [23,3,30,31,5] have also been utilized. In particular, bitonic sorting is a low programming overhead algorithm and thus more suitable for GPU and few-core architectures, is simple to implement, and quite fast when few keys are to be sorted, even if its theoretical performance is suboptimal. In this work we perform an experimental study of integer sorting on multicore processors using multithreading and multiprocessing based libraries that facilitate parallel programming. Our implementations need only recompilation of the same C language source to work under Open MPI [29], MulticoreBSP [36], and a multi-processing and out of maintenance library, BSPlib [19]. Towards this we have impl...
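The excerpt above mentions radix-sort built from rounds of count-sort, and variants that need fewer rounds when something is known about the key values. As a rough illustration only, here is a serial least-significant-digit radix sort in C++ with radix 256. It is not the cited study's implementation (which is in C and parallel); the function name `radix_sort_u32` and the `key_bits` parameter are hypothetical. Passing a smaller `key_bits` when all keys are known to fit in that many bits skips the rounds for the unused high digits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of 32-bit unsigned keys with 8-bit digits.
// Each round is one stable counting sort on a single digit.
// key_bits is an upper bound on the number of significant bits in any key.
inline void radix_sort_u32(std::vector<std::uint32_t>& keys, unsigned key_bits = 32) {
    assert(key_bits <= 32);
    std::vector<std::uint32_t> buffer(keys.size());
    for (unsigned shift = 0; shift < key_bits; shift += 8) {
        // offset[d + 1] first counts the keys whose current digit is d.
        std::array<std::size_t, 257> offset{};
        for (std::uint32_t k : keys) ++offset[((k >> shift) & 0xFFu) + 1];
        // Prefix sum: offset[d] becomes the first output slot for digit d.
        for (std::size_t d = 0; d < 256; ++d) offset[d + 1] += offset[d];
        // Stable scatter into the buffer, then make the buffer current.
        for (std::uint32_t k : keys) buffer[offset[(k >> shift) & 0xFFu]++] = k;
        keys.swap(buffer);
    }
}
```

With the default `key_bits = 32` the sort runs four rounds; `radix_sort_u32(keys, 16)` runs two, which is only correct if every key is below 65536.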
