Block Iterators for Sparse Matrices

Langr, Daniel; Šimeček, Ivan; Dytrych, Tomáš

doi:10.15439/2016f35

Cited by 4 publications

(4 citation statements)

References 16 publications


“…For example, uniformly-blocking formats are parametrized by the block size [13]. To find an optimal block size, matrix nonzero elements need to be sorted repeatedly with respect to different tested block sizes [12,29].…”

Section: Motivation (mentioning)

confidence: 99%
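The repeated sorting mentioned in the statement above can be illustrated with a minimal sketch. It assumes the nonzero elements are held as an array of (row, column, value) records and that blocks are ordered row-major; the `Element` struct and the `sort_by_blocks` function are hypothetical names, not taken from the cited papers. It uses `std::sort` on a single array, which is a simplification: the citing paper addresses the harder case where the data lives in several separate arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// One nonzero element in coordinate form (hypothetical layout).
struct Element {
    std::uint32_t row;
    std::uint32_t col;
    double val;
};

// Sort nonzero elements so that elements of the same r x c block are
// contiguous. Blocks are ordered by (block row, block column); elements
// inside a block are ordered by (row, column).
void sort_by_blocks(std::vector<Element>& elems, std::uint32_t r, std::uint32_t c) {
    assert(r > 0 && c > 0);
    std::sort(elems.begin(), elems.end(),
              [r, c](const Element& a, const Element& b) {
                  return std::make_tuple(a.row / r, a.col / c, a.row, a.col)
                       < std::make_tuple(b.row / r, b.col / c, b.row, b.col);
              });
}
```

Testing another block size means calling `sort_by_blocks` again with different `r` and `c`, which is why the sort is repeated once per candidate size.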

AQsort: Scalable Multi-Array In-Place Sorting with OpenMP

Langr, Tvrdík, Šimeček

2016

SCPE

Self Cite


A new multi-threaded variant of the quicksort algorithm called AQsort and its C++/OpenMP implementation are presented. AQsort operates in place and was primarily designed for high-performance computing (HPC) runtime environments. It can work with multiple arrays at once; such functionality is frequently required in HPC and cannot be accomplished with standard C pointer-based or C++ iterator-based approaches. An extensive study is provided that evaluates AQsort experimentally and compares its performance with modern multi-threaded implementations of in-place and out-of-place sorting algorithms based on OpenMP, Cilk Plus, and Intel TBB. The measurements were conducted on several leading-edge HPC architectures, namely Cray XE6 nodes with AMD Bulldozer CPUs, Cray XC40 nodes with Intel Haswell CPUs, IBM BlueGene/Q nodes, and Intel Xeon Phi coprocessors. The obtained results show that AQsort provides good scalability and sorting performance generally comparable to its competitors. In particular cases, the performance of AQsort may be slightly lower, which is the price for its universality and ability to work with substantially larger amounts of data.


“…This operation involves (costly) integer division, however, in case of block sizes 2^k × 2^l, it may be substituted by much faster logical shift operations. We observed 4 and 7 times faster block preprocessing due to such substitution on an Intel Haswell-based computer system and an Intel Xeon Phi coprocessor, respectively [20].…”

Section: Methods (mentioning)

confidence: 84%
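The substitution described in the statement above can be sketched as follows. The sketch assumes that the operation in question maps an element's row and column index to the indices of its block, and that power-of-two block dimensions are given by their exponents (written here as `k` for rows and `l` for columns). The `BlockCoord` struct and both function names are illustrative and do not come from the cited paper.

```cpp
#include <cassert>
#include <cstdint>

// Block row and block column of a matrix element (hypothetical type).
struct BlockCoord {
    std::uint32_t block_row;
    std::uint32_t block_col;
};

// General case: an r x c block size requires two integer divisions.
inline BlockCoord block_of_div(std::uint32_t row, std::uint32_t col,
                               std::uint32_t r, std::uint32_t c) {
    assert(r > 0 && c > 0);
    return {row / r, col / c};
}

// Power-of-two case: a block of 2^k rows by 2^l columns needs only shifts,
// because dividing an unsigned integer by 2^k equals shifting right by k.
inline BlockCoord block_of_shift(std::uint32_t row, std::uint32_t col,
                                 unsigned k, unsigned l) {
    assert(k < 32 && l < 32);  // shifting by >= the bit width is undefined
    return {row >> k, col >> l};
}
```

For a 4 × 16 block size, `block_of_div(row, col, 4, 16)` and `block_of_shift(row, col, 2, 4)` give the same result for every element. A compiler can perform this rewrite itself only when the block size is a compile-time constant, which is typically not the case when block sizes are searched at run time.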

“…To evaluate memory footprints of a given matrix for different schemes and some particular tested block size, we need information about numbers of nonzero elements of all nonzero blocks [22]. In the end, this information must be obtained for each distinct block size from the optimization space, which represents the most demanding part of the whole optimization process [20]. The block preprocessing runtime is thus approximately proportional to the number of distinct tested block sizes.…”

Section: Methods (mentioning)

confidence: 99%
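A minimal sketch of the preprocessing step described in the statement above, assuming the matrix is given as two parallel arrays of row and column indices. The names `BlockKey`, `count_block_nnz` and `nonzero_block_counts` are hypothetical, and an ordered map is used only for brevity; it is not the approach of the cited paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// (block row, block column) identifying one block.
using BlockKey = std::pair<std::uint32_t, std::uint32_t>;

// Count the nonzero elements of every nonzero block for one r x c block size.
std::map<BlockKey, std::size_t> count_block_nnz(
        const std::vector<std::uint32_t>& rows,
        const std::vector<std::uint32_t>& cols,
        std::uint32_t r, std::uint32_t c) {
    assert(r > 0 && c > 0);
    assert(rows.size() == cols.size());
    std::map<BlockKey, std::size_t> counts;
    for (std::size_t i = 0; i < rows.size(); ++i)
        ++counts[BlockKey(rows[i] / r, cols[i] / c)];
    return counts;
}

// Number of nonzero blocks for each tested block size. Every block size
// triggers one full pass over the nonzero elements, so the total runtime
// grows roughly linearly with the number of tested sizes.
std::vector<std::size_t> nonzero_block_counts(
        const std::vector<std::uint32_t>& rows,
        const std::vector<std::uint32_t>& cols,
        const std::vector<BlockKey>& block_sizes) {
    std::vector<std::size_t> result;
    for (const BlockKey& bs : block_sizes)
        result.push_back(count_block_nnz(rows, cols, bs.first, bs.second).size());
    return result;
}
```

The per-block counts returned by `count_block_nnz` are the input a footprint model would need for each storage scheme; the model itself is outside the scope of this sketch.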

On Memory Footprints of Partitioned Sparse Matrices

Langr, Šimeček

2017

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

Self Cite


The presented study analyses 563 representative benchmark sparse matrices with respect to their partitioning into uniformly-sized blocks. The aim is to minimize memory footprints of matrices. Different block sizes and different ways of storing blocks in memory are considered and statistically evaluated. Memory footprints of partitioned matrices are additionally compared with lower bounds and the CSR storage format. The average measured memory savings against CSR in case of single and double precision are 42.3 and 28.7 percent, respectively. The corresponding worst-case savings are 25.5 and 17.1 percent. Moreover, memory footprints of partitioned matrices were on average 5 times closer to their lower bounds than CSR. Based on the obtained results, we provide generic suggestions for efficient partitioning and storage of sparse matrices in a computer memory.


“…To evaluate memory footprints of a given matrix for different schemes and some particular tested block size, we need information about numbers of nonzero elements of all nonzero blocks [24]. In the end, this information must be obtained for each distinct block size from the optimization space, which represents the most demanding part of the whole optimization process [22]. The block preprocessing runtime is thus approximately proportional to the number of distinct tested block sizes.…”

Section: Block Sizes (mentioning)

confidence: 99%

Analysis of Memory Footprints of Sparse Matrices Partitioned Into Uniformly-Sized Blocks

Langr, Šimeček

2018

SCPE

Self Cite


The presented study analyses memory footprints of 563 representative benchmark sparse matrices with respect to their partitioning into uniformly-sized blocks. Different block sizes and different ways of storing blocks in memory are considered and statistically evaluated. Memory footprints of partitioned matrices are then compared with their lower bounds and CSR, index-compressed CSR, and EBF storage formats. The results show that blocking-based storage formats may significantly reduce memory footprints of sparse matrices arising from a wide range of application domains. Additionally, measured consistency of results is presented and discussed, benefits of individual formats for storing blocks are evaluated, and an analysis of best-case and worst-case matrices is provided for in-depth understanding of causes of memory savings of blocking-based formats.
