2016
DOI: 10.14495/jsiaml.8.21

Performance evaluation of multiple precision matrix multiplications using parallelized Strassen and Winograd algorithms

Abstract: It is well known that the Strassen and Winograd algorithms can reduce the computational costs associated with dense matrix multiplication. We have already shown that they are also very effective for software-based multiple precision floating-point arithmetic environments such as the MPFR/GMP library. In this paper, we show that we can obtain the same effectiveness for the double-double (DD) and quadruple-double (QD) environments supported by the QD library, and that parallelization can increase the speed of these multi…
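For orientation, the savings come from replacing the eight block products of conventional 2x2 block multiplication with seven. Below is a minimal sketch of one Winograd-variant step on plain scalars, showing the 7 multiplications and 15 additions/subtractions; it illustrates the operation count only and is not the authors' implementation, in which the operands would be DD/QD (or MPFR) submatrices and the step would be applied recursively.

// Illustrative sketch (not the authors' code): one Winograd-variant step
// on a 2x2 matrix of scalars, using 7 multiplications and 15 additions.
#include <cstdio>

void winograd_2x2(const double A[2][2], const double B[2][2], double C[2][2]) {
    const double S1 = A[1][0] + A[1][1];
    const double S2 = S1 - A[0][0];
    const double S3 = A[0][0] - A[1][0];
    const double S4 = A[0][1] - S2;
    const double T1 = B[0][1] - B[0][0];
    const double T2 = B[1][1] - T1;
    const double T3 = B[1][1] - B[0][1];
    const double T4 = T2 - B[1][0];

    const double M1 = A[0][0] * B[0][0];   // the 7 multiplications
    const double M2 = A[0][1] * B[1][0];
    const double M3 = S4 * B[1][1];
    const double M4 = A[1][1] * T4;
    const double M5 = S1 * T1;
    const double M6 = S2 * T2;
    const double M7 = S3 * T3;

    const double U2 = M1 + M6;             // shared partial sums
    const double U3 = U2 + M7;
    const double U4 = U2 + M5;

    C[0][0] = M1 + M2;
    C[0][1] = U4 + M3;
    C[1][0] = U3 - M4;
    C[1][1] = U3 + M5;
}

int main() {
    const double A[2][2] = {{1, 2}, {3, 4}}, B[2][2] = {{5, 6}, {7, 8}};
    double C[2][2];
    winograd_2x2(A, B, C);
    // prints 19 22 / 43 50, matching the conventional product
    std::printf("%g %g\n%g %g\n", C[0][0], C[0][1], C[1][0], C[1][1]);
    return 0;
}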

Cited by 7 publications (5 citation statements). References 8 publications (9 reference statements).
“…Thus, these algorithms provide better performance when used together with large multiple precision floating-point arithmetic. We confirmed their efficiency through benchmark tests and applications of LU decomposition [7, 8].…”
Section: Parallelized Strassen and Winograd Algorithms
confidence: 60%
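The quoted claim can be made concrete with the standard operation-count argument (the symbols c_m and c_a for per-element multiplication and addition costs are illustrative, not taken from the cited papers). Counting operations when the recursion is carried down to 1x1 blocks:

\begin{align*}
  M_{\mathrm{conv}}(n) &= n^{3}, \qquad A_{\mathrm{conv}}(n) = n^{3} - n^{2},\\
  M_{\mathrm{Str}}(n)  &= 7\,M_{\mathrm{Str}}(n/2),\quad M_{\mathrm{Str}}(1) = 1
      \;\Rightarrow\; M_{\mathrm{Str}}(n) = n^{\log_2 7} \approx n^{2.807},\\
  A_{\mathrm{Str}}(n)  &= 7\,A_{\mathrm{Str}}(n/2) + 18\left(\tfrac{n}{2}\right)^{2}
      = \Theta\!\bigl(n^{\log_2 7}\bigr).
\end{align*}

Total time is roughly c_m M + c_a A. Since c_m is much larger than c_a in DD, QD, and MPFR arithmetic, trading eight block multiplications for seven per recursion level (at the price of extra additions; the Winograd variant needs 15 rather than 18 of them) pays off far more than it does in hardware double precision.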
“…We have already released a multiple precision matrix multiplication library, BNCmatmul [10], which uses divide-and-conquer algorithms, including Strassen's, and we have parallelized matrix multiplications of various precisions based on the DD, QD, and MPFR libraries with OpenMP [7, 8]. In comparison with Rgemm in MPLAPACK (MBLAS) [6], it delivers better performance for large matrices.…”
Section: Introduction
confidence: 99%
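As a rough illustration of the divide-and-conquer plus OpenMP combination mentioned above, the sketch below is not code from BNCmatmul: it assumes n is a power of two, uses the classic Strassen formulas, spawns the seven sub-products as OpenMP tasks, and templates the element type so a DD/QD or MPFR wrapper type could replace double. Compile with -fopenmp (it falls back to a serial run otherwise).

#include <cstddef>
#include <cstdio>
#include <vector>

template <typename T> using Mat = std::vector<T>;   // n*n elements, row-major

// c = a + sign*b, elementwise
template <typename T>
Mat<T> add(const Mat<T>& a, const Mat<T>& b, T sign = T(1)) {
    Mat<T> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + sign * b[i];
    return c;
}

// copy quadrant (qi, qj) of an n x n matrix into an (n/2) x (n/2) matrix
template <typename T>
Mat<T> quad(const Mat<T>& a, int n, int qi, int qj) {
    int h = n / 2;
    Mat<T> q(h * h);
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j)
            q[i * h + j] = a[(qi * h + i) * n + (qj * h + j)];
    return q;
}

template <typename T>
Mat<T> strassen(const Mat<T>& A, const Mat<T>& B, int n) {
    Mat<T> C(n * n, T(0));
    if (n <= 2) {                       // base case: conventional product
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k)
                for (int j = 0; j < n; ++j)
                    C[i * n + j] += A[i * n + k] * B[k * n + j];
        return C;
    }
    int h = n / 2;
    Mat<T> A11 = quad(A, n, 0, 0), A12 = quad(A, n, 0, 1),
           A21 = quad(A, n, 1, 0), A22 = quad(A, n, 1, 1);
    Mat<T> B11 = quad(B, n, 0, 0), B12 = quad(B, n, 0, 1),
           B21 = quad(B, n, 1, 0), B22 = quad(B, n, 1, 1);
    Mat<T> M1, M2, M3, M4, M5, M6, M7;
    // The seven recursive products run as independent OpenMP tasks.
    #pragma omp taskgroup
    {
        #pragma omp task shared(M1)
        M1 = strassen(add(A11, A22), add(B11, B22), h);
        #pragma omp task shared(M2)
        M2 = strassen(add(A21, A22), B11, h);
        #pragma omp task shared(M3)
        M3 = strassen(A11, add(B12, B22, T(-1)), h);
        #pragma omp task shared(M4)
        M4 = strassen(A22, add(B21, B11, T(-1)), h);
        #pragma omp task shared(M5)
        M5 = strassen(add(A11, A12), B22, h);
        #pragma omp task shared(M6)
        M6 = strassen(add(A21, A11, T(-1)), add(B11, B12), h);
        #pragma omp task shared(M7)
        M7 = strassen(add(A12, A22, T(-1)), add(B21, B22), h);
    }
    // C11 = M1+M4-M5+M7, C12 = M3+M5, C21 = M2+M4, C22 = M1-M2+M3+M6
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j) {
            int q = i * h + j;
            C[i * n + j]           = M1[q] + M4[q] - M5[q] + M7[q];
            C[i * n + j + h]       = M3[q] + M5[q];
            C[(i + h) * n + j]     = M2[q] + M4[q];
            C[(i + h) * n + j + h] = M1[q] - M2[q] + M3[q] + M6[q];
        }
    return C;
}

int main() {
    const int n = 4;
    Mat<double> A(n * n), B(n * n);
    for (int i = 0; i < n * n; ++i) { A[i] = i + 1; B[i] = (i % 3) - 1; }
    Mat<double> C;
    #pragma omp parallel      // tasks need an enclosing parallel region
    #pragma omp single
    C = strassen(A, B, n);
    // prints -2 and -14 for this test data
    std::printf("C[0][0] = %g, C[3][3] = %g\n", C[0], C[n * n - 1]);
    return 0;
}

In a production library the recursion would stop at a much larger threshold and switch to an optimized conventional kernel, since Strassen only wins once the blocks are large enough to amortize the extra additions and copies.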
“…We have previously reported that serial Strassen matrix multiplication and its parallelized versions for large n are more efficient than Rgemm of MPBLAS [12]. Our accelerated versions of block and Strassen matrix multiplication with AVX2 run more than two times faster than MPBLAS for matrices of any size.…”
Section: Computational Time of Serial Matrix Multiplication and Compa...
confidence: 89%
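The AVX2 acceleration referred to here targets the double-double arithmetic itself. A minimal sketch of that idea, using the standard error-free transformations (two_prod via FMA and a quick_two_sum renormalization) on four DD values packed per 256-bit register, could look like the following; the dd4 struct and the function names are illustrative, not the authors' kernels. Compile with, e.g., -mavx2 -mfma.

#include <immintrin.h>
#include <cstdio>

struct dd4 { __m256d hi, lo; };            // four double-double numbers

// two_prod: hi + lo = a * b exactly; FMA recovers the rounding error of hi.
static inline dd4 two_prod(__m256d a, __m256d b) {
    __m256d p = _mm256_mul_pd(a, b);
    __m256d e = _mm256_fmsub_pd(a, b, p);  // a*b - p, computed with one rounding
    return {p, e};
}

// quick_two_sum: renormalize hi + lo (assumes |hi| >= |lo|).
static inline dd4 quick_two_sum(__m256d a, __m256d b) {
    __m256d s = _mm256_add_pd(a, b);
    __m256d e = _mm256_sub_pd(b, _mm256_sub_pd(s, a));
    return {s, e};
}

// double-double product (a.hi + a.lo) * (b.hi + b.lo), lo*lo term neglected
static inline dd4 dd_mul(dd4 a, dd4 b) {
    dd4 p = two_prod(a.hi, b.hi);
    __m256d cross = _mm256_add_pd(_mm256_mul_pd(a.hi, b.lo),
                                  _mm256_mul_pd(a.lo, b.hi));
    return quick_two_sum(p.hi, _mm256_add_pd(p.lo, cross));
}

int main() {
    dd4 a = { _mm256_set1_pd(1.0 / 3.0), _mm256_set1_pd(0.0) };
    dd4 b = { _mm256_set1_pd(3.0),       _mm256_set1_pd(0.0) };
    dd4 c = dd_mul(a, b);
    double hi[4], lo[4];
    _mm256_storeu_pd(hi, c.hi);
    _mm256_storeu_pd(lo, c.lo);
    std::printf("hi = %.17g, lo = %.17g\n", hi[0], lo[0]);
    return 0;
}

Packing four DD operands per register lets the inner kernel of a block product process four result columns per instruction stream, which is where the reported speedup over scalar DD arithmetic would come from.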
“…Among all types of matrix multiplication, we adopted the row-major method for accessing matrix elements and parallelized the multiplications with OpenMP [12]. We fixed the matrices A and B as follows:…”
Section: Benchmark Tests of DD, TD, and QD Matrix Multiplication
confidence: 99%
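A minimal sketch of a row-major, OpenMP-parallelized matrix product of the kind described in this quote; the element type is templated so DD, TD, or QD types could be substituted for double, and the specific fixed matrices A and B from the quoted passage are not reproduced here. Compile with -fopenmp.

#include <cstdio>
#include <vector>

template <typename T>
void matmul_row_major(const std::vector<T>& A, const std::vector<T>& B,
                      std::vector<T>& C, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)            // each thread owns whole rows of C
        for (int k = 0; k < n; ++k) {      // i-k-j order keeps B accesses row-wise
            const T aik = A[i * n + k];
            for (int j = 0; j < n; ++j)
                C[i * n + j] += aik * B[k * n + j];
        }
}

int main() {
    const int n = 256;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    matmul_row_major(A, B, C, n);
    std::printf("C[0] = %g (expected %g)\n", C[0], 2.0 * n);
    return 0;
}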
“…Kouya et al. [14] (second paper) compared parallelized Strassen and Winograd algorithms for multiple precision matrix multiplication using the MPFR/GMP and QD libraries. They used thread-based parallelization to improve performance.…”
Section: A Literature Review
confidence: 99%