We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix A into a sum A = A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, the unaligned block compressed sparse row (UBCSR) format. The classical alternative of storing A in block compressed sparse row (BCSR) format yields limited performance gains because it imposes a particular alignment on the matrix non-zero structure, leading to extra work from explicitly padded zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. Using application test matrices, we show empirically that speedups can be as high as 2.1× over not blocking at all, and as high as 1.8× over the standard BCSR implementation used in prior work. Even when performance does not improve, split UBCSR can still significantly reduce matrix storage. Through extensive experiments, we further show that the empirically optimal number of splittings s and the block size for each term A_i depend in practice on both the matrix and the hardware platform. Our data lay a foundation for future development of fully automated methods for tuning these parameters.
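As an illustration of the splitting idea, here is a minimal Python sketch (not the paper's implementation): each term A_i is held in a simplified, hypothetical stand-in for the UBCSR format, a list of (row_start, col_start, dense block) triples whose blocks may begin at arbitrary row and column offsets, so no zero padding is needed. SpMV then accumulates the contribution of every term.

```python
import numpy as np

def spmv_split_blocked(terms, x):
    """Compute y = (A_1 + ... + A_s) x.

    Each term is a list of (row_start, col_start, block) entries --
    a simplified, hypothetical stand-in for UBCSR in which dense
    rectangular blocks start at arbitrary (unaligned) offsets.
    """
    y = np.zeros(len(x))
    for term in terms:                      # one pass per split term A_i
        for r0, c0, block in term:          # each dense r-by-c block
            r, c = block.shape
            # A real kernel would be register-tiled for the fixed
            # block size; a plain dense matvec stands in here.
            y[r0:r0 + r] += block @ x[c0:c0 + c]
    return y
```

For example, a 3×3 matrix split into a term of 2×2 blocks plus a term holding the leftover 1×1 and 1×3 blocks multiplies correctly without any padded zeros.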
In this communication, we present a numerical method to efficiently calculate a non-zero optimal threshold value that improves the performance of natural frequency-based radar target recognition. From the probability of correct classification as a function of the threshold value, we define a function one of whose roots corresponds to the optimal threshold. The final optimal threshold value is obtained via Newton iteration. The scheme is validated by comparing the threshold obtained from the Newton iteration with that computed using the probability density function.
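The root-finding step can be sketched with a generic Newton iteration. The objective passed in below is a caller-supplied placeholder: the communication's actual function is built from the probability of correct classification versus threshold, which is not reproduced here.

```python
def newton_root(f, df, x0, tol=1e-10, max_iter=50):
    """Find a root of f by Newton's method:
    x_{k+1} = x_k - f(x_k) / f'(x_k).

    f, df -- the objective and its derivative (hypothetical
             stand-ins for the paper's classification-based function)
    x0    -- initial threshold guess
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:       # converged: f(x) is (numerically) zero
            break
        x = x - fx / df(x)      # Newton update
    return x
```

With a well-chosen starting point near the desired root, the iteration converges quadratically; in practice the initial guess would come from a coarse scan of the classification-rate curve.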