We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix A into a sum A = A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, the unaligned block compressed sparse row (UBCSR) format. The classical alternative of storing A in block compressed sparse row (BCSR) format yields limited performance gains because it imposes a particular alignment on the matrix non-zero structure, leading to extra work from explicitly padded zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. Using application test matrices, we show empirically that speedups can be as high as 2.1× over not blocking at all, and as high as 1.8× over the standard BCSR implementation used in prior work. Even when performance does not improve, split UBCSR can still significantly reduce matrix storage. Through extensive experiments, we further show that the empirically optimal number of splittings s and the block size for each term A_i depend in practice on both the matrix and the hardware platform. Our data lay a foundation for future development of fully automated methods for tuning these parameters.
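As an illustration of the splitting idea, here is a minimal Python sketch (not the paper's implementation): each term A_i is held in a simplified, hypothetical stand-in for the UBCSR format, a list of (row_start, col_start, dense block) triples whose blocks may begin at arbitrary row and column offsets, so no zero padding is needed. SpMV then accumulates the contribution of every term.

```python
import numpy as np

def spmv_split_blocked(terms, x):
    """Compute y = (A_1 + ... + A_s) x.

    Each term is a list of (row_start, col_start, block) entries --
    a simplified, hypothetical stand-in for UBCSR in which dense
    rectangular blocks start at arbitrary (unaligned) offsets.
    """
    y = np.zeros(len(x))
    for term in terms:                      # one pass per split term A_i
        for r0, c0, block in term:          # each dense r-by-c block
            r, c = block.shape
            # A real kernel would be register-tiled for the fixed
            # block size; a plain dense matvec stands in here.
            y[r0:r0 + r] += block @ x[c0:c0 + c]
    return y
```

For example, a 3×3 matrix split into a term of 2×2 blocks plus a term holding the leftover 1×1 and 1×3 blocks multiplies correctly without any padded zeros.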
In this communication, we present a numerical method to efficiently calculate a non-zero optimal threshold value that improves the performance of natural frequency-based radar target recognition. From the probability of correct classification as a function of the threshold value, we define a function one of whose roots corresponds to the optimal threshold. The final optimal threshold value is obtained via Newton iteration. The scheme is validated by comparing the threshold obtained from the Newton iteration with that computed using the probability density function.
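The root-finding step can be sketched with a generic Newton iteration. The objective passed in below is a caller-supplied placeholder: the communication's actual function is built from the probability of correct classification versus threshold, which is not reproduced here.

```python
def newton_root(f, df, x0, tol=1e-10, max_iter=50):
    """Find a root of f by Newton's method:
    x_{k+1} = x_k - f(x_k) / f'(x_k).

    f, df -- the objective and its derivative (hypothetical
             stand-ins for the paper's classification-based function)
    x0    -- initial threshold guess
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:       # converged: f(x) is (numerically) zero
            break
        x = x - fx / df(x)      # Newton update
    return x
```

With a well-chosen starting point near the desired root, the iteration converges quadratically; in practice the initial guess would come from a coarse scan of the classification-rate curve.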