The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high-performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the im2col transform followed by a general matrix multiplication (gemm) in order to take advantage of the highly optimized realizations of the gemm kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the im2col transform; and 2) the time to perform the im2col transform, which is not negligible for complex neural networks. This paper presents a portable high-performance convolution algorithm based on the BLIS realization of the gemm kernel that avoids the use of the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit im2col transform, while maintaining the portability and performance of the underlying realization of gemm in BLIS.
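The baseline approach described in this abstract (an explicit im2col transform followed by a single gemm) can be sketched in NumPy as follows. This is only an illustration of the baseline and of its memory workspace, not the BLIS-based algorithm the paper proposes; the function names, the stride of 1, and the absence of padding are assumptions made for the example.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, Ho*Wo) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    ho, wo = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, ho * wo), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel offset): that element of every patch.
                cols[row] = x[ci, i:i + ho, j:j + wo].reshape(-1)
                row += 1
    return cols

def conv_im2col(x, weights):
    """CNN-style convolution (cross-correlation) as im2col followed by one gemm."""
    k, c, kh, kw = weights.shape
    _, h, w = x.shape
    ho, wo = h - kh + 1, w - kw + 1
    cols = im2col(x, kh, kw)              # the intermediate workspace
    out = weights.reshape(k, -1) @ cols   # the gemm call
    return out.reshape(k, ho, wo), cols.size

def conv_direct(x, weights):
    """Reference convolution that never builds the im2col matrix."""
    k, c, kh, kw = weights.shape
    _, h, w = x.shape
    ho, wo = h - kh + 1, w - kw + 1
    out = np.zeros((k, ho, wo), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Accumulate the contribution of kernel offset (i, j) over all channels.
            out += np.einsum('kc,chw->khw', weights[:, :, i, j],
                             x[:, i:i + ho, j:j + wo])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))           # 3 channels, 8x8 image
weights = rng.standard_normal((4, 3, 3, 3))  # 4 filters of size 3x3
out, workspace = conv_im2col(x, weights)
print(workspace, x.size)
```

For this small case the im2col matrix holds 972 elements against 192 in the input, which gives a feel for the workspace overhead that the abstract identifies as the first problem of the approach.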
In this paper, we introduce an improved bound on the 2-norm of Hermite matrix polynomials. As a consequence, this estimate enables us to present and prove a matrix version of the Riemann-Lebesgue lemma for Fourier transforms. Finally, our theoretical results are used to develop a novel procedure for the computation of matrix exponentials with a priori bounds. A numerical example for a test matrix is provided.
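For orientation, the abstract does not state the definition of the Hermite matrix polynomials. The form below is the one commonly used in the literature and is given here as an assumption: $A$ is taken to be a matrix whose eigenvalues all have positive real part, so that $\sqrt{2A}$ is well defined, and $I$ is the identity matrix.

```latex
H_n(x, A) = n! \sum_{k=0}^{\lfloor n/2 \rfloor}
            \frac{(-1)^k}{k!\,(n-2k)!} \left( x \sqrt{2A} \right)^{n-2k},
\qquad n \ge 0,

H_0(x, A) = I, \qquad H_1(x, A) = x \sqrt{2A}, \qquad
H_{n+1}(x, A) = x \sqrt{2A}\, H_n(x, A) - 2n\, H_{n-1}(x, A), \quad n \ge 1.
```

As a quick check, the recurrence with $n = 1$ gives $H_2(x, A) = 2A x^2 - 2I$, which agrees with the explicit sum.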
Modeling the execution time of the Sparse Matrix-Vector multiplication (SpMV) on a current CPU architecture is especially complex due to i) irregular memory accesses; ii) indirect memory referencing; and iii) low arithmetic intensity. While analytical models may yield accurate estimates for the total number of cache hits/misses, they often fail to predict accurately the total execution time. In this paper, we depart from the analytic approach to instead leverage Convolutional Neural Networks (CNNs) in order to provide an effective estimation of the performance of the SpMV operation. For this purpose, we present a high-level abstraction of the sparsity pattern of the problem matrix and propose a blockwise strategy to feed the CNN models by blocks of non-zero elements. The experimental evaluation on a representative subset of the matrices from the SuiteSparse Matrix collection demonstrates the robustness of the CNN models for predicting the SpMV performance on an Intel Haswell core. Furthermore, we show how to generalize the network models to other target architectures to estimate the performance of SpMV on an ARM A57 core.
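The three difficulties listed in this abstract are visible in even the simplest SpMV kernel. The sketch below uses the compressed sparse row (CSR) format, which is an assumption made for illustration, since the abstract does not name a storage format; the helper names are hypothetical as well.

```python
import numpy as np

def dense_to_csr(a):
    """Convert a dense 2-D array to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts
    return (np.array(values, dtype=float),
            np.array(col_idx, dtype=int),
            np.array(row_ptr, dtype=int))

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a CSR matrix A."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx[p]: an indirect, data-dependent access
            # whose locality depends on the sparsity pattern.
            y[i] += values[p] * x[col_idx[p]]
    return y

a = np.array([[4, 0, 0, 1],
              [0, 0, 2, 0],
              [0, 3, 0, 0],
              [5, 0, 0, 6]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
values, col_idx, row_ptr = dense_to_csr(a)
print(spmv_csr(values, col_idx, row_ptr, x))
```

Each stored nonzero contributes only one multiplication and one addition, yet it requires loading a value, a column index, and an entry of `x`. That imbalance is the low arithmetic intensity the abstract refers to.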
Soundprism is a real-time algorithm to separate polyphonic music audio into source signals, given the musical score of the audio in advance. This paper presents a framework for a Soundprism implementation. A study of the sound quality of the online score-informed source separation is shown, although a real-time implementation is not carried out. The system is composed of two stages: (1) a score follower that matches a MIDI score position to each time frame of the musical performance; and (2) a source separator based on a nonnegative matrix factorization approach guided by the score. Real audio mixtures of instrumental quartets were employed to obtain preliminary results for the proposed system.
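One common way in which a score can guide a nonnegative matrix factorization is sketched below. The abstract does not specify the cost function or how the score enters the factorization, so everything here is an assumption: a Euclidean cost with multiplicative updates, one spectral template per source, and a binary `score_mask` (a hypothetical name) that marks the frames in which each source is active according to the aligned score.

```python
import numpy as np

def score_informed_nmf(v, score_mask, n_iter=200, eps=1e-12, seed=0):
    """Factor a magnitude spectrogram v (freq x time) as w @ h.

    score_mask (sources x time) is 1 where the score says a source plays.
    Activations start at zero outside the mask; multiplicative updates
    keep them at zero, so the score constrains the factorization.
    """
    rng = np.random.default_rng(seed)
    n_freq = v.shape[0]
    n_src = score_mask.shape[0]
    w = rng.uniform(0.1, 1.0, size=(n_freq, n_src))
    h = rng.uniform(0.1, 1.0, size=score_mask.shape) * score_mask
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean cost ||v - w h||.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
        errors.append(np.linalg.norm(v - w @ h))
    return w, h, errors

def separate(v, w, h, eps=1e-12):
    """Split v into per-source spectrograms with soft (Wiener-like) masks."""
    model = w @ h + eps
    return [v * np.outer(w[:, s], h[s]) / model for s in range(w.shape[1])]

# Synthetic example: two sources whose activity overlaps in frames 3 and 4.
rng = np.random.default_rng(1)
mask = np.zeros((2, 8))
mask[0, :5] = 1
mask[1, 3:] = 1
v = rng.uniform(0.1, 1.0, (6, 2)) @ (rng.uniform(0.5, 1.5, (2, 8)) * mask)
w, h, errors = score_informed_nmf(v, mask)
sources = separate(v, w, h)
print(errors[0], errors[-1])
```

In a system like the one described, the mask would come from the first stage (the score follower), and the separated magnitude spectrograms would then be converted back to audio.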
In this paper, we introduce two approaches to compute the matrix hyperbolic tangent. While one of them is based on its own definition and uses the matrix exponential, the other one is focused on the expansion of its Taylor series. For this second approximation, we analyse two different alternatives to evaluate the corresponding matrix polynomials. This resulted in three stable and accurate codes, which we implemented in MATLAB and numerically and computationally compared by means of a battery of tests composed of distinct state-of-the-art matrices. Our results show that the Taylor series-based methods were more accurate, although somewhat more computationally expensive, compared with the approach based on the matrix exponential. To avoid this drawback, we propose the use of a set of formulas that allows us to evaluate polynomials more efficiently than the traditional Paterson–Stockmeyer method, thus substantially reducing the number of matrix products (practically equal in number to the approach based on the matrix exponential), without penalising the accuracy of the result.
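Two ingredients named in this abstract can be illustrated together: the definition-based formula tanh(A) = (e^{2A} + I)^{-1}(e^{2A} - I), and the traditional Paterson–Stockmeyer scheme for evaluating a matrix polynomial with few matrix products. The sketch below is not the paper's MATLAB code, nor the more efficient evaluation formulas it proposes, which the abstract does not list. It assumes a truncated Taylor series for the exponential without scaling and squaring, so it is only suitable for matrices of small norm, and the function names are hypothetical.

```python
import math
import numpy as np

def paterson_stockmeyer(coeffs, a):
    """Evaluate p(A) = sum_k coeffs[k] * A**k; return (p(A), matrix products used)."""
    m = len(coeffs) - 1
    n = a.shape[0]
    s = max(1, int(np.ceil(np.sqrt(m + 1))))
    powers = [np.eye(n), a]
    products = 0
    while len(powers) <= s:          # build A^2 .. A^s
        powers.append(powers[-1] @ a)
        products += 1
    a_s = powers[s]

    def block(j):
        # B_j = sum_{i<s} coeffs[s*j + i] * A^i (no matrix products needed).
        b = np.zeros((n, n))
        for i in range(s):
            k = s * j + i
            if k <= m:
                b = b + coeffs[k] * powers[i]
        return b

    r = m // s
    result = block(r)
    for j in range(r - 1, -1, -1):   # Horner's rule in the variable A^s
        result = result @ a_s + block(j)
        products += 1
    return result, products

def tanhm_via_exp(a, degree=18):
    """tanh(A) from e^{2A}, with e^{2A} approximated by a Taylor polynomial.

    Assumption: ||2A|| is small (about 1 or less), since no scaling is applied.
    """
    coeffs = [1.0 / math.factorial(k) for k in range(degree + 1)]
    e2a, products = paterson_stockmeyer(coeffs, 2.0 * a)
    eye = np.eye(a.shape[0])
    # Solve (e^{2A} + I) X = e^{2A} - I instead of forming an explicit inverse.
    return np.linalg.solve(e2a + eye, e2a - eye), products

a = np.array([[0.2, 0.1],
              [0.1, -0.3]])
t, products = tanhm_via_exp(a)
print(t, products)
```

For a polynomial of degree 18, this scheme uses 7 matrix products, whereas Horner's rule applied directly in A needs 17. Reducing that count further is the purpose of the formulas the abstract proposes.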