2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
DOI: 10.1109/sbac-pad49847.2020.00023

High Performance and Portable Convolution Operators for Multicore Processors

Abstract: The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high-performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the im2col transform followed by a general matrix multiplication (gemm) in order to take advantage of the highly optimized realizations of the gemm kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace…
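To make the im2col-plus-gemm approach the abstract refers to concrete, here is a minimal NumPy sketch (stride 1, no padding; the function names and simplifications are mine, not the paper's):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, Ho*Wo) matrix of patches."""
    c, h, w = x.shape
    ho, wo = h - kh + 1, w - kw + 1           # stride 1, no padding
    cols = np.empty((c * kh * kw, ho * wo), dtype=x.dtype)
    idx = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each row holds one (channel, kernel-offset) slice of the input,
                # flattened over all output positions.
                cols[idx] = x[ci, i:i + ho, j:j + wo].reshape(-1)
                idx += 1
    return cols

def conv2d_im2col(x, weights):
    """weights: (K, C, kh, kw) filters -> output (K, Ho, Wo)."""
    k, c, kh, kw = weights.shape
    cols = im2col(x, kh, kw)                  # the large intermediate workspace
    w_mat = weights.reshape(k, c * kh * kw)   # filters as a (K, C*kh*kw) matrix
    out = w_mat @ cols                        # the single gemm call
    ho, wo = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return out.reshape(k, ho, wo)
```

Note that cols replicates (almost) every input element kh*kw times, which is exactly the large-workspace problem the abstract points to.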

Cited by 19 publications (19 citation statements), published between 2021 and 2023.
References 21 publications (27 reference statements).
“…We will focus on four algorithms in particular: im2col, blocking, Winograd convolutions, and FFT convolutions. im2col [14], Winograd [13], and FFT techniques [17] for performing convolutions are all well documented in the literature. We will focus on designing improved blocking algorithms.…”
Section: Attainability (mentioning)
confidence: 99%
“…In this paper, we extend our previous work in [26] to obtain an efficient integration of the convolution operators in a framework for distributed training of DNNs on clusters of computers equipped with multicore processors. In particular, this work makes the following contributions:…”
Section: Introduction (mentioning)
confidence: 97%
“…Unfortunately, there are two major problems with this approach: 1) a large memory workspace is required to host the intermediate matrix generated by the im2col transform; and, especially for training, 2) the time to apply this transform is not negligible for complex CNNs. In [26], we presented a portable high performance convolution algorithm based on the BLIS [33] realization of gemm, named convgemm, that practically eliminates the memory and time cost of the im2col transform, while maintaining the portability and performance of the underlying realization of the BLIS gemm for multicore processors.…”
Section: Introduction (mentioning)
confidence: 99%
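For intuition about how an approach like convgemm can avoid that workspace, here is a rough, hypothetical sketch of the on-the-fly packing idea: only one panel of the virtual im2col matrix is materialized per gemm block, so the buffer is bounded by a blocking parameter rather than by the output size. This is an illustrative toy in NumPy, not the authors' BLIS-integrated implementation, and all names are mine:

```python
import numpy as np

def pack_im2col_panel(x, kh, kw, col_start, col_end):
    """Materialize only columns [col_start, col_end) of the virtual
    im2col matrix, i.e. one gemm packing panel, never the whole matrix."""
    c, h, w = x.shape
    ho, wo = h - kh + 1, w - kw + 1
    panel = np.empty((c * kh * kw, col_end - col_start), dtype=x.dtype)
    for p, col in enumerate(range(col_start, col_end)):
        oi, oj = divmod(col, wo)              # output pixel for this column
        patch = x[:, oi:oi + kh, oj:oj + kw]  # its receptive field
        panel[:, p] = patch.reshape(-1)
    return panel

def convgemm_like(x, weights, nc=256):
    """Blocked gemm over the virtual im2col matrix, nc columns at a time."""
    k, c, kh, kw = weights.shape
    ho, wo = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    w_mat = weights.reshape(k, c * kh * kw)
    out = np.empty((k, ho * wo), dtype=x.dtype)
    for col in range(0, ho * wo, nc):
        end = min(col + nc, ho * wo)
        # The packing buffer is (C*kh*kw) x nc, independent of Ho*Wo.
        panel = pack_im2col_panel(x, kh, kw, col, end)
        out[:, col:end] = w_mat @ panel
    return out.reshape(k, ho, wo)
```

In the real algorithm this packing happens inside the gemm's own panel-packing routines, so the transform adds essentially no extra memory traffic; the toy version above only conveys the blocking structure.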