2021
DOI: 10.1109/les.2020.2975055

Bactran: A Hardware Batch Normalization Implementation for CNN Training Engine

Cited by 14 publications (15 citation statements)
References 7 publications
“…On the other hand, Graphics Processing Units (GPUs) supply more resources in terms of memory and computational units. During the validation process, however, GPUs have a long execution time because of their sequential logic design (Wang et al. 1999; Zhijie et al. 2020). FPGAs have parallel architectures in their logic configuration.…”
Section: Introduction
confidence: 99%
“…This feature makes it a better choice for hardware designers for the validation of CNN configurations. At the implementation stage of CNN models, the normalization process ensures a lower hardware resource count and reduced power consumption during the training and validation processes on any hardware platform (Wang et al. 1999; Zhijie et al. 2020; Nurvitadhi et al. 2016; Baptista et al. 2020).…”
Section: Introduction
confidence: 99%
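For reference, the normalization these excerpts refer to is the standard batch normalization transform that Bactran implements in hardware. A minimal statement of the per-channel, per-mini-batch computation follows (standard notation, not taken from the excerpts):

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, \qquad
y_i = \gamma\,\hat{x}_i + \beta
```

The mean, variance, square-root, and division steps are the operations a hardware BN engine has to provide on top of the convolution datapath.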
“…FPGAs have parallel architectures in their logic configuration, and this feature makes them a better choice for hardware designers during the validation of CNN configurations. At the implementation stage of CNN models on any hardware, the normalization process ensures a lower hardware resource count and reduced power consumption during the training and validation processes [22][23][24][25].…”
Section: Introduction
confidence: 99%
“…On the other hand, Graphics Processing Units (GPUs) supply more resources in terms of memory and computational units. During the validation process, however, GPUs have a long execution time because of their sequential logic design [22,23]. FPGAs have parallel architectures in their logic configuration, and this feature makes them a better choice for hardware designers during the validation of CNN configurations.…”
Section: Introduction
confidence: 99%
“…Last but not least, the backward propagation requires saving the pre-normalized activations, thus occupying roughly twice the memory of a non-BN network in the training phase [16]. For these reasons, BN becomes the most critical part of the non-convolution layers [17], since it accounts for about 58.5% of the execution time and 90% of the operations of the non-convolution layers [18]. Research also shows that BN lowers the overall training speed by more than 47% for deep ResNet and DenseNet models [19] and poses considerable difficulties for specialized DNN inference and online-training accelerator architecture design [18].…”
Section: Introduction
confidence: 99%
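To make the memory point concrete, here is a minimal NumPy sketch (illustrative only, not the Bactran design) of a BN training step: the forward pass caches the normalized activations so the backward pass can form the gradients, and that extra activation-sized tensor is what is behind the roughly 2x memory figure quoted above.

```python
import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    """Batch-norm forward over axis 0 (the batch). Returns the output and a
    cache holding the normalized activations needed by the backward pass."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized activations
    y = gamma * x_hat + beta
    # This cache is the extra memory the training phase must keep around:
    # roughly one more activation-sized tensor per BN layer.
    cache = (x_hat, gamma, var, eps)
    return y, cache

def bn_backward(dy, cache):
    """Backward pass reusing the cached normalized activations."""
    x_hat, gamma, var, eps = cache
    dgamma = (dy * x_hat).sum(axis=0)
    dbeta = dy.sum(axis=0)
    dx_hat = dy * gamma
    # Standard BN input gradient, expressed in terms of x_hat only.
    dx = (dx_hat - dx_hat.mean(axis=0)
          - x_hat * (dx_hat * x_hat).mean(axis=0)) / np.sqrt(var + eps)
    return dx, dgamma, dbeta
```

For a convolutional layer the cached tensor has the same N x C x H x W shape as the layer's output, so each BN layer roughly doubles that layer's activation footprint during training, consistent with the figures cited in the excerpt.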