2016 IEEE 27th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)
DOI: 10.1109/asap.2016.7760779

F-CNN: An FPGA-based framework for training Convolutional Neural Networks

Abstract: This paper presents a novel reconfigurable framework for training Convolutional Neural Networks (CNNs). The proposed framework is based on reconfiguring a streaming datapath at runtime to cover the training cycle for the various layers in a CNN. The streaming datapath can support various parameterized modules which can be customized to produce implementations with different trade-offs in performance and resource usage. The modules follow the same input and output data layout, simplifying configuration…
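
The abstract's idea of interchangeable layer modules that share one streaming data layout can be illustrated with a short software sketch. The code below is not the authors' implementation: the Tile layout, LayerModule interface, and training_step scheduler are hypothetical names introduced only to show how a runtime-reconfigured pipeline could chain forward and backward modules that agree on a common data format.

```cpp
// Minimal sketch (assumed, not F-CNN's code) of layer modules sharing one
// streaming data layout, so a scheduler can chain them into a training datapath.
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical common data layout: a flat feature-map tile streamed between modules.
struct Tile {
    std::size_t channels, height, width;
    std::vector<float> data;   // channels * height * width values, streamed in order
};

// Hypothetical module interface: every layer (conv, pooling, fully connected, ...)
// exposes the same forward/backward signature over the shared Tile layout.
class LayerModule {
public:
    virtual ~LayerModule() = default;
    virtual Tile forward(const Tile& in) = 0;
    virtual Tile backward(const Tile& grad_out) = 0;
};

// Trivial example module (identity) just to keep the sketch self-contained.
class IdentityModule : public LayerModule {
public:
    Tile forward(const Tile& in) override { return in; }
    Tile backward(const Tile& grad_out) override { return grad_out; }
};

// Runtime "reconfiguration" modelled in software: one training step streams the
// batch forward through each module, then streams gradients back in reverse.
void training_step(std::vector<std::unique_ptr<LayerModule>>& pipeline,
                   Tile batch, Tile grad) {
    for (auto& m : pipeline) batch = m->forward(batch);       // forward pass
    for (auto it = pipeline.rbegin(); it != pipeline.rend(); ++it)
        grad = (*it)->backward(grad);                         // backward pass
}

int main() {
    std::vector<std::unique_ptr<LayerModule>> pipeline;
    pipeline.emplace_back(std::make_unique<IdentityModule>());
    Tile batch{3, 32, 32, std::vector<float>(3 * 32 * 32, 0.0f)};
    Tile grad = batch;
    training_step(pipeline, batch, grad);
    return 0;
}
```

In the actual F-CNN framework the datapath is reconfigured on the FPGA at runtime; the software loop above only mirrors the scheduling idea of a fixed per-module data layout.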

Cited by 43 publications (14 citation statements)
References 16 publications
“…Moreover, TABLE 6 summarizes the comparison with the state-of-the-art ANN training accelerators for MNIST classification [40][41][42]. Small networks are selected for ANNs to present a better comparison with our work.…”
Section: F. Fractional Precision (mentioning)
confidence: 99%
“…Other companies such as Microsoft [30,31] and Amazon's "AWS EC2 F1" instance followed suit in using FPGA clusters within their data centres and servers for back-end training and inference at a lower power cost, highlighting the trend for low-power solutions utilising FPGAs. CNN training on FPGA platforms has not been investigated thoroughly, with only two exceptions that focus on batch training, which uses FPGA platforms as replacements for GPU clusters in offline training [28,32]. In [27] Wenlai et al. presented F-CNN, the first CPU/FPGA hybrid design for deploying and training CNN networks.…”
Section: Related Work (mentioning)
confidence: 99%
“…This requires the introduction of significant resource overheads since it does not fully consider the overlap in calculations within the forward pass. In [32] Venkataramanaiah et al. extend work from [28] and introduce a hardware CNN training RTL compiler. Their work is purely FPGA-based and relies on static processing element arrays for convolutional calculations.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, a 102-convolutional-layer CNN model, which contains 42.4 M parameters, costs 14 ms to classify a 224 × 224 × 3 scene image while a simple 4-convolutional-layer CNN model costs 8.77 ms and only contains 1 M parameters, as detailed in Section 3.1 of this paper. This is an unacceptable cost of time and storage space in special situations, such as embedded devices [52][53][54] or during on-orbit processing [55]. In contrast, a small and shallow model is fast and uses little space, but will not yield accurate and precise results when trained directly on ground truth data [33].…”
Section: Introduction (mentioning)
confidence: 99%