2019
DOI: 10.3390/electronics8030281

An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution

Abstract: The Convolutional Neural Network (CNN) has been used in many fields and has achieved remarkable results in tasks such as image classification, face detection, and speech recognition. Compared to a GPU (graphics processing unit) and an ASIC, an FPGA (field-programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, the FPGA's extremely limited resources and the CNN's huge number of parameters and computational complexity pose great challenges to the desig…
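The factorization the paper's title refers to can be illustrated with a minimal sketch (not from the paper itself): a depthwise separable convolution replaces one k×k standard convolution with a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution, sharply cutting the parameter count that an FPGA accelerator must store and fetch. The layer sizes below are arbitrary example values.

```python
# Illustrative parameter-count comparison for a standard convolution
# vs. its depthwise separable factorization (depthwise k x k + pointwise 1 x 1).
# Layer dimensions here are arbitrary examples, not taken from the paper.

def conv_params(k, c_in, c_out):
    # Standard convolution: one k x k x c_in kernel per output channel.
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    # Depthwise stage: one k x k kernel per input channel;
    # pointwise stage: one 1 x 1 x c_in kernel per output channel.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 128, 128
std = conv_params(k, c_in, c_out)          # 147456 weights
sep = dw_separable_params(k, c_in, c_out)  # 17536 weights
print(std, sep, round(std / sep, 1))       # ~8.4x fewer parameters
```

For a 3×3 kernel the reduction approaches 9× as the channel count grows, which is why the technique suits the tight on-chip memory budget of FPGAs.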

Cited by 77 publications (65 citation statements)
References 24 publications
“…This is especially common for FPGAs [28], [29], and used for example in hand tracking [30] or language processing [31]. Special attention is also on adapting key principles in neural network architectures, such as depth-wise convolutions for FPGAs [32] or quantization-based operations, such as binary neural networks [33]. In contrast to general, "all purpose" GPUs, tensor-processing units (TPUs) are specialised on matrix operations, such as multiplications and additions, as massively used in neural networks.…”
Section: Related Work
confidence: 99%
“…Also, it has to be noted that this hybrid system model was deployed in a tire manufacturing unit, and it produced efficient results in automatically diagnosing bubble defects in the treads and sidewalls of tires. In future work, more advanced CNN-enabled approaches can be implemented for automated defect detection [26][27][28][29][30], thus ensuring and realizing a sustainable tire manufacturing process.…”
Section: Discussion
confidence: 99%
“…It also has low hardware utilization, which results in low throughput per PE. [8] proposes an FPGA-based CNN accelerator with an integrated depth-wise separable mode of operation. This accelerator, however, has low throughput because it uses a 32-bit floating-point format.…”
Section: Related Work
confidence: 99%