Hongmin Huang scite author profile

Hongmin Huang

5 Publications

9 Citation Statements Received

82 Citation Statements Given

Affiliations

Guangdong University of Technology

Publications

Design Space Exploration for YOLO Neural Network Accelerator

et al. 2020

The You Only Look Once (YOLO) neural network has great advantages and extensive applications in computer vision. The convolutional layers are the most important part of the neural network and take up most of the computation time. Improving the efficiency of the convolution operations can greatly increase the speed of the neural network. Field programmable gate arrays (FPGAs) have been widely used in accelerators for convolutional neural networks (CNNs) thanks to their configurability and parallel computing. This paper proposes a design space exploration for the YOLO neural network based on FPGA. A data block transmission strategy is proposed and a multiply and accumulate (MAC) design, which consists of two 14 × 14 processing element (PE) matrices, is designed. The PE matrices are configurable for different CNNs according to the given required functions. In order to take full advantage of the limited logical resources and the memory bandwidth on the given FPGA device and to simultaneously achieve the best performance, an improved roofline model is used to evaluate the hardware design to balance the computing throughput and the memory bandwidth requirement. The accelerator achieves 41.99 giga operations per second (GOPS) and consumes 7.50 W running at the frequency of 100 MHz on the Xilinx ZC706 board.
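As a rough illustration of the roofline reasoning mentioned in this abstract, the sketch below uses the classic roofline bound: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the operational intensity. It is not the paper's improved model, which the abstract does not specify. The function names, the assumption that each PE completes one multiply-accumulate per cycle (counted as two operations), and the 4 GB/s bandwidth figure are all illustrative assumptions.

```python
def peak_gops(num_pes, freq_mhz, ops_per_mac=2):
    """Peak compute roof in GOPS.

    Assumes (hypothetically) one MAC per PE per cycle,
    with each MAC counted as ops_per_mac operations.
    """
    return num_pes * ops_per_mac * freq_mhz / 1000.0


def attainable_gops(peak, bandwidth_gbs, ops_per_byte):
    """Classic roofline bound: min(compute roof, bandwidth * intensity)."""
    return min(peak, bandwidth_gbs * ops_per_byte)


if __name__ == "__main__":
    # Two 14 x 14 PE matrices at 100 MHz, as described in the abstract.
    roof = peak_gops(num_pes=2 * 14 * 14, freq_mhz=100)
    # 4 GB/s is a made-up bandwidth, used only to show the two regimes.
    for intensity in (5.0, 10.0, 50.0):
        bound = attainable_gops(roof, bandwidth_gbs=4.0, ops_per_byte=intensity)
        print(f"{intensity:5.1f} ops/byte -> {bound:.2f} GOPS")
```

Under these assumptions the compute roof is 78.4 GOPS, so low-intensity layers are limited by bandwidth and high-intensity layers by the PE array. The reported 41.99 GOPS sits below that roof.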

Low-Power Reconfigurable Architecture of Elliptic Curve Cryptography for IoT

Huang, Zheng, et al. 2021

IEICE Trans. Electron.

High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Huang et al. 2023

ACM Trans. Embed. Comput. Syst.

Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. It is challenging to accelerate DNNs on embedded systems because real-world machine vision applications should reserve a lot of external memory bandwidth for other tasks, such as video capture and display while leaving little bandwidth for accelerating DNNs. In order to solve this issue, in this study, we propose a high-throughput accelerator, called reconfigurable tiny neural-network accelerator (ReTiNNA) for the bandwidth-limited system, and present a real-time object detection system for the high-resolution video image. We first present a dedicated computation engine that takes different data mapping methods for various filter types to improve data reuse and reduce hardware resources. We then propose an adaptive layer-wise tiling strategy that tiles the feature maps into strips to reduce the control complexity of data transmission dramatically and to improve the efficiency of data transmission. Finally, a design space exploration (DSE) approach is presented to explore design space more accurately in the case of insufficient bandwidth to improve the performance of the low-bandwidth accelerator. With a low bandwidth of 2.23 GB/s and a low hardware consumption of 90.261K LUTs and 448 DSPs, ReTiNNA can still achieve a high performance of 155.86 GOPS on VGG16 and 68.20 GOPS on ResNet50, which is better than other state-of-the-art designs implemented on FPGA devices. Furthermore, the real-time object detection system can achieve a high object detection speed of 19 fps for high-resolution video.
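The strip tiling mentioned in this abstract can be pictured with a small sketch. It assumes the strips are horizontal bands of output rows and that the convolution has no padding; the abstract confirms neither, and the function name and parameters are hypothetical. For each strip it reports which input rows must be fetched, including the overlap that neighbouring strips share.

```python
def strip_ranges(out_h, strip_h, kernel, stride=1):
    """Split out_h output rows into strips of at most strip_h rows.

    Returns a list of (out_start, out_end, in_start, in_end) tuples with
    half-open row ranges. Assumes no padding (an illustrative choice).
    """
    if min(out_h, strip_h, kernel, stride) < 1:
        raise ValueError("all arguments must be positive")
    strips = []
    for o0 in range(0, out_h, strip_h):
        o1 = min(o0 + strip_h, out_h)
        # The first output row of the strip reads input rows from o0 * stride;
        # the last output row (o1 - 1) reads `kernel` rows from (o1 - 1) * stride.
        i0 = o0 * stride
        i1 = (o1 - 1) * stride + kernel
        strips.append((o0, o1, i0, i1))
    return strips


if __name__ == "__main__":
    for strip in strip_ranges(out_h=6, strip_h=4, kernel=3):
        print(strip)
```

With a 3 × 3 kernel and stride 1, consecutive strips overlap by two input rows, which is the extra data a strip-based scheme has to re-fetch or buffer.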

An Efficient Parallel Architecture for Convolutional Neural Networks Accelerator on FPGAs

Huang, Yadong, et al. 2022

An efficient loop tiling framework for convolutional neural network inference accelerators

Huang et al. 2021

IET Circuits, Devices & Syst.

Convolutional neural networks (CNNs) have been widely applied in the field of computer vision due to their inherent advantages in image feature extraction. However, it is difficult to implement CNNs directly on embedded platforms owing to their excessive computational load. Field programmable gate arrays (FPGAs) have been popular in CNN accelerators because of their configurability and high energy efficiency. Given the highly parallel workloads of the CNN, a CNN accelerator with a 14 × 16 processing element array is designed in this study to accelerate CNN inference. In addition, a loop tiling strategy for convolutional layers is proposed to transmit feature maps efficiently, and the roofline model is employed to explore the best tiling parameters for optimal performance. Finally, the accelerator, written in Verilog HDL, is implemented on the Xilinx Zynq-7045 evaluation platform. At an operating frequency of 200 MHz, the proposed accelerator achieves 57.24 giga operations per second (GOPS) on You Only Look Once v2-tiny and 78.39 GOPS on Visual Geometry Group-16. The accelerator consumes only 224 DSPs, demonstrating better performance than previous works.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
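For readers unfamiliar with loop tiling for convolutional layers, the following is a minimal software sketch of the general technique, not the paper's framework. It tiles the output-channel, row, and column loops of a stride-1, unpadded convolution and checks that the tiled loop nest gives the same result as the direct one. The function names and the tile parameters tm, tr, and tc are hypothetical.

```python
def _shapes(x, w):
    """Return (C, M, K, OH, OW) for x[c][h][w] and w[m][c][k][k]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, K = len(w), len(w[0][0])
    return C, M, K, H - K + 1, W - K + 1


def _pixel(x, w, m, r, c, C, K):
    """Full reduction over input channels and the K x K window."""
    acc = 0
    for ch in range(C):
        for i in range(K):
            for j in range(K):
                acc += x[ch][r + i][c + j] * w[m][ch][i][j]
    return acc


def conv2d_direct(x, w):
    """Reference convolution: stride 1, no padding."""
    C, M, K, OH, OW = _shapes(x, w)
    return [[[_pixel(x, w, m, r, c, C, K) for c in range(OW)]
             for r in range(OH)] for m in range(M)]


def conv2d_tiled(x, w, tm, tr, tc):
    """Same convolution, computed one (tm, tr, tc) output tile at a time.

    On an accelerator, each tile would correspond to one block of data
    moved on chip; here the tiling only changes the iteration order.
    """
    C, M, K, OH, OW = _shapes(x, w)
    y = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m0 in range(0, M, tm):              # tile over output channels
        for r0 in range(0, OH, tr):         # tile over output rows
            for c0 in range(0, OW, tc):     # tile over output columns
                for m in range(m0, min(m0 + tm, M)):
                    for r in range(r0, min(r0 + tr, OH)):
                        for c in range(c0, min(c0 + tc, OW)):
                            y[m][r][c] = _pixel(x, w, m, r, c, C, K)
    return y


if __name__ == "__main__":
    x = [[[(ch + 1) * (h * 5 + col) % 7 for col in range(5)]
          for h in range(5)] for ch in range(2)]
    w = [[[[m + ch + i - j for j in range(3)] for i in range(3)]
          for ch in range(2)] for m in range(3)]
    print(conv2d_tiled(x, w, tm=2, tr=2, tc=2) == conv2d_direct(x, w))
```

Choosing tm, tr, and tc is the design decision that a roofline-style search would tune: larger tiles raise data reuse but need more on-chip buffer.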
