Xiaoyin Ma scite author profile

| In the past decade or so we have witnessed a steadily increasing interest in FPGAs as hardware accelerators:they provide an excellent mid-point between the reprogrammability of software devices (CPUs, DSPs, and GPUs) and the performance and low energy consumption of ASICs. However, the programmability of FPGA-based accelerators remains one of the biggest obstacles to their wider adoption. Developing FPGA programs requires extensive familiarity with hardware design and experience with a tedious and complex tool chain.

show abstract

High-Throughput Fixed-Point Object Detection on FPGAs

Ma¹,

Najjar²,

Roy-Chowdhury³

2014

View full text Add to dashboard Cite

Computer vision applications make extensive use of floating-point number representation, both single and double precision. The major advantage of floating-point representation is the very large range of values that can be represented with a limited number of bits. Most CPU, and all GPU designs have been extensively optimized for short latency and high-throughput processing of floating-point operations. On an FPGA, the bit-width of operands is a major determinant of its resource utilization, the achievable clock frequency and hence its throughput. By using a fixed-point representation with fewer bits, an application developer could implement more processing units and a higher-clock frequency and a dramatically larger throughput. However, smaller bit-widths may lead to inaccurate or incorrect results.Object and human detection are fundamental problems in computer vision and a very active research area. In these applications a high throughput and an economy of resources are highly desirable features allowing the applications to be embedded in mobile or fielddeployable equipment.

show abstract

Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs

Najjar

Roy-Chowdhury

2015

IEEE Trans. Circuits Syst. Video Technol.

View full text Add to dashboard Cite

Abstract-The reliance on object or people detection is rapidly growing beyond surveillance to industrial and social applications. The Histogram of Oriented Gradients (HOG), one of the most popular object detection algorithms, achieves high detection accuracy but delivers just under one frame-per-second (fps) on a high-end CPU. FPGA accelerations of this algorithm are limited by the intensive floating-point computations. All current fixedpoint HOG implementations use large bit-width to maintain detection accuracy, or perform poorly at reduced data precision. In this paper we introduce the full-image evaluation methodology to explore the FPGA implementation of HOG using reduced bit-width. This approach lessens the required area resources on the FPGA and increases the clock frequency and hence the throughput per device through increased parallelism. We evaluate the detection accuracy of the fixed-point HOG by applying state-of-the-art computer vision pedestrian detection evaluation metrics and show it performs as well as the original floatingpoint code from OpenCV. We then show our single FPGA implementation achieves a 68.7x higher throughput than a highend CPU, 5.1x higher than a high-end GPU, and 7.8x higher than the same implementation using floating-point on the same FPGA. A power consumption comparison for different platforms shows our fixed-point FPGA implementation uses 130x less power than CPU, and 31x less energy than GPU to process one image.

show abstract

Heterogeneous Acceleration of HAR Applications

Rodriguez-Borbon

Roy-Chowdhury

et al. 2020

IEEE Trans. Circuits Syst. Video Technol.

View full text Add to dashboard Cite

Human action recognition (HAR) is an important field of research that intercepts with areas such as image processing, computer vision, and the design of fast algorithms, among others. HAR has several important applications including healthcare monitoring, security and surveillance, assisted living, smart homes, and video search and indexing. Despite recent developments in the field, major challenges remain. For instance, HAR is computationally expensive. Tasks such as video preprocessing, feature extraction, feature quantization, and feature classification require the execution of millions of arithmetic operations for a video sequence lasting a few seconds. To address these problems, we propose a heterogeneous approach that is based on an extensive algorithmic and experimental analysis of the histogram of gradients (HOG3D) application. We divide the application into four stages and evaluate each on CPU, GPU, and FPGA platforms. Our heterogeneous design combines the strengths of both FPGA and GPU platforms, and achieves a 1.3X speedup compared with a state-of-the-art GPU while being 1.5X more energy efficient than other homogeneous solutions, including FPGA-based designs. Moreover, our heterogeneous HAR design using fixed-point arithmetic has comparable accuracy to those of HAR algorithms using single precision floating point arithmetic.

show abstract

Optimizing hardware design for Human Action Recognition

Borbon

Najjar

et al. 2016

View full text Add to dashboard Cite

Abstract-Human action recognition (HAR) is an important topic in computer vision having a wide range of applications: health care, assisted living, surveillance, security, gaming, etc. Despite significant amount of work having been conducted in this area in recent years, the execution speed still limits real-time applications. Moreover, it is highly desirable to have the computeintensive feature extraction stage done right at the output of the camera to extract and transfer only action feature in multicamera network setting and hence reduce network bandwidth requirement. In this work, we first evaluate the possibility to perform feature extraction under reduced precision fixed-point arithmetic to ease hardware resource requirements. We compared the Histogram of Oriented Gradient in 3D (HOG3D) feature extraction with state-of-the-art Convolutional Neural Networks (CNNs) methods and shown the later to be 75X slower than the former. Our experiment shows that by re-training the classifier with reduced data precision, the classification performs as well as the original double-precision floating-point. Based on this result, we implement an FPGA-based HAR feature extraction for near camera processing using fixed-point data representation and arithmetic. This implementation, using a single Xilinx Virtex 6 FPGA, achieves about 70x speedup over multicore CPU. Furthermore, a GPU implementation of HAR is introduced with 80x speedup over CPU (on an Nvidia Tesla K20). Last but not least, a power comparison is presented for the three platforms.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Xiaoyin Ma

High-Level Language Tools for Reconfigurable Computing

High-Throughput Fixed-Point Object Detection on FPGAs

Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs

Heterogeneous Acceleration of HAR Applications

Optimizing hardware design for Human Action Recognition

Contact Info

Product

Resources

About