Jeng-Hau Lin scite author profile

Convolutional neural networks (CNNs) are the current state-of-the-art for many computer vision tasks. CNNs outperform older methods in accuracy, but require vast amounts of computation and memory. As a result, existing CNN applications are typically run on clusters of CPUs or GPUs. Research on FPGA acceleration of CNN workloads has achieved reductions in power and energy consumption. However, large GPUs outperform modern FPGAs in throughput, and the existence of compatible deep learning frameworks gives GPUs a significant advantage in programmability. Recent work in machine learning demonstrates the potential of very low precision CNNs, i.e., CNNs with binarized weights and activations. Such binarized neural networks (BNNs) appear well suited for FPGA implementation, as their dominant computations are bitwise logic operations and their memory requirements are greatly reduced. A combination of low-precision networks and high-level design methodology may help address the performance and productivity gap between FPGAs and GPUs. In this paper, we present the design of a BNN accelerator that is synthesized from C++ to FPGA-targeted Verilog. The accelerator outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.
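To make the "bitwise logic operations" point concrete, here is a minimal Python sketch of the XNOR-popcount dot product commonly used for binarized layers. It is an illustration under assumptions, not the accelerator described in the abstract: the helper names `pack_bits` and `binary_dot` are hypothetical, and values are assumed to be exactly +1 or -1.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 when values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount of the matching positions
    return 2 * matches - n              # (#matches) - (#mismatches)


# Hypothetical 4-element weight and activation vectors.
weights = [+1, -1, +1, +1]
activations = [+1, +1, -1, +1]
result = binary_dot(pack_bits(weights), pack_bits(activations), len(weights))
```

Each multiply-accumulate over +1/-1 values collapses into one XNOR and a bit count, which is why such networks map naturally onto FPGA logic.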

An assessment of vulnerability of hardware neural networks to dynamic voltage and temperature variations

Jiao, Luo, Lin et al. 2017

Fast Methodology for Determining Eye Diagram Characteristics of Lossy Transmission Lines

Guo, Lin et al. 2009, IEEE Trans. Adv. Packag.

MATEX: A distributed framework for transient simulation of power distribution networks

Zhuang, Weng, Lin et al. 2014

We propose MATEX, a distributed framework for transient simulation of power distribution networks (PDNs). MATEX uses a matrix exponential kernel with Krylov subspace approximations to solve the differential equations of linear circuits. First, the whole simulation task is divided into subtasks based on decompositions of current sources, in order to reduce the computational overhead. Then these subtasks are distributed to different computing nodes and processed in parallel. Within each node, after the matrix factorization at the beginning of the simulation, the adaptive time-stepping solver runs without extra matrix re-factorizations. MATEX overcomes the stiffness limitation of previous matrix exponential-based circuit simulators by using a rational Krylov subspace method, which allows larger step sizes with smaller Krylov subspace bases and substantially accelerates the whole computation. MATEX outperforms both traditional fixed and adaptive time-stepping methods, e.g., achieving around a 13X speedup over the trapezoidal framework with fixed time step on the IBM power grid benchmarks.
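The core numerical idea, approximating the action of a matrix exponential on a vector from a small Krylov basis, can be sketched in plain Python as follows. This is a hedged illustration only: it uses the standard (polynomial) Arnoldi process rather than the rational Krylov variant the abstract credits for handling stiffness, and the function names, the dense-list matrix format, and the example matrix are all assumptions made for this sketch.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def expm_dense(M, terms=20, squarings=8):
    """exp(M) for a small dense matrix: Taylor series plus scaling and squaring."""
    n = len(M)
    scale = 2.0 ** squarings
    S = [[v / scale for v in row] for row in M]
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, S)]
        E = [[e + t for e, t in zip(er, tr)] for er, tr in zip(E, term)]
    for _ in range(squarings):
        E = matmul(E, E)
    return E


def krylov_expm_apply(A, v, h, m):
    """Approximate exp(h*A) applied to nonzero v using an m-dimensional Arnoldi basis."""
    beta = math.sqrt(dot(v, v))
    V = [[x / beta for x in v]]
    H = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i][j] = dot(w, V[i])
            w = [wi - H[i][j] * vi for wi, vi in zip(w, V[i])]
        norm_w = math.sqrt(dot(w, w))
        if j + 1 == m or norm_w < 1e-12:
            m = j + 1                       # basis complete or invariant subspace found
            break
        H[j + 1][j] = norm_w
        V.append([x / norm_w for x in w])
    Hm = [[h * H[i][j] for j in range(m)] for i in range(m)]
    first_col = [row[0] for row in expm_dense(Hm)]
    # exp(h*A) v is approximated by beta * V_m * exp(h*H_m) * e_1
    return [beta * sum(V[k][i] * first_col[k] for k in range(m)) for i in range(len(v))]


# Hypothetical 3x3 system matrix and initial state.
A = [[-1.0, 0.5, 0.0], [0.5, -2.0, 0.5], [0.0, 0.5, -3.0]]
x0 = [1.0, 0.0, 0.0]
x_next = krylov_expm_apply(A, x0, h=0.1, m=3)
```

Only the small m-by-m matrix is exponentiated, so the cost per step is dominated by matrix-vector products with the large system matrix.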

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Lin, Xing, Zhao et al. 2017

State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. In this paper, we push the boundaries of hardware-effective CNN design by proposing BCNN with Separable Filters (BCNNw/SF), which applies Singular Value Decomposition (SVD) on BCNN kernels to further reduce computational and storage complexity. To enable its implementation, we provide a closed form of the gradient over SVD to calculate the exact gradient with respect to every binarized weight in backward propagation. We verify BCNNw/SF on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator realizes memory savings of 17% and execution time reduction of 31.3% compared to BCNN with only minor accuracy sacrifices.
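As a rough illustration of why separable filters save work, the sketch below factors a 2D kernel into its dominant rank-1 SVD term and applies it as two 1D passes. It is a simplified example under assumptions: it ignores binarization and the closed-form gradient described in the abstract, computes the singular triplet by power iteration rather than a full SVD, and the function names and the Sobel-style kernel are hypothetical choices for this sketch.

```python
import math


def _norm(x):
    return math.sqrt(sum(t * t for t in x))


def rank1_svd(K, iters=50):
    """Dominant singular triplet (sigma, u, v) of a nonzero matrix K by power iteration."""
    # Start from the largest-norm row of K so the first product K v is nonzero.
    v = max(K, key=_norm)
    v = [t / _norm(v) for t in v]
    for _ in range(iters):
        u = [sum(k * t for k, t in zip(row, v)) for row in K]                    # u = K v
        u = [t / _norm(u) for t in u]
        w = [sum(row[j] * ui for row, ui in zip(K, u)) for j in range(len(v))]   # w = K^T u
        sigma = _norm(w)
        v = [t / sigma for t in w]
    return sigma, u, v


def apply_full(patch, K):
    """One output value of the 2D filter, computed directly from the full kernel."""
    return sum(k * p for krow, prow in zip(K, patch) for k, p in zip(krow, prow))


def apply_separable(patch, sigma, u, v):
    """The same output value from the rank-1 factors: a row pass, then a column pass."""
    row_pass = [sum(vj * p for vj, p in zip(v, prow)) for prow in patch]
    return sigma * sum(ui * t for ui, t in zip(u, row_pass))


# Hypothetical rank-1 kernel (Sobel-style) and a 3x3 image patch.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
sigma, u, v = rank1_svd(kernel)
full_value = apply_full(patch, kernel)
separable_value = apply_separable(patch, sigma, u, v)
```

For a kernel that is exactly rank 1, as here, the two results agree up to rounding; for a general kernel the rank-1 factorization is only an approximation, which is the source of the accuracy trade-off the abstract mentions.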

Multi-tenant mobile offloading systems for real-time computer vision applications

Fang, Lin, Srivastava et al. 2019

Dynamic analysis of power delivery network with nonlinear components using matrix exponential method

Zhuang, Kang, Wang et al. 2015

Local Binary Pattern Networks

Lin, Lazarow, Yang et al. 2020

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs