Faaiz Asim scite author profile

Faaiz Asim

4 Publications

6 Citation Statements Received

33 Citation Statements Given

Affiliations

Ulsan National Institute of Science and Technology, Kyung Hee University

Publications

A Deep Learning Accelerator Based on a Streaming Architecture for Binary Neural Networks

Asim et al. 2022. IEEE Access

Deep neural networks (DNNs) have been playing an increasingly important role in a wide range of areas such as computer vision and voice recognition. While training and validation have gradually become feasible with high-end general-purpose processors such as graphics processing units (GPUs), high-throughput inference on embedded hardware platforms with limited hardware resources and power budgets is still challenging. Binarized neural networks (BNNs) are emerging as a promising method to overcome these challenges by reducing the bit widths of DNN data representations, and many optimized solutions have already been proposed. However, accuracy degradation compared to the same architecture with full precision is a considerable problem for BNNs, while binary neural networks still contain significant redundancy that can be optimized. In this paper, to address these limitations, we implement a streaming accelerator architecture with three optimization techniques: pipelining-unrolling for streaming each layer, weight reuse for parallel computation, and MAC (multiplication-accumulation) compression. Our method first constructs a streaming architecture using the pipelining-unrolling method to maximize throughput. Next, the weight reuse method with K-means clustering is applied to reduce the complexity of the popcount operation. Finally, MAC compression reduces the hardware resources used for the remaining computation in MAC operations. The implemented hardware accelerator, integrated into a state-of-the-art field-programmable gate array (FPGA), provides a maximum classification performance of 1531K frames per second with 98.4% accuracy for the MNIST dataset and 205K frames per second with 80.2% accuracy for the CIFAR-10 dataset. In addition, the proposed design's FPS/LUTs ratio is approximately 57 (MNIST) and 0.707 (CIFAR-10), which is much lower than the state-of-the-art design with a comparable throughput and inference accuracy.
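For readers unfamiliar with the popcount operation the abstract mentions, the following is a minimal sketch of how a binary dot product is commonly computed with XNOR and popcount. It is not taken from the paper: the function names `pack_bits` and `binary_dot` are hypothetical, and the encoding (bit 1 for +1, bit 0 for -1) is an assumption made for illustration.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int (assumed encoding: bit i is 1 when values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors given as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")     # popcount of matching positions
    # Each match contributes +1 and each mismatch -1, so the sum is 2 * matches - n.
    return 2 * matches - n


activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
```

Here `result` and `reference` agree, which illustrates why a binarized MAC can be replaced by bitwise logic plus a bit count; a hardware design would implement the same idea in logic rather than software.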

Quarry: Quantization-based ADC Reduction for ReRAM-based Deep Neural Network Accelerators

Azamat, Asim, Lee. 2021

Hardware Platform-Aware Binarized Neural Network Model Optimization

Asim, Alimkhanuly, et al. 2022. Applied Sciences

Deep neural networks (DNNs) have shown superior accuracy at the expense of high memory and computation requirements. Optimizing DNN models with respect to energy and hardware resource requirements is extremely important for applications in resource-constrained embedded environments. Although using binary neural networks (BNNs), one of the recent promising approaches, significantly reduces the design's complexity, accuracy degradation is inevitable when reducing the precision of parameters and output activations. To balance implementation cost and accuracy, in addition to specialized hardware accelerators being proposed for specific network models, most recent software binary neural networks have been optimized based on generalized metrics, such as FLOPs or MAC operation requirements. However, with the wide range of hardware available today, evaluating software network structures independently of the hardware is not sufficient to determine the final network model for typical devices. In this paper, an architecture search algorithm based on estimating the hardware performance at design time is proposed to find the best binary neural network models for hardware implementation on target platforms. With XNOR-Net used as the base architecture and target platforms including Field-Programmable Gate Array (FPGA), Graphics Processing Unit (GPU), and Resistive Random Access Memory (RRAM), the proposed algorithm shows its efficiency by giving a more accurate estimate of the hardware performance at design time than FLOPs or MAC operations.

Partial Sum Quantization for Reducing ADC Size in ReRAM-Based Neural Network Accelerators

Azamat, Asim, Kim, et al. 2023. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.
