2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00215

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Abstract: We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To deal with the larger design space it brings, a promising approach is to train a quantization-aware accuracy predictor to quickly get the accuracy of the quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy…
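To make the joint formulation concrete, below is a minimal sketch of a predictor-guided evolutionary search over a combined architecture/pruning/quantization space. The encoding (per-block kernel size, width multiplier, and bitwidth), the predictor shape, and the mutation operator are illustrative assumptions, not the authors' implementation.

```python
import random
import torch
import torch.nn as nn

# Illustrative joint design space: each block picks a kernel size, a width
# multiplier (a proxy for the pruning ratio), and a weight bitwidth.
KERNELS, WIDTHS, BITS = [3, 5, 7], [0.5, 0.75, 1.0], [4, 6, 8]
NUM_BLOCKS = 20

def random_candidate():
    return [(random.choice(KERNELS), random.choice(WIDTHS),
             random.choice(BITS)) for _ in range(NUM_BLOCKS)]

def encode(cand):
    # Flatten per-block choices into the predictor's input vector.
    return torch.tensor([v for blk in cand for v in blk], dtype=torch.float32)

# Quantization-aware accuracy predictor: a small MLP from the joint
# encoding to predicted top-1 accuracy. In APQ it would be trained on
# measured accuracies first; it is left untrained here to keep the
# sketch self-contained.
predictor = nn.Sequential(nn.Linear(3 * NUM_BLOCKS, 256), nn.ReLU(),
                          nn.Linear(256, 1))

def mutate(cand):
    child = list(cand)
    i = random.randrange(NUM_BLOCKS)
    child[i] = (random.choice(KERNELS), random.choice(WIDTHS),
                random.choice(BITS))
    return child

def search(generations=100, population=64):
    pool = [random_candidate() for _ in range(population)]
    for _ in range(generations):
        scores = [predictor(encode(c)).item() for c in pool]
        ranked = [c for _, c in sorted(zip(scores, pool),
                                       key=lambda t: t[0], reverse=True)]
        # Keep the predicted-best half; refill by mutating survivors.
        pool = ranked[:population // 2]
        pool += [mutate(random.choice(pool)) for _ in range(population // 2)]
    return pool[0]
```

Because the predictor replaces actual training and evaluation, each generation costs a few forward passes instead of GPU-hours.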

Cited by 140 publications (116 citation statements)
References 24 publications
“…In AMC [208] and [209], learning-based approaches are adopted to prune and quantize the models for algorithm-hardware co-design. In APQ [210], pruning and quantization are optimized jointly with the NN model, avoiding any accuracy loss.…”
Section: G. Methods for Model Compression
confidence: 99%
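As a concrete illustration of the per-layer knobs such learned policies choose, the sketch below applies magnitude pruning followed by uniform symmetric fake-quantization to one weight tensor; the function and its heuristics are illustrative, not code from AMC or APQ.

```python
import torch

def prune_and_quantize(weight, sparsity=0.5, bits=8):
    # Magnitude pruning: zero out the smallest `sparsity` fraction of |w|.
    k = int(weight.numel() * sparsity)
    if k > 0:
        threshold = weight.abs().flatten().kthvalue(k).values
        weight = torch.where(weight.abs() > threshold, weight,
                             torch.zeros_like(weight))
    # Uniform symmetric fake-quantization to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-8) / qmax
    return torch.round(weight / scale).clamp(-qmax, qmax) * scale

w = torch.randn(64, 64)
w_compressed = prune_and_quantize(w, sparsity=0.5, bits=4)
```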
“…The exponentially large search space, consisting of billions of architectures or even more, renders NAS a very challenging task [15,40,41,43,45,47]. The key reason is that evaluating and ranking the architectures in terms of metrics of interest (e.g., accuracy and latency) can be extremely time-consuming.…”
Section: Introduction
confidence: 99%
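A back-of-envelope count shows why the space grows so fast. With illustrative numbers for a MobileNet-style space (20 searchable blocks, each choosing among 3 kernel sizes and 3 expansion ratios), the count already dwarfs "billions":

```python
# Illustrative search-space size: 20 blocks, 3 kernel sizes x 3 expansion
# ratios per block (elastic depth choices would grow this further).
blocks, options_per_block = 20, 3 * 3
print(options_per_block ** blocks)  # 9**20 ≈ 1.2e19 architectures
```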
“…The key reason is that evaluating and ranking the architectures in terms of metrics of interest (e.g., accuracy and latency) can be extremely time-consuming. As a result, many studies have focused on reducing the cost of training and evaluating the architecture accuracy, including reinforcement learning-based NAS with accuracy evaluated on a small proxy dataset [52], differentiable NAS [45], one-shot or few-shot NAS [4,9,51], and NAS assisted with an accuracy predictor [15,43], among many others.…”
Section: Introduction
confidence: 99%
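A predictor-assisted approach amortizes that cost by fitting a cheap surrogate to a modest set of measured (architecture, accuracy) pairs. The sketch below trains such a surrogate; the encodings, targets, and hyperparameters are stand-ins, not any cited paper's actual setup.

```python
import torch
import torch.nn as nn

# Stand-in dataset: encodings of sampled sub-networks and their measured
# top-1 accuracies (random here; collected by real evaluation in practice).
encodings = torch.randn(2000, 60)
accuracies = torch.rand(2000, 1)

predictor = nn.Sequential(nn.Linear(60, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

for epoch in range(200):
    loss = nn.functional.mse_loss(predictor(encodings), accuracies)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Ranking a new candidate now costs one forward pass instead of a full
# train-and-evaluate cycle.
```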
“…The evolutionary design of neural networks, or neuroevolution, has recently led to the fully automated design of complex CNNs that are quite competitive in terms of accuracy and size, even for the most challenging datasets such as ImageNet [72]. In order to represent a candidate CNN in the genotype, a well-known CNN (such as MobileNetV2 in [73]) is usually taken as a template. The genotype then contains a set of parameters, each of them specifying possible values of the network's critical hyperparameters (the layer type, the number of filters, the kernel sizes, etc.).…”
Section: B. Hardware-Aware Neural Architecture Search
confidence: 99%
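A minimal genotype for such template-based neuroevolution might look like the sketch below, assuming a MobileNetV2-like backbone; the gene fields and value ranges are illustrative assumptions, not the encoding used in [72] or [73].

```python
import random
from dataclasses import dataclass

# Kernel size is folded into the layer type (mbconv_k3/k5/k7), matching
# the template's inverted-residual blocks; "skip" drops the block.
LAYER_TYPES = ["mbconv_k3", "mbconv_k5", "mbconv_k7", "skip"]

@dataclass
class BlockGene:
    layer_type: str      # inverted-residual variant (kernel size) or skip
    num_filters: int     # output channels for the block
    expand_ratio: int    # expansion ratio of the inverted residual

@dataclass
class Genotype:
    blocks: list

def random_genotype(num_blocks=20):
    return Genotype(blocks=[
        BlockGene(layer_type=random.choice(LAYER_TYPES),
                  num_filters=random.choice([16, 24, 32, 64, 96, 160]),
                  expand_ratio=random.choice([3, 4, 6]))
        for _ in range(num_blocks)])

def mutate(genotype):
    # Point mutation: re-sample the genes of one randomly chosen block.
    child = Genotype(blocks=list(genotype.blocks))
    i = random.randrange(len(child.blocks))
    child.blocks[i] = random_genotype(num_blocks=1).blocks[0]
    return child
```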
“…In [74], fixed-point quantization is applied as a post-processing step after NAS is finished. In APQ, by contrast, a suitable quantization scheme is evolved directly during NAS [73]. APQ thus performs a joint search for architecture, pruning, and quantization policy, starting from the MobileNetV2 network.…”
Section: B. Hardware-Aware Neural Architecture Search
confidence: 99%
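For contrast, post-training fixed-point quantization as a separate post-processing step can be as simple as the sketch below. The dynamic-range heuristic for splitting integer and fractional bits is a common convention assumed here, not the specific procedure of [74].

```python
import numpy as np

def to_fixed_point(weights, total_bits=8):
    # Split the word into sign, integer, and fractional bits based on the
    # tensor's dynamic range; clipping guards the edge cases. Assumed
    # heuristic, not the exact method of [74].
    max_abs = float(np.abs(weights).max())
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = total_bits - 1 - int_bits           # 1 bit for the sign
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(weights * scale), qmin, qmax)
    return q / scale                                # dequantized view

w = np.random.randn(256, 256).astype(np.float32)
w_q = to_fixed_point(w, total_bits=8)
```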