Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing 2022
DOI: 10.1145/3502181.3531463
Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures

Abstract: Pruning and Quantization are two effective Deep Neural Network (DNN) compression methods for efficient inference on various hardware platforms. Pruning refers to removing unimportant weights or nodes, whereas Quantization converts floating-point parameters to a low-bit fixed-point integer representation. The pruned and low-precision models are smaller and faster at inference on hardware platforms, with almost the same accuracy as the unoptimized network. Tensor Cores in the Nvidia Ampere 100 (A100) GPU suppor…
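The abstract describes two compression steps: pruning (zeroing unimportant weights) and quantization (replacing floating-point parameters with low-bit integers). As a rough, self-contained illustration only, and not the authors' pipeline or the A100's 2:4 sparse Tensor Core path, the sketch below applies PyTorch's built-in magnitude pruning and dynamic Int8 quantization to a toy model.

```python
# Illustrative sketch only: magnitude pruning followed by dynamic Int8
# quantization with stock PyTorch utilities. The toy model and the 50%
# sparsity level are arbitrary choices, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Pruning: zero out the 50% smallest-magnitude weights of each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Quantization: convert the Linear layers to Int8 for inference.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(qmodel(torch.randn(1, 128)).shape)  # torch.Size([1, 10])
```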

Cited by 3 publications (4 citation statements) | References 32 publications
“…The search space pruning can also be extended to other benchmark models. Our previous work [28] has shown a similar trend where different precisions could exhibit similar latency. 2) Skip Connection: We did not consider the skip connection in our Architecture and Mixed Precision search space…”
Section: Future Work
confidence: 56%
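The quoted future-work note hinges on the observation that different precisions can show nearly identical latency, which is what makes search-space pruning worthwhile. A hypothetical sketch of that pruning rule, with made-up latency numbers rather than measurements from the paper:

```python
# Hypothetical search-space pruning: if a lower precision is no faster than a
# higher precision already kept (within a tolerance), drop it, since it cannot
# improve latency and may hurt accuracy. All numbers here are invented.
LATENCY_MS = {"fp16": 0.42, "int8": 0.31, "int4": 0.30}  # one example layer
PRECISION_ORDER = ["fp16", "int8", "int4"]               # high -> low precision

def prune_precisions(latency_ms, tol=0.02):
    kept = []
    for p in PRECISION_ORDER:
        # Skip this precision if an already-kept (higher) precision is just as fast.
        if any(abs(latency_ms[p] - latency_ms[k]) <= tol for k in kept):
            continue
        kept.append(p)
    return kept

print(prune_precisions(LATENCY_MS))  # ['fp16', 'int8'] -> int4 is pruned away
```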
“…The automatically searched mixed quantized networks offer a better latency-accuracy tradeoff than Uniform Quantization on the CIFAR [26] and ImageNet [27] datasets. The search method is partly taken from our previous work [28], which we applied to a different hardware platform (Nvidia A100 GPU) and Neural Network (ResNet50). However, we significantly contributed to developing a new search space pruning and weight/activation sharing method in this paper…”
Section: Limitations of the State-of-the-art (SOTA)
confidence: 99%
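This statement contrasts automatically searched mixed quantized networks with uniform quantization. A minimal sketch of what a per-layer mixed-precision search can look like under a latency budget; the layer names, latency table, and accuracy-loss proxy are invented stand-ins, not the paper's method or data:

```python
# Toy mixed-precision search: exhaustively try Int8/Int4 per layer, keep the
# assignment with the smallest accuracy-loss proxy that fits a latency budget.
# All numbers are fabricated for illustration.
import itertools

LAYERS = ["conv1", "conv2", "fc"]
LATENCY = {                      # hypothetical per-layer latency in ms
    "conv1": {"int8": 0.20, "int4": 0.12},
    "conv2": {"int8": 0.35, "int4": 0.22},
    "fc":    {"int8": 0.10, "int4": 0.07},
}
SENSITIVITY = {"conv1": 0.8, "conv2": 0.5, "fc": 0.1}  # proxy loss if set to int4

def best_mixed_config(budget_ms):
    best, best_loss = None, float("inf")
    for combo in itertools.product(["int8", "int4"], repeat=len(LAYERS)):
        lat = sum(LATENCY[l][p] for l, p in zip(LAYERS, combo))
        loss = sum(SENSITIVITY[l] for l, p in zip(LAYERS, combo) if p == "int4")
        if lat <= budget_ms and loss < best_loss:
            best, best_loss = dict(zip(LAYERS, combo)), loss
    return best

# Uniform Int8 (0.65 ms) misses a 0.60 ms budget; the search finds a mixed
# assignment that fits it with the smallest proxy loss.
print(best_mixed_config(budget_ms=0.60))  # {'conv1': 'int8', 'conv2': 'int4', 'fc': 'int8'}
```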
“…We develop the Mixed Sparse and Precision Search (MSPS) technique [Chitty-Venkata et al. (2022a)] to search for an efficient weight matrix (dense or sparse) and precision combination for every layer on a fixed pretrained model (Section 6.4). The automatically generated MSPS networks outperform Uniform 2:4 Sparse Int8 and 4 configured networks in terms of accuracy and latency on the CIFAR [Krizhevsky et al. (2009)] and ImageNet [Deng et al. (2009)] datasets…”
confidence: 99%
“…The automatically generated MSPS networks outperform Uniform 2:4 Sparse Int8 and 4 configured networks in terms of accuracy and latency on the CIFAR [Krizhevsky et al. (2009)] and ImageNet [Deng et al. (2009)] datasets. 3. We extend MSPS and develop a technique to search for Neural Architecture, Sparsity pattern, and Precision (ASPS) [Chitty-Venkata et al. (2022a)] to jointly optimize the macro-architecture (kernel size, number of filters) and the sparse-precision combination of each layer (Section 6.5). The resulting ASPS outperforms both the baseline Uniform Sparse Int8…”
confidence: 99%
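The two statements above outline MSPS (per-layer sparsity pattern and precision on a fixed pretrained model) and ASPS (the same choices plus kernel size and filter count). The option lists in the sketch below are assumptions picked for illustration; it only shows how quickly the joint space grows, which is why the search-space pruning discussed in these citations matters:

```python
# Back-of-the-envelope size of an ASPS-style joint search space. The candidate
# values are illustrative guesses, not the exact options used in the paper.
import itertools

KERNEL_SIZES = [3, 5]
NUM_FILTERS  = [32, 64]
SPARSITY     = ["dense", "2:4"]
PRECISION    = ["int8", "int4"]

per_layer = list(itertools.product(KERNEL_SIZES, NUM_FILTERS, SPARSITY, PRECISION))
num_layers = 10  # assumed depth
print(f"{len(per_layer)} options per layer, "
      f"{len(per_layer) ** num_layers:.3e} total network configurations")
```

Even with only 16 options per layer, a 10-layer model already has on the order of 10^12 configurations, so exhaustive enumeration is infeasible and pruning or sharing strategies are required.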