2022
DOI: 10.1145/3508390
TAB: Unified and Optimized Ternary, Binary, and Mixed-precision Neural Network Inference on the Edge

Abstract: Ternary Neural Networks (TNNs) and mixed-precision Ternary-Binary Networks (TBNs) have demonstrated higher accuracy than Binary Neural Networks (BNNs) while providing fast, low-power, and memory-efficient inference. Related works have improved the accuracy of TNNs and TBNs but overlooked their optimization on CPU and GPU platforms. First, there is no unified encoding for the binary and ternary values in TNNs and TBNs. Second, existing works store the 2-bit quantized data sequentially in 32/64-bit integ…
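The two problems the abstract names (no unified binary/ternary encoding, and 2-bit values stored sequentially rather than bit-packed) can be made concrete with a small bit-plane sketch. The encoding below, including the nz/sg bit-planes and all function names, is an assumption for illustration, not necessarily TAB's actual scheme: each ternary value is split across two packed bitmaps, and one AND/XOR/popcount kernel then serves both ternary-ternary and ternary-binary dot products.

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>

// Hypothetical bit-plane encoding: 64 ternary values {-1, 0, +1} per word pair.
struct TernaryWord {
    uint64_t nz;  // bit i = 1 iff value i is nonzero
    uint64_t sg;  // bit i = 1 iff value i is -1 (only meaningful where nz is set)
};

// Ternary-ternary dot product over 64 lanes with AND/XOR/popcount.
inline int ternary_dot(TernaryWord a, TernaryWord b) {
    uint64_t nz  = a.nz & b.nz;         // product is nonzero only where both inputs are
    uint64_t neg = (a.sg ^ b.sg) & nz;  // product is -1 where the signs differ
    return std::popcount(nz) - 2 * std::popcount(neg);
}

// A binary operand ({-1, +1}) is the special case nz = all-ones, so the same
// kernel also serves mixed ternary-binary (TBN) layers.
inline int ternary_binary_dot(TernaryWord t, uint64_t b_sign) {
    return ternary_dot(t, TernaryWord{~0ULL, b_sign});
}
```

Because a binary operand is just a ternary operand whose nonzero mask is all ones, a single kernel can cover TNN-, TBN-, and BNN-style layers, which gestures at the kind of unification the abstract describes.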

Cited by 5 publications (3 citation statements) · References 31 publications
“…These instructions can noticeably speed up eight-bit QNN inference [16]. Fast implementations are also available for ternary [17-19] and binary networks [18, 20]. However, binary and ternary networks still suffer from accuracy loss compared to full-precision or eight-bit quantized networks with a similar number of parameters and architecture, which limits their suitability for certain tasks.…”
Section: Related Work
confidence: 99%
“…There are 21 pairs (N_x, N_w) which satisfy (6): (255, 3), (127, 5), (85, 7), (63, 9), (51, 11), (43, 13), (37, 15), (31, 17), (29, 19), (25, 21), (23, 23), and the symmetrical ones. If we compute the average bitwidth required to store x and w as (log2 N_x + log2 N_w)/2, we obtain values in the range 4.51-4.79.…”
Section: High-Performance Matrix Multiplication
confidence: 99%
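As a quick check of the quoted arithmetic (constraint (6) itself is not reproduced in the excerpt), the sketch below recomputes the average bitwidth (log2 N_x + log2 N_w)/2 for the listed pairs; the results do fall in roughly the quoted 4.51-4.79 range.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // The eleven distinct (N_x, N_w) pairs from the quotation; the symmetric
    // counterparts give the same average bitwidth and are omitted.
    const int pairs[][2] = {{255, 3}, {127, 5}, {85, 7},  {63, 9},  {51, 11},
                            {43, 13}, {37, 15}, {31, 17}, {29, 19}, {25, 21},
                            {23, 23}};
    for (const auto& p : pairs) {
        double avg = (std::log2(p[0]) + std::log2(p[1])) / 2.0;
        std::printf("(%3d, %2d) -> average bitwidth %.2f\n", p[0], p[1], avg);
    }
    return 0;
}
```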
“…Quantization with lower precision can further reduce memory consumption and computation. Ultra-low-precision (1- or 2-bit) operations can often be computed efficiently with bit-wise arithmetic, achieving significant computation acceleration [28]. However, due to the large quantization noise, the benefits of low-precision quantization often come at the cost of significant accuracy degradation.…”
Section: Introduction
confidence: 99%
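The claim that 1-bit operations map onto bit-wise arithmetic is commonly realized with an XNOR/popcount kernel. The sketch below is a generic illustration of that idea, not the cited paper's exact kernel: 64 values in {-1, +1} are packed one per bit, and a full dot product reduces to one XOR and one popcount.

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>

// Generic XNOR/popcount-style binary dot product: 64 values in {-1, +1}
// packed one per bit (bit = 1 encodes +1, bit = 0 encodes -1).
// dot = agreements - disagreements = 64 - 2 * popcount(a XOR b).
inline int binary_dot64(uint64_t a, uint64_t b) {
    return 64 - 2 * std::popcount(a ^ b);
}
```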