2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca45697.2020.00086
DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration

Cited by 71 publications (30 citation statements)
References 24 publications
“…(2) Avoid Top-k Selection: Another problem is how to avoid top-k selection. Instead of sorting all the scores, we can use mean-filtering [58] to search for the important scores. Specifically, in each round, we estimate each row's mean value and only select the query-key pairs whose scores are greater than the mean value.…”
Section: A. Top-k Pruning: A Baseline
confidence: 99%
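The excerpt describes a simple heuristic: rather than sorting every row of attention scores to take the top-k, keep each query-key pair whose score exceeds its row's mean. A minimal sketch of that mean-filtering step (the function name and NumPy implementation are illustrative, not taken from the cited paper's code):

```python
import numpy as np

def mean_filter_select(scores):
    """Select important query-key pairs without a full top-k sort.

    For each row of the score matrix, compute the row mean and keep
    only the entries strictly greater than it, as the mean-filtering
    heuristic in the excerpt suggests. Hypothetical helper name.
    """
    row_means = scores.mean(axis=1, keepdims=True)  # one mean per row
    return scores > row_means                       # boolean mask of kept pairs

# Usage: each row keeps only its above-average scores.
scores = np.array([[0.1, 0.9, 0.2, 0.8],
                   [0.5, 0.5, 0.5, 0.5]])
mask = mean_filter_select(scores)
```

Note that mean filtering is O(n) per row versus O(n log n) for sorting, at the cost of not guaranteeing exactly k selections per row (the second row above, with all-equal scores, selects nothing).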
“…Thanks to recent advances in DNN compression algorithms [6,11,18,27], the parameters of a DNN can be converted from 32-bit floating point to extremely low bit-widths (e.g., < 4 bits) with negligible inference accuracy degradation, which significantly simplifies the computation and mitigates the on-/off-chip data access bottleneck (aka the "memory wall") [32,36].…”
Section: Introduction
confidence: 99%
“…To alleviate these performance problems, a number of studies have accelerated DNN implementations by designing hardware-accelerated intelligent computing architectures for sensing systems. Some studies exploit properties of DNNs to reduce latency through the parallelism of specialized acceleration circuits, such as [8][9][10][11][12][13][14]. Yet these works ignore that the total power consumption can exceed the budget.…”
Section: Introduction
confidence: 99%