Interspeech 2016
DOI: 10.21437/Interspeech.2016-128
On the Efficient Representation and Execution of Deep Acoustic Models

Abstract: In this paper we present a simple and computationally efficient quantization scheme that enables us to reduce the resolution of the parameters of a neural network from 32-bit floating point values to 8-bit integer values. The proposed quantization scheme leads to significant memory savings and enables the use of optimized hardware instructions for integer arithmetic, thus significantly reducing the cost of inference. Finally, we propose a 'quantization aware' training process that applies the proposed scheme d…
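The abstract describes mapping 32-bit float parameters onto an 8-bit integer grid. As a minimal sketch of how such a scheme typically works (a min/max affine mapping; the paper's exact formulation may differ, and all names below are illustrative):

```python
import numpy as np

def quantize_uint8(w):
    """Affine quantization of a float32 tensor to 8-bit integers.

    Hypothetical sketch: maps [w.min(), w.max()] linearly onto the
    256 levels of a uint8 grid; not necessarily the paper's formula.
    """
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against a constant tensor
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_uint8(q, scale, w_min):
    """Recover an approximate float tensor from its uint8 encoding."""
    return q.astype(np.float32) * scale + w_min

# Round-trip error is bounded by half a quantization step.
w = np.random.randn(500, 200).astype(np.float32)
q, scale, w_min = quantize_uint8(w)
assert np.abs(w - dequantize_uint8(q, scale, w_min)).max() <= scale / 2 + 1e-6
```

Storing q instead of w cuts memory 4x, and integer matrix multiplies can use the optimized hardware instructions the abstract mentions.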

Cited by 35 publications (26 citation statements) | References 14 publications
“…which defines the quantization range of both weight matrices W^(1) and W^(2). However, our optimization goal is not affected by the choice of the other s_j, given the resulting r_j^(1) and r_j^(2) are smaller than r_i^(1) and r_i^(2), respectively. To break the ties between solutions we set ∀i : r_i^(1) = r_i^(2).…”
Section: A. Optimal Range Equalization of Two Layers
confidence: 99%
“…However, our optimization goal is not affected by the choice of the other s_j, given the resulting r_j^(1) and r_j^(2) are smaller than r_i^(1) and r_i^(2), respectively. To break the ties between solutions we set ∀i : r_i^(1) = r_i^(2). Thus the channels' ranges between both tensors are matched as closely as possible and the introduced quantization error is spread equally among both weight tensors.…”
Section: A. Optimal Range Equalization of Two Layers
confidence: 99%
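The excerpts above discuss choosing per-channel scales s_i so that the quantization ranges of two consecutive weight tensors match. A sketch of that idea under common assumptions (a ReLU between the layers, whose positive homogeneity makes the rescaling function-preserving; all names are illustrative):

```python
import numpy as np

def equalize_ranges(w1, w2):
    """Cross-layer range equalization for y = w2 @ relu(w1 @ x).

    Dividing row i of w1 by s_i and multiplying column i of w2 by s_i
    leaves the network function unchanged (relu(a*z) = a*relu(z) for
    a > 0). Choosing s_i = sqrt(r1_i / r2_i) equalizes the per-channel
    ranges, r1_i = r2_i = sqrt(r1_i * r2_i), matching the tie-breaking
    rule quoted above.
    """
    r1 = np.abs(w1).max(axis=1)  # range of each output channel of layer 1
    r2 = np.abs(w2).max(axis=0)  # range of each input channel of layer 2
    s = np.sqrt(r1 / r2)         # assumes strictly positive ranges
    return w1 / s[:, None], w2 * s[None, :]

w1 = np.random.randn(300, 400)
w2 = np.random.randn(200, 300)
w1_eq, w2_eq = equalize_ranges(w1, w2)
assert np.allclose(np.abs(w1_eq).max(axis=1), np.abs(w2_eq).max(axis=0))
```

With the ranges matched, a shared quantization grid wastes less resolution on outlier channels, spreading the quantization error equally across both tensors, as the excerpt notes.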
“…The CTC acoustic model (AM) consists of 5 layers of 500 LSTM cells that predict context-independent phonemes as output targets. The system is heavily compressed, both by quantization [43] and by the application of low-rank projection layers with 200 units between consecutive LSTM layers [44]. The AM consists of 4.6 million parameters in total.…”
Section: Model Details
confidence: 99%
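A rough back-of-the-envelope check of the quoted parameter count; the input feature dimension and number of output targets below are assumptions, not values from the paper:

```python
def lstmp_params(n_in, n_cell, n_proj):
    """Parameters of one LSTM layer with a low-rank output projection.

    The four gates act on the concatenated [input; recurrent projection];
    the projection matrix maps the n_cell outputs down to n_proj units.
    Peephole connections are ignored for simplicity.
    """
    gates = 4 * n_cell * (n_in + n_proj)  # input + recurrent weights
    biases = 4 * n_cell
    projection = n_cell * n_proj
    return gates + biases + projection

n_feat, n_targets = 200, 45  # assumed values, for illustration only
total = sum(lstmp_params(n_feat if i == 0 else 200, 500, 200) for i in range(5))
total += 200 * n_targets + n_targets  # output layer
print(f"~{total / 1e6:.1f}M parameters")  # ~4.5M, near the quoted 4.6M
```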
“…However, ASR accuracy degrades significantly when the model is compressed heavily into even lower bits or the network structure becomes more complex. Therefore, refining with quantization is important for both very low-bit quantization [177], [178] and vector quantization [179], so that training and testing are consistent.…”
Section: Acoustic Models With Efficient Decoding
confidence: 99%
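The train/test consistency this excerpt calls for is usually achieved by simulating quantization in the forward pass while keeping float weights for the update, in line with the 'quantization aware' training the abstract proposes. A minimal sketch (the rounding is treated as the identity in the backward pass, the straight-through estimator; details vary by implementation):

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Forward-pass simulation of quantization for quantization-aware training.

    The layer computes with weights snapped onto the integer grid they
    will use at inference, so training sees the same quantization error
    as testing; the underlying float weights are what the optimizer updates.
    """
    levels = 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / levels or 1.0  # guard against a constant tensor
    return np.round((w - w_min) / scale) * scale + w_min

w = np.random.randn(4, 4).astype(np.float32)
w_q = fake_quantize(w)  # use w_q in the forward pass, update w itself
```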