2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
DOI: 10.1109/fccm48280.2020.00011

Optimizing Reconfigurable Recurrent Neural Networks

Abstract: This paper proposes a novel latency-hiding hardware architecture based on column-wise matrix-vector multiplication to eliminate data dependency, improving the throughput of systems running RNN models. In addition, a flexible checkerboard tiling strategy is introduced to allow large weight matrices, while supporting element-based parallelism and vector-based parallelism. These optimizations improve the exploitation of the available parallelism to increase run-time hardware utilization and boost inference throughput.…
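The column-wise formulation is what enables the latency hiding: a conventional row-wise dot product cannot finalize any output element until the entire input vector is available, whereas a column-wise accumulation consumes each input element as soon as it is produced. The following is a minimal NumPy sketch of that contrast, assuming generic matrix and vector shapes; it illustrates the idea only and is not the authors' hardware implementation.

```python
# Row-wise vs. column-wise matrix-vector multiplication.
# In the column-wise form the j-th column of W is consumed as soon as
# x[j] becomes available, so an accelerator does not have to wait for
# the complete previous hidden state before starting to accumulate the
# next time step's result.
import numpy as np

def mvm_row_wise(W, x):
    # y[i] needs every element of x, so the full vector must be ready first.
    y = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            y[i] += W[i, j] * x[j]
    return y

def mvm_column_wise(W, x_stream):
    # x_stream yields (j, x_j) pairs in the order the elements arrive;
    # each column's contribution is accumulated immediately.
    y = np.zeros(W.shape[0])
    for j, x_j in x_stream:
        y += W[:, j] * x_j  # update all output rows in parallel
    return y

# Example: elements of the hidden state arrive one at a time.
W = np.random.rand(4, 3)
x = np.random.rand(3)
stream = ((j, x[j]) for j in range(len(x)))
assert np.allclose(mvm_row_wise(W, x), mvm_column_wise(W, stream))
```

On the FPGA, the per-column update y += W[:, j] * x_j would presumably map to parallel multiply-accumulate units across the output rows, which is where the element-based and vector-based parallelism mentioned above comes in.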

Cited by 23 publications (18 citation statements)
References 30 publications

“…Their paper has proposed a hardware architecture for an LSTM by exploiting its inherent parallelism, aiming to outperform software implementations. Z. Que et al. [23] have proposed a novel latency-hiding hardware architecture based on a column-wise matrix-vector multiplication mechanism to eliminate data dependency and to improve the throughput of RNN models. The proposed architecture has been implemented on Arria 10 and Stratix 10 FPGAs.…”
Section: Overview of LSTM Network and Reversible Logic (A. Long Short-Term Memory (LSTM) Network)
confidence: 99%
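The abstract also names a flexible checkerboard tiling strategy for weight matrices larger than the on-chip compute array. The sketch below shows generic row/column blocking combined with the column-wise update; the paper's actual tile shapes, schedule, and buffering are not described on this page, so the tile sizes and loop order are illustrative assumptions only.

```python
# Generic tiled matrix-vector multiplication: a large weight matrix is
# processed in (tile_rows x tile_cols) blocks, and within each block the
# columns are consumed in the latency-hiding, column-wise order.
import numpy as np

def tiled_mvm(W, x, tile_rows=2, tile_cols=2):
    R, C = W.shape
    y = np.zeros(R)
    for r0 in range(0, R, tile_rows):          # tile over output rows
        for c0 in range(0, C, tile_cols):      # tile over input columns
            tile = W[r0:r0 + tile_rows, c0:c0 + tile_cols]
            for j in range(tile.shape[1]):
                # column-wise accumulation inside the tile
                y[r0:r0 + tile_rows] += tile[:, j] * x[c0 + j]
    return y

W = np.arange(24, dtype=float).reshape(6, 4)
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(tiled_mvm(W, x), W @ x)
```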
“…In [26], a novel timestep (TS) buffer is introduced to avoid redundant calculations of LSTM gate operations and thereby reduce system latency. In [27], the authors propose a novel latency-hiding hardware architecture based on column-wise matrix-vector multiplication to eliminate data dependency, improving the throughput of LSTM/GRU models. These LSTM implementations store all the weights in the on-chip memory of FPGAs.…”
Section: Previous Work
confidence: 99%
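Both optimizations target the recurrent data dependency inside the gate computations. In a standard LSTM step, written below as a plain NumPy sketch (an assumed textbook formulation, not code from [26] or [27]), only the Wh @ h_prev term is blocked by the previous time step; the input term Wx @ x_t can be computed as soon as x_t arrives, which indicates where buffering and latency hiding have room to work.

```python
# Minimal LSTM cell with the gate pre-activation split into an input
# part (independent of the recurrence) and a recurrent part (blocked by
# h_{t-1}). Gates are stacked as [i, f, g, o].
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wx, Wh, b):
    pre_x = Wx @ x_t      # available as soon as x_t arrives
    pre_h = Wh @ h_prev   # must wait for the previous hidden state
    i, f, g, o = np.split(pre_x + pre_h + b, 4)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

# Example usage with small, illustrative sizes.
D, H = 3, 4
rng = np.random.default_rng(0)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c,
                 rng.normal(size=(4 * H, D)),
                 rng.normal(size=(4 * H, H)),
                 np.zeros(4 * H))
```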
“…We perform the bit-sparse quantization of the LSTM model through retraining, a fine-tuning process commonly used for fixed-point quantization [13]. We quantize all of the weights used to update the LSTM gates to the bit-sparse data type and keep the remaining weights in fixed point.…”
Section: Bit-Sparse Data Representation
confidence: 99%
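The excerpt does not spell out the bit-sparse format. A common reading is that each weight is restricted to a small number of nonzero bits, i.e., a signed sum of at most k powers of two, so a multiplication reduces to a few shifts and adds. The sketch below uses that assumption with an illustrative k = 2 and a greedy fit; the function name and parameter ranges are hypothetical, not taken from the citing paper.

```python
# Hedged sketch: approximate each weight by a signed sum of at most k
# powers of two (a "bit-sparse" value), so a hardware multiply becomes a
# few shift-and-add operations.
import numpy as np

def quantize_bit_sparse(w, k=2, exp_min=-8, exp_max=0):
    """Greedy approximation of w by at most k signed power-of-two terms."""
    approx, residual = 0.0, float(w)
    for _ in range(k):
        if residual == 0.0:
            break
        sign = 1.0 if residual > 0 else -1.0
        exp = int(np.clip(np.round(np.log2(abs(residual))), exp_min, exp_max))
        approx += sign * 2.0 ** exp
        residual = float(w) - approx
    return approx

weights = np.array([0.8371, -0.26, 0.04, 0.61])
print(np.vectorize(quantize_bit_sparse)(weights))  # bit-sparse approximations
```

In a retraining (fine-tuning) flow, the remaining weights would stay in ordinary fixed point and the network would be trained further with the quantized gate weights in place to recover accuracy, as the citation statement describes.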
“…Various approaches have been proposed for energy-efficient LSTM/RNN inference accelerators [3, 4, 6, 7, 11, 13, 15-17]. [6] designed a low-power LSTM accelerator for keyword spotting that runs under 5 µW with an energy efficiency of 60 nJ/inference.…”
Section: Related Work (6.1 Efficient LSTM Inference Accelerator)
confidence: 99%