2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
DOI: 10.1109/reconfig.2016.7857151

An FPGA implementation of a long short-term memory neural network

Abstract: Our work proposes a hardware architecture for a Long Short-Term Memory (LSTM) Neural Network, aiming to outperform software implementations by exploiting its inherent parallelism. The main design decisions are presented, along with the proposed network architecture and a description of its main building blocks. The network is synthesized for various sizes and platforms, and the performance results are presented and analyzed. Our synthesized network achieves a 251 times speed-up o…
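For context, the computation such an accelerator parallelizes is the standard LSTM cell recurrence. The sketch below is a plain software rendering of one timestep; the gate ordering, vector sizes, and use of float are illustrative assumptions, not details from the paper, whose hardware would use fixed-point arithmetic and parallel dot products:

    // Minimal sketch of one LSTM timestep (standard formulation).
    #include <cmath>
    #include <vector>

    static float sigmoidf(float x) { return 1.0f / (1.0f + std::exp(-x)); }

    // x: input vector; h, c: hidden and cell states, updated in place.
    // W holds the four gate weight matrices (input i, forget f, candidate g,
    // output o), each with H rows over the concatenation [x; h]; b holds the
    // four bias vectors of length H.
    void lstm_step(const std::vector<float>& x,
                   std::vector<float>& h, std::vector<float>& c,
                   const std::vector<std::vector<float>> W[4],
                   const std::vector<float> b[4]) {
        const size_t H = h.size(), X = x.size();
        std::vector<float> gate[4];
        for (int g = 0; g < 4; ++g) {                 // pre-activations of all gates
            gate[g].assign(H, 0.0f);
            for (size_t j = 0; j < H; ++j) {
                float acc = b[g][j];
                for (size_t k = 0; k < X; ++k) acc += W[g][j][k] * x[k];     // input part
                for (size_t k = 0; k < H; ++k) acc += W[g][j][X + k] * h[k]; // recurrent part
                gate[g][j] = acc;
            }
        }
        for (size_t j = 0; j < H; ++j) {
            float i  = sigmoidf(gate[0][j]);
            float f  = sigmoidf(gate[1][j]);
            float gv = std::tanh(gate[2][j]);
            float o  = sigmoidf(gate[3][j]);
            c[j] = f * c[j] + i * gv;        // cell state update
            h[j] = o * std::tanh(c[j]);      // hidden state output
        }
    }

The four gate matrix-vector products are independent of one another, which is the parallelism an FPGA implementation can exploit within each timestep.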

Cited by 43 publications (30 citation statements). References 15 publications.
“…However, exponential terms appear in function calculation, which makes it very difficult to directly implement them in FPGA. Lookup table [25] and polynomial approximation [26] are commonly used alternatives at present.…”
Section: Activation Function
confidence: 99%
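The lookup-table alternative mentioned in this citation is straightforward to sketch in software. Below is a minimal interpolated-LUT sigmoid; the table size, input range, and use of linear interpolation are illustrative assumptions, not the scheme of reference [25]:

    // Sigmoid approximated by a precomputed lookup table with linear
    // interpolation, avoiding the exponential at evaluation time.
    #include <algorithm>
    #include <array>
    #include <cmath>

    constexpr int   LUT_SIZE  = 256;
    constexpr float LUT_RANGE = 8.0f;   // sigmoid saturates outside [-8, 8]

    struct SigmoidLUT {
        std::array<float, LUT_SIZE + 1> table{};
        SigmoidLUT() {
            // Precompute once; on an FPGA this would be a ROM initialized
            // at synthesis time rather than at run time.
            for (int i = 0; i <= LUT_SIZE; ++i) {
                float x = -LUT_RANGE + 2.0f * LUT_RANGE * i / LUT_SIZE;
                table[i] = 1.0f / (1.0f + std::exp(-x));
            }
        }
        float operator()(float x) const {
            x = std::clamp(x, -LUT_RANGE, LUT_RANGE);   // saturate the tails
            float pos  = (x + LUT_RANGE) * LUT_SIZE / (2.0f * LUT_RANGE);
            int   idx  = std::min(static_cast<int>(pos), LUT_SIZE - 1);
            float frac = pos - idx;                     // interpolation weight
            return table[idx] + frac * (table[idx + 1] - table[idx]);
        }
    };

A polynomial approximation replaces the table with a few multiply-adds per evaluation, trading memory for arithmetic; which is cheaper depends on the target device's BRAM and DSP budgets.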
“…However, this strategy largely increases computation latency and power dissipation. Another FPGA-based work reported in Ferreira and Fonseca (2016) fully uses both logic units and memory cells in FPGA to speed up computation and suppress the power dissipation. Work in Chang and Culurciello (2017) balances the data communication that both on-chip LUT and off-chip DRAM are used for internal storage of matrix multiplication to reduce the latency due to off-chip memory access and workload of on-chip communication.…”
Section: Memory Access
confidence: 99%
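The on-chip/off-chip trade-off this citation describes can be sketched as a tiled matrix-vector product: weights stream from DRAM in tiles into a small on-chip buffer, and each tile is fully reused before the next fetch. The tile size and buffer model below are illustrative assumptions, not details of the cited designs:

    // y = W * x with W held "off chip" (a plain vector standing in for
    // DRAM) and a TILE-row buffer standing in for on-chip BRAM.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    constexpr std::size_t TILE = 64;   // rows of W cached on chip at a time

    void tiled_matvec(const std::vector<float>& W,   // rows*cols, row-major
                      const std::vector<float>& x,
                      std::vector<float>& y,
                      std::size_t rows, std::size_t cols) {
        std::vector<float> buf(TILE * cols);         // "on-chip" weight buffer
        for (std::size_t r0 = 0; r0 < rows; r0 += TILE) {
            std::size_t nrows = std::min(TILE, rows - r0);
            // Burst transfer: copy one tile of weights into the local buffer.
            std::copy(W.begin() + r0 * cols,
                      W.begin() + (r0 + nrows) * cols, buf.begin());
            // Compute on the buffered tile; in hardware these row dot
            // products can run in parallel once operands are on chip.
            for (std::size_t r = 0; r < nrows; ++r) {
                float acc = 0.0f;
                for (std::size_t c = 0; c < cols; ++c)
                    acc += buf[r * cols + c] * x[c];
                y[r0 + r] = acc;
            }
        }
    }

Larger tiles amortize DRAM burst overhead but consume more on-chip memory, which is the balance the cited works tune per platform.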
“…There has been much previous work on FPGA-based LSTM implementations using on-chip memory to store all the weights. Ferreira et al. proposed an FPGA accelerator of LSTM in [7] for a learning problem of adding two 8-bit numbers with weights stored in on-chip memory. Rybalkin et al. [8] presented the first hardware architecture designed for BiLSTM for OCR.…”
Section: B. Related Work
confidence: 99%
“…FPGAs have been used to speed up the inference of LSTMs [4,5,6,7], which offer benefits of low latency and low power when compared to CPUs or GPUs. Although FPGA-based LSTM accelerators have advantages in latency and power consumption, they are limited by the memory bandwidth of the FPGA board.…”
Section: Introduction
confidence: 99%