2019
DOI: 10.1109/access.2019.2917312
Acceleration of LSTM With Structured Pruning Method on FPGA

Abstract: This paper focuses on accelerating long short-term memory (LSTM), which is one of the popular types of recurrent neural networks (RNNs). Because of the large number of weight memory accesses and the high computation complexity of its cascade-dependent structure, it is a big challenge to efficiently implement the LSTM on field-programmable gate arrays (FPGAs). To speed up the inference on FPGA, considering its limited resource, a structured pruning method that can not only reduce the LSTM model's size without los…
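For context, the standard LSTM cell equations (a generic textbook formulation, not quoted from the paper's own notation) make the two bottlenecks named in the abstract concrete: every timestep reads eight weight matrices, and h_t cannot be computed until c_t and h_{t-1} are available, which is the cascade dependence.

```latex
% Generic LSTM cell; the paper may use a different but equivalent formulation.
\begin{aligned}
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
g_t &= \tanh\!\left(W_g x_t + U_g h_{t-1} + b_g\right) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t && \text{(cell update, element-wise)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The W matrices act on the input x_t and the U matrices on the recurrent state h_{t-1}; pruning these eight matrices is what reduces both the weight memory traffic and the multiply-accumulate count.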

Cited by 32 publications (11 citation statements) | References 14 publications
“…Although this method is effective in convolutional layers, it fails to work in the fully-connected layers, where removing one column can cause significant information loss as it is equivalent to removing one input activation. Prior work (Wen et al. 2017; Wang et al. 2019) adopts this strategy in RNNs but only achieves about 2× parameter reduction. Block pruning performs pruning at the scale of blocks (Van Keirsbilck, Keller, and Yang 2019), but grouping neighboring weights into a specific structure is a strong constraint which is not an effective way to keep the salient weights.…”
Section: Background and Related Work (mentioning, confidence: 99%)
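As a minimal illustration of why column removal is so costly in fully-connected and recurrent layers (a hypothetical NumPy sketch, not the criterion used in any of the cited works): deleting column j of the weight matrix discards input activation j entirely, so a column-wise structured-pruning pass has to be conservative.

```python
import numpy as np

# Hypothetical column-wise structured pruning by column L2 norm.
# W maps an input vector x of size n_in to pre-activations, so dropping
# column j is equivalent to dropping input activation x[j] everywhere.
def prune_columns(W: np.ndarray, keep_ratio: float):
    """Return (W with low-salience columns removed, indices of kept columns)."""
    col_norms = np.linalg.norm(W, axis=0)            # salience of each input activation
    n_keep = max(1, int(round(keep_ratio * W.shape[1])))
    keep = np.sort(np.argsort(col_norms)[-n_keep:])  # keep the largest-norm columns
    return W[:, keep], keep

rng = np.random.default_rng(0)
W = rng.normal(size=(4 * 128, 64))     # e.g. stacked LSTM gate weights, 64-dim input
W_pruned, kept = prune_columns(W, keep_ratio=0.5)
print(W_pruned.shape)                  # (512, 32): half of the input activations are gone
```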
“…To compress the model for the hardware, two widely applied methods are (a) model selection/structured pruning, i.e. choosing a model structure with pruned layers/channels and small performance degradation [4,5], and (b) zero-weight compression/sparse pruning, i.e. pruning small-value weights to zero [6,7].…”
Section: Introduction (mentioning, confidence: 99%)
“…pruning small-value weights to zero [6,7]. Model selection differs from sparse pruning in that it deletes entire channels or layers, showing a more efficient speedup during inference, yet with a more severe performance degradation [4,5]. These two types of methods are usually complementary: after being structurally pruned, a model can also undergo further zero-weight compression to improve the inference speed.…”
Section: Introduction (mentioning, confidence: 99%)
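The complementary use of the two methods described above can be sketched as follows (hypothetical NumPy code; the thresholds and salience criteria are illustrative, not those of references [4-7]): first delete whole columns (structured pruning/model selection), then zero out the smallest surviving weights (zero-weight/sparse compression).

```python
import numpy as np

def magnitude_sparsify(W: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero-weight compression: zero the smallest-magnitude weights (illustrative criterion)."""
    flat = np.abs(W).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return W.copy()
    threshold = np.partition(flat, k - 1)[k - 1]     # k-th smallest magnitude
    W_sparse = W.copy()
    W_sparse[np.abs(W_sparse) <= threshold] = 0.0
    return W_sparse

# Chain the two complementary steps: structured pruning first, sparse pruning second.
rng = np.random.default_rng(1)
W = rng.normal(size=(256, 64))
keep = np.sort(np.argsort(np.linalg.norm(W, axis=0))[-32:])   # structured: keep 32 of 64 columns
W_structured = W[:, keep]
W_final = magnitude_sparsify(W_structured, sparsity=0.8)      # sparse: zero ~80% of what remains
print(W_final.shape, round(float((W_final == 0).mean()), 2))  # (256, 32) 0.8
```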
“…More recently, some works have indicated the suitability of LSTM (Long Short-Term Memory) neural networks owing to their well-known ability to process sequential data, such as the time series of industrial process variables (de Oliveira et al. 2020; Jalayer et al. 2021). However, such networks demand more processing and memory resources because of their cascade-dependent structure, which creates bottlenecks when performing inference (Wang et al. 2019; Gao et al. 2020). Moreover, in many industrial systems, such as those based on the IIoT (Industrial Internet of Things), these computational resources are quite limited, which means that neural-network-based techniques must be employed in a more computationally efficient way, without harming their performance, for their adoption to be viable.…”
Section: Introduction (unclassified)
“…Applying compression techniques to LSTM neural networks is necessary because of the high parameterization of this type of network, which can easily reach millions of parameters (Kadetotad et al. 2020). Compressing these models reduces both the memory they occupy (Wang et al. 2019) and the processing required for their inference.…”
Section: Introduction (unclassified)