2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461743

Prediction of LSTM-RNN Full Context States as a Subtask for N-Gram Feedforward Language Models

Abstract: Long short-term memory (LSTM) recurrent neural network language models compress the full context of variable lengths into a fixed size vector. In this work, we investigate the task of predicting the LSTM hidden representation of the full context from a truncated n-gram context as a subtask for training an n-gram feedforward language model. Since this approach is a form of knowledge distillation, we compare two methods. First, we investigate the standard transfer based on the Kullback-Leibler divergence of the …
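To make the setup in the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of an n-gram feedforward "student" trained against an LSTM "teacher" with two signals: a Kullback-Leibler loss on the output distribution and an auxiliary subtask that predicts the teacher's full-context hidden state from the truncated n-gram context. All module, parameter, and function names here are illustrative assumptions.

```python
# Hypothetical sketch: n-gram feedforward student distilled from an LSTM teacher.
# Training signals:
#   (1) KL divergence between teacher and student output distributions,
#   (2) regression subtask predicting the teacher's full-context hidden state.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NgramStudent(nn.Module):
    def __init__(self, vocab_size, order, emb_dim, hidden_dim, lstm_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(order * emb_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.out = nn.Linear(hidden_dim, vocab_size)       # next-word logits
        self.state_head = nn.Linear(hidden_dim, lstm_dim)  # predicts LSTM state

    def forward(self, ngram):                   # ngram: (batch, order) word ids
        h = self.ff(self.embed(ngram).flatten(1))
        return self.out(h), self.state_head(h)

def distillation_loss(logits, state_pred, teacher_probs, teacher_state, alpha=0.5):
    # KL transfer on the output distribution (teacher_probs are soft targets) ...
    kl = F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs,
                  reduction="batchmean")
    # ... plus mean-squared error on the teacher's full-context hidden state.
    mse = F.mse_loss(state_pred, teacher_state)
    return kl + alpha * mse
```

In training, teacher_probs and teacher_state would be obtained by running the LSTM language model over the full history; the weight alpha between the two terms is an assumed hyperparameter, not a value from the paper.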

Citations: Cited by 11 publications (2 citation statements)
References: 19 publications
“…In addition, we examine the type of activation function ( Table 4). As opposed to previous work on feed-forward language models using GLUs [28,34], we do not observe faster convergence. As we observe that the impact of choice of activation functions on the perplexity is overall limited, all our other models use the standard ReLU.…”
Section: Hyper-parameter Tuning
Citation type: contrasting
confidence: 99%
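For orientation only, the gated linear unit (GLU) that this statement contrasts with ReLU is commonly defined as follows; this is a standard formulation, not quoted from the cited papers [28, 34]:

$$\mathrm{GLU}(x) = (xW + b) \odot \sigma(xV + c), \qquad \mathrm{ReLU}(x) = \max(0, x)$$

where $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and $W$, $V$, $b$, $c$ are learned parameters; the learned gate is the component previously reported to speed up convergence.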
“…When the information flow enters the algorithm network, the redundant information that does not meet the algorithm rules is placed in the forget gate and removed (Li et al, 2021c). In practice, the LSTM-RNN algorithm can be expressed by Eqs 9-13 (Irie et al, 2018):…”
Section: LSTM-RNN
Citation type: mentioning
confidence: 99%
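The "Eqs 9-13" above follow the citing paper's own numbering. For reference, the LSTM recurrence being described is commonly written as follows; this is a textbook formulation with assumed notation, not copied from either paper:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$h_t = o_t \odot \tanh(c_t)$$

Here $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and $h_t$ is the fixed-size hidden state that compresses the full context, i.e. the quantity the distillation subtask in the abstract aims to predict.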