Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1489
Direct Output Connection for a High-Rank Language Model

Abstract: This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from a final RNN layer but also from middle layers. Our proposed method raises the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). The proposed method improves the current state-of-the-art language model and achieves the best score on the Penn Treebank and WikiText-2, which are …
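The mechanism the abstract describes, computing a softmax from every layer and mixing the resulting distributions, can be sketched in a few lines. Below is a minimal PyTorch illustration under stated assumptions: the module name, the three-layer configuration, and the single learned scalar weight per layer are illustrative choices, not the paper's exact architecture (the paper also uses multiple softmax components per layer, and its regularization setup is omitted here).

```python
# Minimal sketch of a "direct output connection" language model, assuming
# a stacked-LSTM setup. All names and hyper-parameters here are
# illustrative, not the configuration of Takase et al. (2018).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DOCLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One LSTM per layer so that every layer's hidden states are exposed.
        self.layers = nn.ModuleList([
            nn.LSTM(embed_dim if i == 0 else hidden_dim, hidden_dim,
                    batch_first=True)
            for i in range(num_layers)
        ])
        # Each layer gets its own projection to the vocabulary.
        self.out_proj = nn.ModuleList([
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_layers)
        ])
        # Learned mixture weights over the per-layer softmaxes.
        self.mix_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, tokens):
        x = self.embed(tokens)                     # (batch, seq, embed_dim)
        per_layer_probs = []
        for lstm, proj in zip(self.layers, self.out_proj):
            x, _ = lstm(x)                         # this layer's hidden states
            per_layer_probs.append(F.softmax(proj(x), dim=-1))
        # Weighted sum of full probability distributions, not of logits:
        # mixing after the softmax is what raises the rank of the
        # log-probability matrix (the Yang et al. 2018 argument).
        w = F.softmax(self.mix_logits, dim=0)      # (num_layers,)
        probs = sum(wi * p for wi, p in zip(w, per_layer_probs))
        return torch.log(probs + 1e-9)             # log-probs for an NLL loss
```

Note that the mixture is taken over probability distributions, after each softmax, rather than over logits; averaging logits would collapse back into a single softmax and leave the rank bound on the log-probability matrix unchanged.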

Cited by 31 publications (24 citation statements) · References 25 publications

Citation statements:
“…However, we should note that none of these models perform at the same level of the state-of-the-art models such as those of […] and Takase et al. (2018), as we can see in Tables 1 and 2. These models use advanced regularization techniques and matrix factorization for training the RNN-LMs, whilst our Averaging RNN-LM uses a standard LSTM training regime and regularization techniques.…”
Section: Results
Confidence: 78%
“…The first is to learn better expressive word embeddings (Gao et al., 2019; Gong et al., 2018; …). The second is to design better expressive output/activation functions (Yang et al., 2018; Ganea et al., 2019; Kanai et al., 2018; Takase et al., 2018). Nonetheless, we want to clarify that focusing only on the embedding/output layers is far from sufficient for language modeling, since it is the middle layers that provide the major non-linearity, which matters most for expressiveness.…”
Section: Discussion and Future Work
Confidence: 99%
“…We also consider that the softmax bottleneck problem (Yang et al., 2018) is highly related to the representation degeneration problem. There are a series of works (Ganea et al., 2019; Kanai et al., 2018; Takase et al., 2018) that follow this line of research.…”
Section: Related Work
Confidence: 94%
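For context on the softmax bottleneck these statements refer to, here is a compact restatement of the Yang et al. (2018) argument; the notation is chosen for this note and is not quoted from any of the papers above.

```latex
% Let A be the N x V matrix of true log-probabilities,
% A_{ij} = log P*(x_j | c_i), over N contexts and a vocabulary of size V.
% A single-softmax model with d-dimensional context vectors h_c and output
% embeddings w_x produces logits H W^T (H: N x d, W: V x d), so after
% softmax normalization A can differ from H W^T only by a per-row constant:
%   rank(A) <= d + 1.
% If the true A has rank much larger than d, no single softmax can match it.
% A mixture of K softmaxes (or, as in the cited paper, distributions taken
% from several layers) breaks this bound, because the log of a convex
% combination of softmaxes is no longer a low-rank matrix plus a row shift.
\[
  P_\theta(x \mid c)
    = \sum_{k=1}^{K} \pi_{c,k}\,
      \frac{\exp\!\left(\mathbf{h}_{c,k}^{\top}\mathbf{w}_{x}\right)}
           {\sum_{x'} \exp\!\left(\mathbf{h}_{c,k}^{\top}\mathbf{w}_{x'}\right)},
  \qquad \pi_{c,k} \ge 0,\ \sum_{k}\pi_{c,k} = 1 .
\]
```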
“…We incorporated this method with a widely used LSTM encoder-decoder model (Luong et al., 2015). For a fair comparison, we set the same hyper-parameters as in Takase et al. (2018), because they indicated that the LSTM encoder-decoder model trained with those hyper-parameters achieved performance similar to the state of the art on headline generation.…”
Section: Baselines
Confidence: 99%