Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1489
Direct Output Connection for a High-Rank Language Model

Abstract: This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from a final RNN layer but also from middle layers. Our proposed method raises the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). The proposed method improves the current state-of-the-art language model and achieves the best score on the Penn Treebank and WikiText-2, which are …
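The mechanism the abstract describes, computing a softmax from every layer and mixing the resulting distributions, can be sketched in a few lines. Below is a minimal PyTorch illustration under stated assumptions: the module name, the three-layer configuration, and the single learned scalar weight per layer are illustrative choices, not the paper's exact architecture (the paper also uses multiple softmax components per layer, and its regularization setup is omitted here).

```python
# Minimal sketch of a "direct output connection" language model, assuming
# a stacked-LSTM setup. All names and hyper-parameters here are
# illustrative, not the configuration of Takase et al. (2018).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DOCLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One LSTM per layer so that every layer's hidden states are exposed.
        self.layers = nn.ModuleList([
            nn.LSTM(embed_dim if i == 0 else hidden_dim, hidden_dim,
                    batch_first=True)
            for i in range(num_layers)
        ])
        # Each layer gets its own projection to the vocabulary.
        self.out_proj = nn.ModuleList([
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_layers)
        ])
        # Learned mixture weights over the per-layer softmaxes.
        self.mix_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, tokens):
        x = self.embed(tokens)                     # (batch, seq, embed_dim)
        per_layer_probs = []
        for lstm, proj in zip(self.layers, self.out_proj):
            x, _ = lstm(x)                         # this layer's hidden states
            per_layer_probs.append(F.softmax(proj(x), dim=-1))
        # Weighted sum of full probability distributions, not of logits:
        # mixing after the softmax is what raises the rank of the
        # log-probability matrix (the Yang et al. 2018 argument).
        w = F.softmax(self.mix_logits, dim=0)      # (num_layers,)
        probs = sum(wi * p for wi, p in zip(w, per_layer_probs))
        return torch.log(probs + 1e-9)             # log-probs for an NLL loss
```

Note that the mixture is taken over probability distributions, after each softmax, rather than over logits; averaging logits would collapse back into a single softmax and leave the rank bound on the log-probability matrix unchanged.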

Cited by 31 publications (24 citation statements) · References 25 publications

Citation statements:
“…However, we should note that none of these models perform at the same level of the state-of-the-art models such as those of […] and Takase et al. (2018), as we can see in Tables 1 and 2. These models use advanced regularization techniques and matrix factorization for training the RNN-LMs, whilst our Averaging RNN-LM uses a standard LSTM training regime and regularization techniques.…”
Section: Results
Confidence: 78%
“…The first is to learn better expressive word embeddings (Gao et al., 2019; Gong et al., 2018; …). The second is to design better expressive output/activation functions (Yang et al., 2018; Ganea et al., 2019; Kanai et al., 2018; Takase et al., 2018). Nonetheless, we want to clarify that focusing only on the embedding/output layers is far from sufficient for language modeling, since it is the middle layers that provide the major non-linearity, which matters most for expressiveness.…”
Section: Discussion and Future Work
Confidence: 99%
“…We also consider that the softmax bottleneck problem (Yang et al., 2018) is highly related to the representation degeneration problem. There are a series of works (Ganea et al., 2019; Kanai et al., 2018; Takase et al., 2018) that follow this line of research.…”
Section: Related Work
Confidence: 94%
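For context on the softmax bottleneck these statements refer to, here is a compact restatement of the Yang et al. (2018) argument; the notation is chosen for this note and is not quoted from any of the papers above.

```latex
% Let A be the N x V matrix of true log-probabilities,
% A_{ij} = log P*(x_j | c_i), over N contexts and a vocabulary of size V.
% A single-softmax model with d-dimensional context vectors h_c and output
% embeddings w_x produces logits H W^T (H: N x d, W: V x d), so after
% softmax normalization A can differ from H W^T only by a per-row constant:
%   rank(A) <= d + 1.
% If the true A has rank much larger than d, no single softmax can match it.
% A mixture of K softmaxes (or, as in the cited paper, distributions taken
% from several layers) breaks this bound, because the log of a convex
% combination of softmaxes is no longer a low-rank matrix plus a row shift.
\[
  P_\theta(x \mid c)
    = \sum_{k=1}^{K} \pi_{c,k}\,
      \frac{\exp\!\left(\mathbf{h}_{c,k}^{\top}\mathbf{w}_{x}\right)}
           {\sum_{x'} \exp\!\left(\mathbf{h}_{c,k}^{\top}\mathbf{w}_{x'}\right)},
  \qquad \pi_{c,k} \ge 0,\ \sum_{k}\pi_{c,k} = 1 .
\]
```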
“…We incorporated this method with a widely used LSTM encoder-decoder model (Luong et al., 2015). For a fair comparison, we set the same hyper-parameters as in Takase et al. (2018), because they indicated that the LSTM encoder-decoder model trained with those hyper-parameters achieved performance similar to the state of the art on headline generation.…”
Section: Baselines
Confidence: 99%