2020
DOI: 10.1016/j.neucom.2020.07.112

A dual channel class hierarchy based recurrent language modeling

Cited by 2 publications (2 citation statements) · References 27 publications
“…In the case of English, since the basic unit of the language is the word, sentences can be split directly on whitespace. However, because English sentences contain stop words, these must also be removed during tokenization [29].…”
Section: Split Word Processing As (IWSLT) 2019 Dataset Contains Chine… (mentioning)
confidence: 99%
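As an illustration of the tokenization step this statement describes (whitespace splitting followed by stop-word removal), here is a minimal sketch; the stop-word list is a small illustrative subset chosen for this example, not a standard resource:

```python
# Minimal sketch of English tokenization as described above:
# split on whitespace, then drop stop words. The stop-word set
# here is a small illustrative subset, not a standard list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}

def tokenize(sentence: str) -> list[str]:
    """Split an English sentence on whitespace and remove stop words."""
    tokens = sentence.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(tokenize("The basic unit of English is the word"))
# ['basic', 'unit', 'english', 'word']
```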
“…Transformers have outperformed most of the Natural Language Processing tasks. However, language modelling requires …” The quoted passage embeds the following table of validation/test perplexities (lower is better):

| Model | Validation | Test |
|---|---|---|
| RNN-LDA + KN-5 + cache [44] | – | 92.0 |
| LSTM (large) [45] | 82.2 | 78.4 |
| Variational LSTM (large, MC) [46] | – | 73.4 |
| CharCNN [47] | – | 78.9 |
| Variational LSTM (tied) + augmented loss [48] | 71.1 | 68.5 |
| Variational RHN (tied) [49] | 67.9 | 65.4 |
| NAS Cell (tied) [50] | – | 62.4 |
| 4-layer skip connection LSTM (tied) [51] | 60.9 | 58.3 |
| AWD-LSTM: 3-layer LSTM (tied) + continuous cache pointer [28] | 53.9 | 52.8 |
| LSTM + Dual Channel Class Hierarchy [52] | – | 118.3 |
| LSTM (large) + cell [53] | 76.15 | 73.87 |
| AWD-FWM [54] | 56… | |

Section: Transformers (mentioning)
confidence: 99%
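For reading the table above: perplexity is the exponential of the average negative log-likelihood per predicted token, so lower values indicate a better language model. A minimal sketch of the standard computation (not specific to any of the models cited):

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood
    over predicted tokens (natural-log probabilities assumed)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that assigns every token probability 1/60 has perplexity 60.
print(perplexity([math.log(1 / 60)] * 100))  # ~60.0
```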