2016
DOI: 10.48550/arxiv.1611.02344
Preprint
A Convolutional Encoder Model for Neural Machine Translation

Cited by 41 publications (50 citation statements)
References 0 publications
“…CNNs are good at processing data that has a grid-like topology. Two-dimensional CNNs have achieved great success in computer vision [29,30,31,32], while one-dimensional CNNs are commonly used for sequential data [33,34,35]. Among these models, TCNs, which use causal convolutions with skewed connections, attempt to capture temporal interactions and have been applied to various regression tasks, such as action segmentation and detection [36,37], lip-reading [38,39], and ENSO prediction [40].…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
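To make the causal-convolution idea in this passage concrete, here is a minimal sketch, not taken from any of the cited papers and written in PyTorch as an assumed framework: a 1D causal convolution whose output at time step t depends only on inputs at steps up to t.

```python
# Minimal sketch of a 1D causal convolution (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConv1d(nn.Module):
    """1D convolution left-padded so it never looks at future time steps."""

    def __init__(self, channels: int, kernel_size: int, dilation: int = 1):
        super().__init__()
        # Pad only on the left by (kernel_size - 1) * dilation time steps.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))  # left padding only keeps the convolution causal
        return self.conv(x)


if __name__ == "__main__":
    layer = CausalConv1d(channels=8, kernel_size=3, dilation=2)
    out = layer(torch.randn(4, 8, 50))  # (batch=4, channels=8, time=50)
    print(out.shape)  # torch.Size([4, 8, 50])
```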
“…At prediction time, either greedy or beam search is used to generate the target sentence from left to right. Various architectures have been proposed to improve the quality of neural machine translation. These include recurrent networks [3], convolutional networks [9], and transformer networks [28]. Attention has proven very helpful for these neural architectures, including self-attention [26], multi-hop attention [12], and multi-head attention [2].…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
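As a rough illustration of the greedy left-to-right generation this passage mentions, a small sketch follows; `decoder_step`, `BOS`, and `EOS` are hypothetical names introduced here, not taken from the cited work.

```python
# Greedy left-to-right decoding sketch (illustrative only).
import torch
import torch.nn.functional as F

BOS, EOS = 1, 2  # assumed special token ids


def greedy_decode(decoder_step, max_len: int = 50):
    """decoder_step(prefix) -> logits over the vocabulary for the next token."""
    tokens = [BOS]
    for _ in range(max_len):
        logits = decoder_step(torch.tensor(tokens))
        next_token = int(torch.argmax(logits))  # greedy: take the single most probable token
        tokens.append(next_token)
        if next_token == EOS:
            break
    return tokens[1:]  # drop BOS


if __name__ == "__main__":
    # Toy decoder that always predicts EOS, just to show the call pattern.
    toy = lambda prefix: F.one_hot(torch.tensor(EOS), 10).float()
    print(greedy_decode(toy))  # [2]
```

Beam search generalizes this loop by keeping the k highest-scoring prefixes at each step instead of only the single arg-max continuation.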
“…Unlike recurrent networks, CNNs enable parallelization and faster processing. Encoder-decoder models using CNNs proved effective at translating phrases in the source sentence into suitable target sentences [6,7]. CNN-based NMT models could not, however, match the performance of state-of-the-art recurrent neural network based NMT models [3].…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
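For context, here is a minimal sketch of a convolutional sentence encoder in the spirit of this passage; it is not the authors' implementation, and the layer count, dimensions, and residual connections are illustrative assumptions. All source positions are processed in parallel, which is the parallelization advantage the passage refers to.

```python
# Convolutional sentence encoder sketch (illustrative only).
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 256, layers: int = 3, kernel: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel, padding=kernel // 2) for _ in range(layers)
        )

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        # src: (batch, src_len) of token ids
        x = self.embed(src).transpose(1, 2)  # (batch, dim, src_len)
        for conv in self.convs:
            x = torch.relu(conv(x)) + x      # residual connection around each layer
        return x.transpose(1, 2)             # (batch, src_len, dim) encoder states


if __name__ == "__main__":
    enc = ConvEncoder(vocab_size=1000)
    states = enc(torch.randint(0, 1000, (2, 12)))
    print(states.shape)  # torch.Size([2, 12, 256])
```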