2019
DOI: 10.48550/arxiv.1909.12744
Preprint

On the use of BERT for Neural Machine Translation

Cited by 7 publications (7 citation statements)
References 0 publications

“…Since the birth of BERT (Devlin et al., 2019), there has been continuing advancement in language model pre-training, such as XLNet, RoBERTa, ALBERT (Lan et al., 2019), UniLM, and T5 (Raffel et al., 2019), which epitomize the power of large-scale pre-training. Alongside BERT, there are also studies on model compression (Sun et al., 2019c; Jiao et al., 2019; Shen et al., 2019) and on extending from understanding to generation (Chen et al., 2019a; Clinchant et al., 2019; Wang and Cho, 2019).…”
Section: Model Pre-training
confidence: 99%
“…Enhancing NMT systems with external knowledge has been a promising research direction in recent years. For example, in Lu et al. (2018), knowledge graphs are incorporated into the machine translation task, and in Zhu et al. (2020), Clinchant et al. (2019), Yang et al. (2020), and Shavarani and Sarkar (2021), pre-trained language models are used to enhance translation. In this work, named entity information is leveraged to boost performance.…”
Section: Related Work
confidence: 99%
“…A successful NLP model must understand the structure and context of language, learned via supervised or unsupervised methods. Pre-trained language models have been used to boost performance in other NLP tasks (Clinchant et al., 2019; Zhu et al., 2020), with models such as BERT (Devlin et al., 2018) achieving state-of-the-art performance. Zhu et al. (2020) tried to fuse the embeddings of BERT into a traditional Transformer architecture using attention, increasing translation performance by approximately 2 BLEU points.…”
Section: Related Work
confidence: 99%
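
The attention-based fusion mentioned in the last statement can be illustrated with a short sketch. The code below is a minimal, assumed reading of how frozen BERT hidden states might be fused into a Transformer encoder layer through an extra attention branch, in the spirit of the BERT-fused approach attributed to Zhu et al. (2020); the class name, parameter values, and the equal mixing of the two branches are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch (PyTorch): fusing frozen BERT hidden states into a
# Transformer encoder layer via an extra attention branch. All names
# and hyperparameters here are hypothetical.
import torch
import torch.nn as nn


class BertFusedEncoderLayer(nn.Module):
    def __init__(self, d_model=512, d_bert=768, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        # Standard self-attention over the NMT encoder states.
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout,
                                               batch_first=True)
        # Extra attention: queries from the NMT encoder, keys/values from BERT.
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, kdim=d_bert,
                                               vdim=d_bert, dropout=dropout,
                                               batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                 nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, bert_states):
        # x: (batch, src_len, d_model) NMT encoder states
        # bert_states: (batch, bert_len, d_bert) frozen BERT outputs
        h, _ = self.self_attn(x, x, x)
        b, _ = self.bert_attn(x, bert_states, bert_states)
        # Average the two attention branches before the residual connection
        # (one simple fusion choice among several possible).
        x = self.norm1(x + self.drop(0.5 * (h + b)))
        x = self.norm2(x + self.drop(self.ffn(x)))
        return x


if __name__ == "__main__":
    layer = BertFusedEncoderLayer()
    src = torch.randn(2, 10, 512)    # toy NMT encoder states
    bert = torch.randn(2, 12, 768)   # toy BERT hidden states
    print(layer(src, bert).shape)    # torch.Size([2, 10, 512])
```

The key design point the sketch tries to capture is that the BERT encoder is kept separate (and typically frozen), with its representations injected through a dedicated cross-attention branch rather than by replacing the NMT encoder's own embeddings.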