Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.261

Exploring Unsupervised Pretraining Objectives for Machine Translation

Abstract: Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We …
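As a rough, hypothetical illustration of the input-noising objectives the abstract contrasts, the sketch below builds corrupted seq2seq pretraining inputs in three ways: masking tokens, replacing words (here drawn uniformly from a toy vocabulary, whereas the paper conditions replacements on context), and locally reordering words so the input still resembles a real, full sentence. Function names and parameters are illustrative, not the paper's implementation.

import random

MASK = "<mask>"

def mask_tokens(tokens, p=0.35, rng=None):
    # MLM-style noising: hide a fraction of tokens behind a mask symbol;
    # the seq2seq model is trained to reconstruct the original sentence
    # in the decoder.
    rng = rng or random.Random(0)
    return [MASK if rng.random() < p else t for t in tokens]

def replace_tokens(tokens, vocab, p=0.35, rng=None):
    # Replacement-style noising: swap a fraction of tokens for other words
    # (uniform choice from a toy vocabulary here, context-based in the paper),
    # so the corrupted input still looks like a full sentence.
    rng = rng or random.Random(0)
    return [rng.choice(vocab) if rng.random() < p else t for t in tokens]

def shuffle_locally(tokens, window=3, rng=None):
    # Reordering-style noising: permute tokens within small local windows,
    # keeping every original word present in the input.
    rng = rng or random.Random(0)
    out = list(tokens)
    for start in range(0, len(out), window):
        chunk = out[start:start + window]
        rng.shuffle(chunk)
        out[start:start + window] = chunk
    return out

if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    toy_vocab = ["cat", "runs", "green", "slow", "table"]
    print("masked:   ", mask_tokens(sentence))
    print("replaced: ", replace_tokens(sentence, toy_vocab))
    print("reordered:", shuffle_locally(sentence))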

Cited by 3 publications (5 citation statements)
References 31 publications
“…By contrast, when we add word replacements in the encoder, it greatly reduces LitTER in both splits. This aligns with the findings of Baziotis et al (2021), who show that source-side word replacements make the decoder less prone to copying (or "trusting") the encoder.…”
Section: Targeted Evaluation (supporting)
confidence: 89%
“…Masking yields no effect on the zero split, but increases errors on the joint split. Baziotis et al (2021) show that masking promotes copying, which we speculate could lead to word-by-word translation and increase LitTER. Decoder-side word replacements yield a similar behaviour in terms of LitTER, which we hypothesize could push the decoder to rely more on the encoder, therefore encouraging word-by-word translation.…”
Section: Targeted Evaluation (mentioning)
confidence: 75%
“…We refer to this loss as encoder-based MLM loss (eMLM; Baziotis et al 2021). It trains the encoder to reconstruct input text representations while attending to multimodal information.…”
Section: Self-supervised Auxiliary Guidance (mentioning)
confidence: 99%
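To make the eMLM idea in the statement above concrete, here is a minimal, hypothetical PyTorch sketch (not the cited implementation, and with the multimodal attention omitted): the encoder's own hidden states are projected to the vocabulary and a cross-entropy loss is taken only at the masked positions, so the encoder, rather than the decoder, reconstructs the corrupted tokens. All names and shapes below are assumptions for illustration.

import torch
import torch.nn as nn

def encoder_mlm_loss(encoder_hidden, target_ids, mask_positions, vocab_proj):
    # encoder_hidden: (batch, seq_len, d_model) encoder outputs
    # target_ids:     (batch, seq_len) original token ids before corruption
    # mask_positions: (batch, seq_len) boolean mask of corrupted positions
    # vocab_proj:     nn.Linear(d_model, vocab_size) output projection
    logits = vocab_proj(encoder_hidden)          # (batch, seq_len, vocab)
    return nn.functional.cross_entropy(
        logits[mask_positions],                  # score only masked positions
        target_ids[mask_positions],
        reduction="mean",
    )

# Toy usage with random tensors standing in for a real encoder and batch.
if __name__ == "__main__":
    torch.manual_seed(0)
    batch, seq_len, d_model, vocab = 2, 6, 16, 100
    hidden = torch.randn(batch, seq_len, d_model)
    targets = torch.randint(0, vocab, (batch, seq_len))
    masked = torch.rand(batch, seq_len) < 0.3
    masked[0, 0] = True                          # ensure at least one masked position
    proj = nn.Linear(d_model, vocab)
    print(encoder_mlm_loss(hidden, targets, masked, proj))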
“…Ref. [29] explored many unsupervised pre-training objectives and systematically analyzed them in both supervised and unsupervised settings.…”
Section: Unsupervised Pre-training (mentioning)
confidence: 99%