Proceedings of the Third Conference on Machine Translation: Shared Task Papers 2018
DOI: 10.18653/v1/w18-6414

The MLLP-UPV German-English Machine Translation System for WMT18

Abstract: This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German→English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture-based neural machine translation systems. To train our system under "constrained" conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added pa…

Cited by 5 publications (7 citation statements); references 10 publications.

“…During this competition, 16 different teams from universities belonging to the autonomous regions of Valencia, Catalonia and the Balearic islands debated in the Catalan language on the topic "Should surrogacy be legalised?" In addition to the original language of the annotated data, automatic translations to Spanish and English languages using the MLLP machine translation toolkit [30,31] have also been included. The results and the evaluation of the debates were directly retrieved from the organisation, but were post-processed by us in order to focus on the argumentative aspects of the debates and to preserve the anonymity of the jury and the participant teams.…”
Section: Data Collection (mentioning)
confidence: 99%
“…This is highlighted by the fact that a majority of participating systems in the WMT18 News Translation Task apply filtering techniques to ParaCrawl. Additionally, the experiments carried out for our 2018 submission (Iranzo-Sánchez et al, 2018) show that using a noisy corpus such as ParaCrawl without filtering can result in a worse performance compared with a baseline system that simply excludes the noisy corpus from the training data.…”
Section: Corpus Filtering (mentioning)
confidence: 99%
“…• LM-based filtering (Iranzo-Sánchez et al, 2018): This approach uses language models for estimating the quality of a sentence pair, under the assumption that a low-perplexity sentence is more likely to be an adequate sentence for training. Using in-domain data, we train one language model for each language, and then use them to score the corresponding side of the sentence pair, giving us perplexity scores (s, t).…”
Section: Corpus Filtering (mentioning)
confidence: 99%
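
The LM-based scoring described in the statement above can be illustrated with a minimal sketch. It assumes a simple character n-gram model with add-one smoothing; the class `CharNgramLM`, the function `score_pair`, the n-gram order, and the smoothing scheme are all hypothetical choices made for illustration and are not taken from the cited papers.

```python
import math
from collections import Counter


class CharNgramLM:
    """Toy character n-gram LM with add-one smoothing (illustrative assumption)."""

    def __init__(self, order=3):
        self.order = order
        self.ngrams = Counter()    # counts of context + next character
        self.contexts = Counter()  # counts of each context
        self.vocab = set()

    def _pad(self, sentence):
        # Start padding gives the first characters a context; \x03 marks the end.
        return "\x02" * (self.order - 1) + sentence + "\x03"

    def train(self, sentences):
        for sentence in sentences:
            padded = self._pad(sentence)
            self.vocab.update(padded)
            for i in range(self.order - 1, len(padded)):
                context = padded[i - self.order + 1:i]
                self.ngrams[context + padded[i]] += 1
                self.contexts[context] += 1

    def perplexity(self, sentence):
        padded = self._pad(sentence)
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen characters
        log_prob = 0.0
        steps = 0
        for i in range(self.order - 1, len(padded)):
            context = padded[i - self.order + 1:i]
            count = self.ngrams[context + padded[i]]
            log_prob += math.log((count + 1) / (self.contexts[context] + vocab_size))
            steps += 1
        return math.exp(-log_prob / steps)


def score_pair(src_lm, tgt_lm, src, tgt):
    """Return the perplexity scores (s, t) for the two sides of a sentence pair."""
    return src_lm.perplexity(src), tgt_lm.perplexity(tgt)


# Tiny in-domain samples, invented purely for this example.
de_lm = CharNgramLM(order=3)
de_lm.train(["das ist ein test", "das ist gut"])
en_lm = CharNgramLM(order=3)
en_lm.train(["this is a test", "this is good"])

clean = score_pair(de_lm, en_lm, "das ist ein test", "this is a test")
noisy = score_pair(de_lm, en_lm, "zzzz qqqq xxxx", "zzzz qqqq xxxx")
```

In this sketch the clean pair receives lower perplexity on both sides than the noisy pair. How the two scores are combined into a keep-or-discard decision is not stated in the excerpt, so it is left open here.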
“…The same filtering was applied to the monolingual Kazakh Common Crawl corpus. In addition, inspired by Iranzo-Sánchez et al (2018), we ranked its sentences by perplexity computed by a character-based 7-gram language model and discarded the half of the corpus with the highest perplexity. The language model was trained on the high-quality Kazakh monolingual News Commentary corpus.…”
Section: Data Preparation and Training Details (mentioning)
confidence: 99%
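
The rank-and-discard step described in the statement above could look roughly like the following sketch. It assumes the perplexity scores have already been computed by some language model; the function name `keep_lowest_perplexity`, the `keep_fraction` parameter, and the sample data are hypothetical.

```python
def keep_lowest_perplexity(sentences, perplexities, keep_fraction=0.5):
    """Keep the lowest-perplexity fraction of a corpus, preserving corpus order.

    `perplexities[i]` is assumed to be the precomputed score of `sentences[i]`.
    """
    if len(sentences) != len(perplexities):
        raise ValueError("sentences and perplexities must have the same length")
    # Rank sentence indices from lowest to highest perplexity.
    ranked = sorted(range(len(sentences)), key=lambda i: perplexities[i])
    n_keep = int(len(sentences) * keep_fraction)
    # Restore the original order of the surviving sentences.
    kept = sorted(ranked[:n_keep])
    return [sentences[i] for i in kept]


# Invented scores, for illustration only.
corpus = ["a", "b", "c", "d"]
scores = [5.0, 1.0, 9.0, 2.0]
filtered = keep_lowest_perplexity(corpus, scores)
```

With `keep_fraction=0.5` this drops the higher-perplexity half, matching the description in the quoted statement. Keeping the survivors in corpus order is a design choice of the sketch, not something the excerpt specifies.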