Proceedings of the Third Conference on Machine Translation: Shared Task Papers 2018
DOI: 10.18653/v1/w18-6404

An Empirical Study of Machine Translation for the Shared Task of WMT18

Abstract: This paper describes Global Tone Communication Co., Ltd.'s submission to the WMT18 shared news translation task. We participated in the English-to-Chinese direction and achieved the best BLEU score (43.8) among all participants. The submitted system focuses on data cleaning and on techniques for building a competitive model for this task. Unlike other participants, the submitted system relies mainly on data filtering to obtain the best BLEU score. We apply data filtering not only to the provided sentences but als…

Cited by 9 publications (7 citation statements); references 6 publications (4 reference statements).
“…Due to the non-standard way of submission, the system is not considered a regular participant, but an invited/late submission and marked with " " throughout the paper. (Bei et al., 2018) GTCOM-PRIMARY is based on the Transformer "base" model architecture using the Marian toolkit, and it also applies some methods that have been proven effective in NMT systems, such as BPE, back-translation, right-to-left reranking and ensemble decoding. In this experiment, right-to-left reranking does not help.…”
Section: Alibaba (mentioning; confidence: 99%)
“…As a result, the MT field faces various data quality issues such as misalignment and incorrect translations, which may significantly impact translation quality. A straightforward solution is to apply a filtering approach, where noisy data are filtered out and a smaller subset of high-quality sentence pairs is retained (Bei et al., 2018; Junczys-Dowmunt, 2018; Rossenbach et al., 2018). Nevertheless, it is unclear whether such a filtering approach can be successfully applied to GEC, where commonly available datasets tend to be far smaller than those used in recent neural MT research.…”
Section: Related Work (mentioning; confidence: 99%)
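The filtering approach the citing papers describe (Bei et al., 2018; Junczys-Dowmunt, 2018) discards noisy sentence pairs before training. The exact rules are not given here; the following is a minimal Python sketch of one common family of such rules, length and length-ratio checks for catching misaligned pairs. The function name and thresholds are illustrative assumptions, not the authors' actual pipeline.

```python
def filter_parallel(pairs, min_ratio=0.5, max_ratio=2.0, max_len=100):
    """Keep source/target pairs that pass simple alignment-plausibility checks.

    Hypothetical sketch: drops empty or over-long sentences and pairs whose
    token-length ratio falls outside [min_ratio, max_ratio], a cheap signal
    for misalignment in crawled parallel data.
    """
    kept = []
    for src, tgt in pairs:
        src_toks, tgt_toks = src.split(), tgt.split()
        if not src_toks or not tgt_toks:
            continue  # empty side: certainly noise
        if len(src_toks) > max_len or len(tgt_toks) > max_len:
            continue  # over-long sentences are often boilerplate or concatenations
        ratio = len(src_toks) / len(tgt_toks)
        if min_ratio <= ratio <= max_ratio:
            kept.append((src, tgt))
    return kept
```

Real systems combine many such rules (language identification, punctuation balance, de-duplication) rather than length heuristics alone.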
“…We evaluated the effectiveness of our method over several GEC datasets, and found that it considerably outperformed baseline methods, including three strong denoising baselines based on a filtering approach, which is a common approach in MT (Bei et al., 2018; Junczys-Dowmunt, 2018; Rossenbach et al., 2018). We further improved the performance by applying task-specific techniques and achieved state-of-the-art performance on the CoNLL-2014, JFLEG, and BEA-2019 benchmarks.…”
Section: Introduction (mentioning; confidence: 96%)
“…The methods of data filtering by human rules are mainly the same as we did in English to Chinese (Bei et al., 2018) last year, but language models are used to clean all data, including monolingual data, parallel data and synthetic data. We use Marian to train the transformer language model for each language (i.e.…”
Section: Data Filtering (mentioning; confidence: 99%)
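The last citation statement describes cleaning data by scoring it with per-language language models trained in Marian. The Marian-specific pipeline is not reproduced here; below is a self-contained Python sketch of the underlying idea, scoring sentences with a language model and keeping only those under a perplexity threshold. For simplicity the sketch uses an add-one-smoothed unigram model in place of a transformer LM; all names and the threshold are illustrative assumptions.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Train an add-one-smoothed unigram LM from an iterable of sentences."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    return counts, total, vocab

def perplexity(sentence, model):
    """Per-token perplexity of a sentence under the unigram model."""
    counts, total, vocab = model
    toks = sentence.split()
    if not toks:
        return float("inf")
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in toks)
    return math.exp(-log_prob / len(toks))

def lm_filter(sentences, model, threshold):
    """Keep sentences the LM finds plausible (perplexity <= threshold)."""
    return [s for s in sentences if perplexity(s, model) <= threshold]
```

In practice the same scheme applies to parallel and synthetic data by scoring each side with its language's LM; a transformer LM simply replaces the unigram probabilities with contextual ones.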