Proceedings of the Third Conference on Machine Translation: Shared Task Papers 2018
DOI: 10.18653/v1/w18-6451

Findings of the WMT 2018 Shared Task on Quality Estimation

Abstract: We report the results of the WMT18 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document. This year we include four language pairs, three text domains, and translations produced by both statistical and neural machine translation systems. Participating teams from ten institutions submitted a variety of systems to different task variants and language pairs.
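To make the sentence-level variant of the task concrete, below is a minimal, hypothetical sketch of QE framed as regression from (source, MT output) pairs to a quality score such as HTER, without access to reference translations. The surface features, toy data, and the `features` helper are illustrative stand-ins, not the shared-task baseline system.

```python
# Illustrative sketch (NOT the shared-task baseline): sentence-level QE as
# regression from simple source/MT surface features to an HTER-like score.
from sklearn.linear_model import Ridge

def features(src: str, mt: str) -> list[float]:
    # Hypothetical surface features; real QE systems use much richer signals.
    src_toks, mt_toks = src.split(), mt.split()
    return [
        len(src_toks),                          # source length
        len(mt_toks),                           # MT output length
        len(mt_toks) / max(len(src_toks), 1),   # length ratio
        sum(t in src_toks for t in mt_toks),    # naive token overlap
    ]

# Toy training triples: (source, MT output, HTER-like score in [0, 1]).
train = [
    ("the cat sat", "le chat était assis", 0.10),
    ("he reads books", "il lit des livres", 0.05),
    ("good morning", "matin matin bon", 0.60),
]
X = [features(s, m) for s, m, _ in train]
y = [score for _, _, score in train]

model = Ridge().fit(X, y)
print(model.predict([features("the dog ran", "le chien a couru")]))
```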

Cited by 78 publications (85 citation statements)
References 38 publications
“…The scores of their model correlate well with Pyramid and Responsiveness, but text quality is only addressed indirectly. Quality Estimation is well established in MT (Callison-Burch et al., 2012; Bojar et al., 2016, 2017; Martins et al., 2017; Specia et al., 2018). QE methods provide a quality indicator for translation output at run-time without relying on human references, which are typically needed by MT evaluation metrics (Papineni et al., 2002; Denkowski and Lavie, 2014).…”
Section: Related Work
confidence: 99%
“…• SRC → PE: trained first on the in-domain corpus provided, then fine-tuned on the shared-task data. deepQUEST is the open-source system developed by Ive et al. (2018), UNQE is the unpublished system from Jiangxi Normal University described by Specia et al. (2018a), and QE Brain is the system from Alibaba described by Wang et al. (2018). Reported numbers for the OpenKiwi system correspond to the best models on the development set: the STACKED model for prediction of MT tags, and the ENSEMBLED model for the rest.…”
Section: Benchmark Experiments
confidence: 99%
“…[Flattened table header: F1-BAD, F1-OK, and F1-multi columns, repeated for three model settings] Specia et al. (2018). MT: machine translation output, i.e., the target sentence; SRC: source sentence.…”
Section: Gaps in MT / Words in SRC Model
confidence: 99%
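The F1-BAD, F1-OK, and F1-multi scores referenced above are the standard WMT word-level QE metrics: per-class F1 over the OK/BAD tag sequences, with F1-multi being the product of the two. Below is a minimal sketch of how they can be computed; the `f1_multi` helper and the toy tag sequences are illustrative, not code from any cited system.

```python
# Minimal sketch of the WMT word-level QE metrics: per-class F1 for the
# BAD and OK tags, plus F1-multi (their product), computed from gold and
# predicted tag sequences using the shared-task "OK"/"BAD" convention.
from sklearn.metrics import f1_score

def f1_multi(gold: list[str], pred: list[str]) -> dict[str, float]:
    f1_bad = f1_score(gold, pred, pos_label="BAD", average="binary")
    f1_ok = f1_score(gold, pred, pos_label="OK", average="binary")
    return {"F1-BAD": f1_bad, "F1-OK": f1_ok, "F1-multi": f1_bad * f1_ok}

# Toy example: one tag per MT word.
gold = ["OK", "OK", "BAD", "OK", "BAD", "OK"]
pred = ["OK", "BAD", "BAD", "OK", "OK", "OK"]
print(f1_multi(gold, pred))
```

F1-multi is preferred over accuracy here because OK tags heavily outnumber BAD tags, so a system must do well on both classes to score highly.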