Proceedings of the Third Conference on Machine Translation: Shared Task Papers 2018
DOI: 10.18653/v1/w18-6451

Findings of the WMT 2018 Shared Task on Quality Estimation

Abstract: We report the results of the WMT18 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document. This year we include four language pairs, three text domains, and translations produced by both statistical and neural machine translation systems. Participating teams from ten institutions submitted a variety of systems to different task variants and language pairs.
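To make the sentence-level variant of the task concrete, below is a minimal, hypothetical sketch of QE framed as regression from (source, MT output) pairs to a quality score such as HTER, without access to reference translations. The surface features, toy data, and the `features` helper are illustrative stand-ins, not the shared-task baseline system.

```python
# Illustrative sketch (NOT the shared-task baseline): sentence-level QE as
# regression from simple source/MT surface features to an HTER-like score.
from sklearn.linear_model import Ridge

def features(src: str, mt: str) -> list[float]:
    # Hypothetical surface features; real QE systems use much richer signals.
    src_toks, mt_toks = src.split(), mt.split()
    return [
        len(src_toks),                          # source length
        len(mt_toks),                           # MT output length
        len(mt_toks) / max(len(src_toks), 1),   # length ratio
        sum(t in src_toks for t in mt_toks),    # naive token overlap
    ]

# Toy training triples: (source, MT output, HTER-like score in [0, 1]).
train = [
    ("the cat sat", "le chat était assis", 0.10),
    ("he reads books", "il lit des livres", 0.05),
    ("good morning", "matin matin bon", 0.60),
]
X = [features(s, m) for s, m, _ in train]
y = [score for _, _, score in train]

model = Ridge().fit(X, y)
print(model.predict([features("the dog ran", "le chien a couru")]))
```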

Cited by 78 publications (85 citation statements)
References 38 publications
“…The scores of their model correlate well with Pyramid and Responsiveness, but text quality is only addressed indirectly. Quality Estimation is well established in MT (Callison-Burch et al., 2012; Bojar et al., 2016, 2017; Martins et al., 2017; Specia et al., 2018). QE methods provide a quality indicator for translation output at run-time without relying on human references, which are typically needed by MT evaluation metrics (Papineni et al., 2002; Denkowski and Lavie, 2014).…”
Section: Related Work
confidence: 99%
“…• SRC → PE: trained first on the in-domain corpus provided, then fine-tuned on the shared-task data. deepQUEST is the open-source system developed by Ive et al. (2018), UNQE is the unpublished system from Jiangxi Normal University described by Specia et al. (2018a), and QE Brain is the system from Alibaba described by Wang et al. (2018). Reported numbers for the OpenKiwi system correspond to the best models on the development set: the STACKED model for prediction of MT tags, and the ENSEMBLED model for the rest.…”
Section: Benchmark Experiments
confidence: 99%
“…[Flattened table header: F1-BAD, F1-OK, and F1-multi columns, repeated for three model settings] Specia et al. (2018). MT: machine translation output, i.e., the target sentence; SRC: source sentence.…”
Section: Gaps in MT / Words in SRC Model
confidence: 99%
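The F1-BAD, F1-OK, and F1-multi scores referenced above are the standard WMT word-level QE metrics: per-class F1 over the OK/BAD tag sequences, with F1-multi being the product of the two. Below is a minimal sketch of how they can be computed; the `f1_multi` helper and the toy tag sequences are illustrative, not code from any cited system.

```python
# Minimal sketch of the WMT word-level QE metrics: per-class F1 for the
# BAD and OK tags, plus F1-multi (their product), computed from gold and
# predicted tag sequences using the shared-task "OK"/"BAD" convention.
from sklearn.metrics import f1_score

def f1_multi(gold: list[str], pred: list[str]) -> dict[str, float]:
    f1_bad = f1_score(gold, pred, pos_label="BAD", average="binary")
    f1_ok = f1_score(gold, pred, pos_label="OK", average="binary")
    return {"F1-BAD": f1_bad, "F1-OK": f1_ok, "F1-multi": f1_bad * f1_ok}

# Toy example: one tag per MT word.
gold = ["OK", "OK", "BAD", "OK", "BAD", "OK"]
pred = ["OK", "BAD", "BAD", "OK", "OK", "OK"]
print(f1_multi(gold, pred))
```

F1-multi is preferred over accuracy here because OK tags heavily outnumber BAD tags, so a system must do well on both classes to score highly.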