Ricardo Rei scite author profile

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling, resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metrics. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.

IST-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task

Treviso, Guerreiro, Rei et al. 2021

We present the joint contribution of Instituto Superior Técnico (IST) and Unbabel to the Explainable Quality Estimation (QE) shared task, where systems were submitted to two tracks: constrained (without word-level supervision) and unconstrained (with word-level supervision). For the constrained track, we experimented with several explainability methods to extract the relevance of input tokens from sentence-level QE models built on top of multilingual pre-trained transformers. Among the different tested methods, composing explanations in the form of attention weights scaled by the norm of value vectors yielded the best results. When word-level labels are used during training, our best results were obtained by using word-level predicted probabilities. We further improve the performance of our methods on the two tracks by ensembling explanation scores extracted from models trained with different pre-trained transformers, achieving strong results for in-domain and zero-shot language pairs.
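
The best constrained-track explanation described above, attention weights scaled by the norm of value vectors, can be sketched as follows. This is a minimal single-head illustration, not the authors' implementation: the function name, the averaging over query positions, and the toy numbers are all assumptions made for the example.

```python
import math


def value_norm_scaled_attention(attention, values):
    """Score each input token by attention weight times value-vector norm.

    attention: list of rows, one per query position; attention[i][j] is the
               weight that query i places on key j (each row sums to 1).
    values:    list of value vectors, one per key position.
    Returns one relevance score per key position, averaged over queries
    (the averaging step is an assumption of this sketch).
    """
    # L2 norm of each token's value vector
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in values]
    n_queries = len(attention)
    scores = []
    for j, norm in enumerate(norms):
        avg_attention = sum(row[j] for row in attention) / n_queries
        scores.append(avg_attention * norm)
    return scores


# Toy example with made-up numbers: both tokens receive equal attention,
# but the first has a much larger value vector.
attention = [[0.5, 0.5], [0.5, 0.5]]
values = [[3.0, 4.0], [0.0, 1.0]]
scores = value_norm_scaled_attention(attention, values)
print(scores)
```

With uniform attention, the ranking of tokens is decided entirely by the value-vector norms, which is the information that raw attention weights alone would miss.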

Uncertainty-Aware Machine Translation Evaluation

Glushkova, Zerva, Rei et al. 2021

Several neural-based metrics have recently been proposed to evaluate machine translation quality. However, all of them resort to point estimates, which provide limited information at segment level. This is made worse as they are trained on noisy, biased and scarce human judgements, often resulting in unreliable quality predictions. In this paper, we introduce uncertainty-aware MT evaluation and analyze the trustworthiness of the predicted quality. We combine the COMET framework with two uncertainty estimation methods, Monte Carlo dropout and deep ensembles, to obtain quality scores along with confidence intervals. We compare the performance of our uncertainty-aware MT evaluation methods across multiple language pairs from the QT21 dataset and the WMT20 metrics task, augmented with MQM annotations. We experiment with varying numbers of references and further discuss the usefulness of uncertainty-aware quality estimation (without references) to flag possibly critical translation mistakes. Our code is available at https://github.com/deep-spin/UA_COMET.
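
As a rough illustration of how repeated stochastic predictions can be turned into a quality score with a confidence interval, the sketch below summarizes a set of sampled scores (for example, from several dropout-enabled forward passes or several ensemble members) with a mean and a Gaussian interval. The Gaussian assumption, the function name, and the sample values are assumptions of this example and are not taken from the UA_COMET code.

```python
import statistics


def gaussian_interval(samples, z=1.96):
    """Summarize sampled quality scores as (mean, (low, high)).

    samples: scores for ONE segment, one per stochastic forward pass
             or per ensemble member.
    z:       number of standard deviations for the interval; 1.96 gives
             roughly 95% coverage if the scores are Gaussian (assumption).
    """
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)  # sample standard deviation
    return mean, (mean - z * std, mean + z * std)


# Made-up scores for a single translated segment
samples = [0.6, 0.7, 0.8]
mean, (low, high) = gaussian_interval(samples)
print(f"score={mean:.3f}, interval=({low:.3f}, {high:.3f})")
```

A wide interval signals a segment whose predicted quality should be trusted less, which is the kind of information a point estimate cannot convey.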

Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort

Mendonça, Rei, Coheur et al. 2021

In Machine Translation, assessing the quality of a large number of automatic translations can be challenging. Automatic metrics are not reliable when it comes to high-performing systems. In addition, resorting to human evaluators can be expensive, especially when evaluating multiple systems. To overcome the latter challenge, we propose a novel application of online learning that, given an ensemble of Machine Translation systems, dynamically converges to the best systems by taking advantage of the available human feedback. Our experiments on WMT'19 datasets show that our online approach quickly converges to the top-3 ranked systems for the language pairs considered, despite the lack of human feedback for many translations.
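
The idea of converging to the best systems from sparse human feedback can be illustrated with a generic online-learning scheme. The sketch below uses an exponentially weighted update over systems, which is one standard choice and is an assumption here, since the abstract does not name the algorithm; the function name, learning rate, and scores are likewise hypothetical.

```python
import math


def ewaf_update(weights, human_scores, eta=1.0):
    """One round of an exponentially weighted update over MT systems.

    weights:      current weight per system (sums to 1).
    human_scores: human score in [0, 1] per system for this segment,
                  or None when no human feedback is available.
    eta:          learning rate (hypothetical default).
    """
    if human_scores is None:
        # No feedback for this segment: leave the weights unchanged
        return list(weights)
    # Loss is 1 - score, so well-rated systems are penalized less
    updated = [w * math.exp(-eta * (1.0 - s))
               for w, s in zip(weights, human_scores)]
    total = sum(updated)
    return [w / total for w in updated]


# Three systems, four segments; the second segment has no human feedback
weights = [1 / 3, 1 / 3, 1 / 3]
rounds = [[0.9, 0.4, 0.5], None, [0.8, 0.3, 0.6], [0.95, 0.5, 0.4]]
for feedback in rounds:
    weights = ewaf_update(weights, feedback)

best = max(range(len(weights)), key=weights.__getitem__)
print(best, [round(w, 3) for w in weights])
```

Rounds without feedback are simply skipped, so the weights still concentrate on the consistently best-rated system even when many translations are never judged.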

COMET: A Neural Framework for MT Evaluation

Rei, Stewart, Farinha et al. 2020

Preprint
