Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) 2019
DOI: 10.18653/v1/w19-5358
YiSi - a Unified Semantic MT Quality Evaluation and Estimation Metric for Languages with Different Levels of Available Resources

Abstract: We present YiSi, a unified automatic semantic machine translation quality evaluation and estimation metric for languages with different levels of available resources. Underneath the interface with different language resource settings, YiSi uses the same representation for the two sentences under assessment. We also show that using contextual embeddings from multilingual BERT (Bidirectional Encoder Representations from Transformers) yields a significant improvement in the correlation of YiSi-1's scores with human judgment.
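The core idea behind embedding-based metrics of this family, matching each word in the hypothesis against the most similar word in the reference and combining precision and recall into an F-measure, can be illustrated with a minimal sketch. This is not the official YiSi implementation; it omits the idf weighting, semantic-role structure, and actual BERT contextual embeddings of the real metric, and the function name `yisi_like_score` is hypothetical:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def yisi_like_score(hyp_vecs, ref_vecs):
    """Greedy-max matching F-score over word embeddings.

    A simplified sketch of YiSi-1-style scoring: each hypothesis word is
    matched to its most similar reference word (precision side) and each
    reference word to its most similar hypothesis word (recall side);
    the two averages are combined into an F-measure.
    """
    # Pairwise similarity matrix: rows = hypothesis words, cols = reference words.
    sim = np.array([[cosine(h, r) for r in ref_vecs] for h in hyp_vecs])
    precision = sim.max(axis=1).mean()
    recall = sim.max(axis=0).mean()
    return 2 * precision * recall / (precision + recall)
```

With identical hypothesis and reference embeddings the score is 1.0; a hypothesis covering only part of the reference lowers recall and hence the F-score.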

Cited by 85 publications (96 citation statements)
References 14 publications
“…Tables 2 and 3 compare our multi-source system to the other official submissions in the top 5 of the WMT19 competition. In automatic evaluation by BLEU, we were tied for third place, although with a slight edge when measured by YiSi-1 (Lo, 2019); in human evaluation, we were in a statistical tie for second place. Notably, our multi-source system was the top non-ensemble pure NMT system, with other higher-scoring systems either being ensembles or SMT/NMT hybrids.…”
Section: Results (mentioning; confidence: 99%)
“…(Stanchev et al., 2019) http://github.com/rwth-i6/ExtendedEditDistance; ESIM, learned neural representations, Univ. of Melbourne (Mathur et al., 2019) http://github.com/nitikam/mteval-in-context; LEPORa, surface linguistic features, Dublin City University, ADAPT (Han et al., 2012, 2013) http://github.com/poethan/LEPOR; LEPORb, surface linguistic features, Dublin City University, ADAPT (Han et al., 2012, 2013) http://github.com/poethan/LEPOR; YiSi (Lo, 2019) http://github.com/chikiulo/YiSi. Table 2: Participants of WMT19 Metrics Shared Task. "•" denotes that the metric took part in (some of the language pairs of) the segment- and/or system-level evaluation.…”
Section: Baseline Metrics (mentioning; confidence: 99%)
“…YiSi (Lo, 2019) is a unified semantic MT quality evaluation and estimation metric for languages with different levels of available resources.…”
Section: YiSi-0, YiSi-1, YiSi-1_srl, YiSi-2, YiSi-2_srl (mentioning; confidence: 99%)
“…Later metrics matched words in the two texts using their word embeddings (Lo, 2017; Clark et al., 2019). More recently, contextual similarity measures were devised for this purpose (Lo, 2019; Wieting et al., 2019; Zhao et al., 2019; Zhang et al., 2020; Sellam et al., 2020). In §7 we provide a qualitative analysis for the latter, presenting typical evaluation mistakes made by a recently-proposed contextual-similarity based metric (Zhang et al., 2020).…”
Section: Generation Evaluation (mentioning; confidence: 99%)