Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) 2019
DOI: 10.18653/v1/w19-5358
YiSi - a Unified Semantic MT Quality Evaluation and Estimation Metric for Languages with Different Levels of Available Resources

Abstract: We present YiSi, a unified automatic semantic machine translation quality evaluation and estimation metric for languages with different levels of available resources. Underneath the interface with different language resource settings, YiSi uses the same representation for the two sentences under assessment. We also show that using contextual embeddings from multilingual BERT (Bidirectional Encoder Representations from Transformers) yields a significant improvement in the correlation of YiSi-1's scores with human judgment.
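The core idea behind embedding-based metrics of this family, matching each word in the hypothesis against the most similar word in the reference and combining precision and recall into an F-measure, can be illustrated with a minimal sketch. This is not the official YiSi implementation; it omits the idf weighting, semantic-role structure, and actual BERT contextual embeddings of the real metric, and the function name `yisi_like_score` is hypothetical:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def yisi_like_score(hyp_vecs, ref_vecs):
    """Greedy-max matching F-score over word embeddings.

    A simplified sketch of YiSi-1-style scoring: each hypothesis word is
    matched to its most similar reference word (precision side) and each
    reference word to its most similar hypothesis word (recall side);
    the two averages are combined into an F-measure.
    """
    # Pairwise similarity matrix: rows = hypothesis words, cols = reference words.
    sim = np.array([[cosine(h, r) for r in ref_vecs] for h in hyp_vecs])
    precision = sim.max(axis=1).mean()
    recall = sim.max(axis=0).mean()
    return 2 * precision * recall / (precision + recall)
```

With identical hypothesis and reference embeddings the score is 1.0; a hypothesis covering only part of the reference lowers recall and hence the F-score.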

Cited by 85 publications (96 citation statements)
References 14 publications
“…Tables 2 and 3 compare our multi-source system to the other official submissions in the top 5 of the WMT19 competition. In automatic evaluation by BLEU, we were tied for third place, although with a slight edge when measured by YiSi-1 (Lo, 2019); in human evaluation, we were in a statistical tie for second place. Notably, our multi-source system was the top non-ensemble pure NMT system, with other higher-scoring systems either being ensembles or SMT/NMT hybrids.…”
Section: Results (mentioning; confidence: 99%)
“…(Stanchev et al., 2019) http://github.com/rwth-i6/ExtendedEditDistance; ESIM, learned neural representations, Univ. of Melbourne (Mathur et al., 2019) http://github.com/nitikam/mteval-in-context; LEPORa, surface linguistic features, Dublin City University, ADAPT (Han et al., 2012, 2013) http://github.com/poethan/LEPOR; LEPORb, surface linguistic features, Dublin City University, ADAPT (Han et al., 2012, 2013) http://github.com/poethan/LEPOR; YiSi (Lo, 2019) http://github.com/chikiulo/YiSi. Table 2: Participants of WMT19 Metrics Shared Task. "•" denotes that the metric took part in (some of the language pairs of) the segment- and/or system-level evaluation.…”
Section: Baseline Metrics (mentioning; confidence: 99%)
“…YiSi (Lo, 2019) is a unified semantic MT quality evaluation and estimation metric for languages with different levels of available resources.…”
Section: YiSi-0, YiSi-1, YiSi-1_srl, YiSi-2, YiSi-2_srl (mentioning; confidence: 99%)
“…Later metrics matched words in the two texts using their word embeddings (Lo, 2017; Clark et al., 2019). More recently, contextual similarity measures were devised for this purpose (Lo, 2019; Wieting et al., 2019; Zhao et al., 2019; Zhang et al., 2020; Sellam et al., 2020). In §7 we provide a qualitative analysis for the latter, presenting typical evaluation mistakes made by a recently-proposed contextual-similarity based metric (Zhang et al., 2020).…”
Section: Generation Evaluation (mentioning; confidence: 99%)