Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.151
On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation

Abstract: Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT syst…

Cited by 44 publications (57 citation statements)
References 42 publications
“…A variation on our HTER estimator model trained with the vector x = [h; s; r; h ∗ s; h ∗ r; |h − s|; |h − r|] as input to the feed-forward only succeeded in boosting segment-level performance in 8 of the 18 language pairs outlined in Section 5 below, and the average improvement in Kendall's Tau in those settings was +0.0009. As noted in Zhao et al. (2020), while cross-lingual pretrained models are adaptive to multiple languages, the feature space between languages is poorly aligned. On this basis we decided in favor of excluding the source embedding, on the intuition that the most important information comes from the reference embedding and that reducing the feature space would allow the model to focus more on relevant information.…”
Section: Estimator Model
confidence: 99%
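The combined feature vector described in the excerpt above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `estimator_features`, the use of NumPy, and the reading of the juxtaposed terms as element-wise products are assumptions.

```python
import numpy as np

def estimator_features(h, s, r):
    """Build x = [h; s; r; h*s; h*r; |h-s|; |h-r|] from hypothesis (h),
    source (s), and reference (r) sentence embeddings of equal dimension d.
    The result is a single vector of length 7*d, suitable as input to a
    feed-forward estimator."""
    h, s, r = (np.asarray(v, dtype=float) for v in (h, s, r))
    return np.concatenate([h, s, r, h * s, h * r, np.abs(h - s), np.abs(h - r)])

# Example with d = 4: the feature vector has length 7 * 4 = 28.
x = estimator_features(np.ones(4), np.zeros(4), np.full(4, 2.0))
```

Dropping the source-dependent pieces (s, h ∗ s, |h − s|), as the excerpt says the authors ultimately did, would shrink the input from 7d to 4d dimensions.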
“…Overall, the average performance for mBERT (0.16) is five times better than random guessing, but consistently lower than the performance for mFastText (0.46 on average). Overall, this shows that mBERT does not properly capture multilingual semantics, a finding echoed in other recent work (Zhao et al., 2020b). The apparent reason lies in its naive training process, which does not exploit cross-lingual signals but merely trains on the concatenation of all languages.…”
Section: Cross-lingual Semantics
confidence: 83%
“…K et al. (2020) show that lexical overlap plays no big role in cross-lingual transfer for mBERT, but the depth of the network does, with deeper models transferring better. Zhao et al. (2020b) find that mBERT lacks fine-grained cross-lingual text understanding and can be fooled by the adversarial, corrupted inputs produced by MT systems.…”
Section: Cross-lingual Representations
confidence: 97%
“…Our method is simple, interpretable, and produces scores closer to human judgements on an absolute scale, while enabling more fine-grained analysis, which can be useful for finding weak spots in the evaluated model. In future work, we would like to combine knowledge-based signals with unsupervised approaches like YiSi (Lo, 2019) and XMoverScore (Zhao et al., 2020) that use contextualized representations from cross-lingual LMs such as multilingual BERT (Devlin et al., 2019). As our method does not require reference translations, we would like to explore scaling it to much larger or domain-specific monolingual datasets.…”
Section: Discussion
confidence: 99%