Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.658

A Call for More Rigor in Unsupervised Cross-lingual Learning

Abstract: We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them. An existing rationale for such research is based on the lack of parallel data for many of the world's languages. However, we argue that a scenario without any parallel data and abundant monolingual data is unrealistic in practice. We also discuss different training signals that have been used in previous work, which depart from the pure unsupervised setting…

Cited by 33 publications (28 citation statements)
References 61 publications
“…In cross-lingual learning the feature space drastically changes, as alphabets, vocabularies and word order can be different. It can be seen as an extreme adaptation scenario, for which parallel data may exist and can be used to build multilingual representations (Artetxe et al., 2020). Second, instead of adapting to a particular target, there is some work on domain generalization aimed at building a single system which is robust across several known target domains.…”
Section: Domain Adaptation and Transfer Learning: Notation
confidence: 99%
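
The statement above treats cross-lingual learning as an extreme adaptation setting in which whatever bilingual signal is available is used to tie the two feature spaces together. One common way to build such multilingual representations (a standard baseline, not necessarily the method of the cited works) is to align independently trained monolingual word embeddings with an orthogonal mapping learned from a small seed dictionary. The sketch below shows that Procrustes step; the toy vectors and seed pairs are made-up assumptions for illustration.

```python
# Minimal sketch: align two monolingual embedding spaces with orthogonal
# Procrustes, using a small seed dictionary as the (weak) bilingual signal.
# The toy dimensions and random "embeddings" are illustrative assumptions only.
import numpy as np

def procrustes_mapping(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) arrays; row i of each holds the embeddings of
    the i-th seed translation pair (source word, target word).
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

rng = np.random.default_rng(0)
dim, n_pairs = 4, 3
src_seed = rng.normal(size=(n_pairs, dim))   # source-language vectors of the seed words
tgt_seed = rng.normal(size=(n_pairs, dim))   # target-language vectors of their translations

W = procrustes_mapping(src_seed, tgt_seed)
mapped = src_seed @ W                        # source vectors projected into the target space
print("alignment residual:", np.linalg.norm(mapped - tgt_seed))
```

With no seed dictionary at all, the same mapping has to be induced from monolingual signal alone (e.g. adversarially or from distributional similarity), which is exactly the fully unsupervised setting whose practicality the surveyed paper questions.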
“…While not the first of its kind, BERT [1] has arguably become the most successful model for generating contextualized representations, leading to a new research field termed BERTology, with hundreds of publications [20]. However, this success is largely centered around English and a few other high-resource languages [21], limiting the use of this technology in most of the world's languages [14].…”
Section: Related Work
confidence: 99%
“…This is based on the assumption that there are no or few labeled examples in the target language L_T, while unlabeled data is abundant. However, it has been suggested that this assumption is neither realistic nor practical [14,15].…”
Section: Transfer Learning As Post-training
confidence: 99%
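
The assumption discussed above corresponds to the standard zero-shot cross-lingual transfer recipe: fine-tune a multilingual encoder on labeled source-language data only, then apply it unchanged to the target language L_T. Below is a minimal sketch of that workflow; the model name, the tiny in-memory examples, and the single training step are assumptions chosen for illustration, not the setup of the cited works.

```python
# Minimal sketch of zero-shot cross-lingual transfer: fine-tune on English
# labels, evaluate directly on another language. Model choice, data, and
# hyper-parameters are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"   # assumed multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Labeled examples exist only in the source language (English).
en_texts = ["great movie", "terrible plot"]
en_labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
batch = tokenizer(en_texts, padding=True, return_tensors="pt")
loss = model(**batch, labels=en_labels).loss   # one illustrative fine-tuning step
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Zero-shot evaluation on the target language: no target-language labels were used.
model.eval()
es_texts = ["una película estupenda", "un argumento terrible"]
with torch.no_grad():
    logits = model(**tokenizer(es_texts, padding=True, return_tensors="pt")).logits
print("predicted labels:", logits.argmax(dim=-1).tolist())
```

The abundant unlabeled target-language text that the quoted assumption refers to would typically enter through the encoder's pretraining, not through this fine-tuning step.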
“…Low-resource languages sometimes leave no choice but to use unsupervised models. In [3], the authors argue that strictly unsupervised training without any parallel data is rather impractical. Nevertheless, they acknowledge the theoretical and scientific value of further research in this direction.…”
Section: Related Work
confidence: 99%