Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1102
A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages

Abstract: Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages (North Sámi, Galician, and Kazakh), we find that (1) when only the low-resource treebank is a…

Cited by 44 publications (37 citation statements)
References 27 publications
“…As expected these scores are considerably lower than when using in-language OOD data, being so poor that these parsers are hardly useful, confirming previous research, e.g. Meechan-Maddon and Nivre (2019) and Vania et al. (2019). In this case there is no clear difference between IND and OOD data.…”
Section: Results (supporting; confidence: 86%)
“…In the computer vision community, this is a popular approach where, e.g., rotating an image is invariant to the classification of an image's content. For text, on the token level, this can be done by replacing words with equivalents, such as synonyms (Wei and Zou, 2019), entities of the same type (Raiman and Miller, 2017; Dai and Adel, 2020) or words that share the same morphology (Gulordava et al., 2018; Vania et al., 2019). Such replacements can also be guided by a language model that takes context into consideration (Fadaee et al., 2017; Kobayashi, 2018).…”
Section: Data Augmentation (mentioning; confidence: 99%)
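The token-level replacement augmentation described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: the `SYNONYMS` dictionary is a hypothetical stand-in for a real resource such as a synonym lexicon or a morphological dictionary.

```python
import random

# Hypothetical replacement lexicon; in practice this would come from a
# synonym or morphological resource (entries here are illustrative only).
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
}

def augment(tokens, p=0.3, rng=random):
    """Return a copy of `tokens` with each eligible word replaced by an
    equivalent from the lexicon with probability `p`."""
    out = []
    for tok in tokens:
        subs = SYNONYMS.get(tok.lower())
        if subs and rng.random() < p:
            out.append(rng.choice(subs))
        else:
            out.append(tok)
    return out

# Example: force replacement of every known word.
print(augment(["the", "quick", "fox"], p=1.0))
```

Because replacements swap one word for another in place, the sentence structure (and, for treebank data, the dependency tree) is left untouched, which is what makes this style of augmentation attractive for parsing.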
“…Toxic language classification has been conducted in a number of studies (Schmidt and Wiegand, 2017; Davidson et al., 2017; Wulczyn et al., 2017; Gröndahl et al., 2018; Qian et al., 2019; Breitfeller et al., 2019). NLP applications of data augmentation include text classification (Ratner et al., 2017; Wei and Zou, 2019; Mesbah et al., 2019), user behavior categorization (Wang and Yang, 2015), dependency parsing (Vania et al., 2019), and machine translation (Fadaee et al., 2019; Xia et al., 2019). Related techniques are also used in automatic paraphrasing (Madnani and Dorr, 2010; Li et al., 2018) and writing style transfer (Shen et al., 2017; Shetty et al., 2018; Mahmood et al., 2019).…”
Section: Related Work (mentioning; confidence: 99%)