Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.471

Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations

Abstract: Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Manual production of IGT takes time and requires linguistic expertise. We attempt to address this issue by creating automatic glossing models, using modern multi-source neural models that additionally leverage easy-to-collect translations. We further explore cross-lingual transfer and a simple output length control mechanism, further refining our models. Evaluated …

Cited by 7 publications (10 citation statements)
References 25 publications
“…There also have been studies on extracting grammatical information from text by using dependency parsers (Chaudhary et al., 2020; Pratapa et al., 2021) and automatically glossing text (Zhao et al., 2020; Samardžić et al., 2015) as well as compiling full morphological paradigms from it (Moeller et al., 2020). By contrast, our method is independent of such annotation schemata, and it is also simpler as it does not aim at generating full grammatical or morphological descriptions of the languages examined.…”
Section: Related Work (mentioning)
confidence: 99%
“…The first approaches simply memorized earlier glossing decisions and enabled the annotator to re-use these later (Baines, 2009). Later approaches have relied on structured models like CRFs (McMillan-Major, 2020), RNN encoder-decoders (Moeller and Hulden, 2018) and transformers (Zhao et al., 2020) to generate glosses for unseen tokens. NLP techniques can also be used to generate inflection tables from IGT (Moeller et al., 2020).…”
Section: NLP for Underdocumented Languages (mentioning)
confidence: 99%
“…dards is important: Zhao et al. (2020) note that this can have an impact on the performance of glossing systems. McMillan-Major (2020) notes a further challenge: IGT often includes not only morphological information, but also syntactic, semantic, and pragmatic annotations, which can be much harder to learn in low-resource settings.…”
Section: NLP for Underdocumented Languages (mentioning)
confidence: 99%
“…Biblical texts are usually well-studied and thus both references to the Strong's numbers as well as morphological information are available for Hebrew and Greek texts. Automated glossing is also a widely studied field, see [18] or [19]. These approaches have never been used to create interlinear glossed Biblical texts.…”
Section: Related Work (mentioning)
confidence: 99%