Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) 2015
DOI: 10.18653/v1/w15-3710

Automatic interlinear glossing as two-level sequence classification

Abstract: Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automating this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic glossing of Chintang. We decompose the task of glossing into steps suitable for statistical processing. We first perform gr…

Cited by 8 publications (7 citation statements)
References 9 publications
“…As for the morphological annotation, for Chintang, most of the corpus was hand-segmented and manually annotated for morphology and parts-of-speech by trained linguistic students under supervision by an expert in the language. A small part of the morphological annotation was generated automatically based on a morphological tagger (Ruzsics & Samardzic, 2017; Samardzic, Schikowski & Stoll, 2015). Japanese morphological tagging was done with the morphological tagger in CHILDES (JMOR, Miyata and Naka, 2014).…”
Section: Data · mentioning (confidence: 99%)
“…The low-resource language data came from interlinearized data that was polished for publication. McMillan-Major (2020) and some other experiments such as Samardzic et al (2015) use information from lines of interlinearized texts such as translation and POS tags.…”
Section: Related Work · mentioning (confidence: 99%)
“…• Accuracy: percentage of correct (full) analyses for each token. It is the main metric used in previous work (Samardžić et al, 2015; McMillan-Major, 2020).…”
Section: Evaluating Gloss Generation · mentioning (confidence: 99%)
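The token-level accuracy described in this statement can be illustrated with a short sketch. It assumes that gold and predicted glosses are already aligned one per token and that a token counts as correct only when its full gloss string matches exactly; the function name `gloss_accuracy` and the example gloss labels are hypothetical and are not taken from Samardžić et al. (2015) or McMillan-Major (2020).

```python
from typing import Sequence


def gloss_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the percentage of tokens whose full predicted gloss equals the gold gloss.

    Hypothetical helper: assumes both sequences are aligned token by token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted glosses must be aligned token by token")
    if not gold:
        # No tokens to evaluate; report 0 rather than dividing by zero.
        return 0.0
    # A token is correct only if the whole analysis matches, not just part of it.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


# Invented example glosses: the third token differs only in its tense label,
# so under full-analysis matching it counts as wrong (2 of 3 tokens correct).
gold = ["1sg-ERG", "house", "see-PST"]
pred = ["1sg-ERG", "house", "see-NPST"]
print(gloss_accuracy(gold, pred))
```

Because the comparison is on the full analysis, a single wrong morpheme label makes the whole token incorrect, which is stricter than scoring individual morphemes.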
“…Several works have studied the automated IGT generation task (…Samardžić et al, 2015; Moeller and Hulden, 2018; McMillan-Major, 2020). They mainly used machine learning methods such as CRF and SVM to generate gloss and proposed a series of heuristic post-editing algorithms to improve the performance.…”
Section: Related Work · mentioning (confidence: 99%)