Proceedings - Natural Language Processing in a Deep Learning World 2019
DOI: 10.26615/978-954-452-056-4_035

Developing the Old Tibetan Treebank

Abstract: This paper presents a full procedure for the development of a segmented, POS-tagged and chunk-parsed corpus of Old Tibetan. As an extremely low-resource language, Old Tibetan poses non-trivial problems in every step towards the development of a searchable treebank. We demonstrate, however, that a carefully developed, semi-supervised method of optimising and extending existing tools for Classical Tibetan, as well as creating specific ones for Old Tibetan, can address these issues. We thus also present the very fi…

Cited by 4 publications (12 citation statements)
References 8 publications
“…Philologists generally consider the Annals, that record historical events in the 7-8th centuries, to be older than the more extensive Chronicle, although exact dates of origin are still a matter of ongoing debate (cf. Faggionato and Meelen (2019)). Tibetan texts written between the 11th and mid-20th centuries are generally referred to as 'Classical Tibetan', without further chronological subclassification.…”
Section: Composition Of the Annotated Corpus (mentioning)
confidence: 99%
“…The linguistic annotation of PACTib consists of tokenisation, sentence segmentation, part-of-speech tags and syntactic phrase structure labels building for a constituency treebank on recent work by and Faggionato and Meelen (2019). We optimised their methods after an error analysis and for the purposes of this paper, focused mainly on creating meaningful sentence segmentation.…”
Section: Linguistic Annotation (mentioning)
confidence: 99%