Élie Roux scite author profile

Élie Roux

4 Publications

8 Citation Statements Received

35 Citation Statements Given

Publications

Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods

Meelen, Roux, Hill

2021

ACM Trans. Asian Low-Resour. Lang. Inf. Process.

This article presents a pipeline that converts collections of Tibetan documents in plain text or XML into a fully segmented and POS-tagged corpus. We apply the pipeline to the large extant collection of the Buddhist Digital Resource Center. The semi-supervised methods presented here result in a new and improved version of the largest annotated Tibetan corpus to date. The integration of rule-based, memory-based, and neural-network methods also serves as a good example of how to overcome the challenges of under-researched languages. The end-to-end accuracy of our entire automatic pipeline, 91.99%, is high enough to make the resulting corpus a useful resource for both linguists and scholars of Tibetan studies.

Meta-dating the PArsed Corpus of Tibetan (PACTib)

Meelen, Roux

2020

This paper presents PACTib, the PArsed Corpus of Tibetan. This new resource is unique in bringing together a large number of Tibetan texts (>5000) from the 11th century until the present day. The texts in this diachronic corpus are provided with metadata containing information on dates and patron-/authorship and linguistic annotation in the form of tokenisation, sentence segmentation, part-of-speech tags and syntactic phrase structure. With over 166 million tokens across 11 centuries and a variety of genres, PACTib will open up a wide range of research opportunities for historical and comparative linguistics and scholars in Tibetan Studies, which we illustrate with two short case studies.

Algorithmic description of the decomposition and checking of a Classical Tibetan syllable

Roux, Hildt

2018

This document presents our research on the correct formation of a Classical Tibetan syllable. It was triggered by attempts at defining the boundaries of well-formed syllables in Classical Tibetan for spell-checking purposes. Formalizing the formation of the syllable led us to inspect the small differences among grammar books, in both Western and Tibetan languages. We then checked these differences against the Tibetan dictionaries we consider reliable, and also against the Kangyur. Our inquiry finally led us to study the way to decompose a syllable, discussing the ambiguous cases, as well as the formation of the Dzongkha syllable.

The Annotated Corpus of Classical Tibetan (ACTib) - Version 2.0 (Segmented & POS-tagged)

Meelen, Roux

2020
