Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects 2014
DOI: 10.3115/v1/w14-5302

Diachronic proximity vs. data sparsity in cross-lingual parser projection. A case study on Germanic

Abstract: For the study of historical language varieties, the sparsity of training data poses immense problems for syntactic annotation and for the development of NLP tools that automate the process. In this paper, we explore strategies to compensate for the lack of training data by including data from related varieties in a series of annotation projection experiments from English to four old Germanic languages: On dependency syntax projected from English to one or multiple language(s), we train a fragment-aware parser train…

Cited by 6 publications (7 citation statements). References 13 publications.

“…Other difficulties arise in the annotation projection and machine translation from the fact that the Sumerian language does not have any modern descendants. This is particularly important for the annotation projection, as previous studies have shown that diachronic relatedness is an important factor that affects the quality of annotation projection (Sukhareva and Chiarcos, 2014). Thus, we plan to conduct our pilot experiments on modern languages such as Turkish and Basque that are grammatically similar to Sumerian (agglutinative, split-ergative) to guarantee the scalability of our implementation, but more importantly to be able to conduct experiments in parallel with the morphological and syntactic annotation of the Sumerian texts.…”
Section: Challenges and Risks (citation type: mentioning)
confidence: 99%
“…It has also been used in cross-lingual POS tagging ([…] Fossum and Abney, 2005), NP-chunking ([…]) and cross-lingual dependency parsing (Sukhareva and Chiarcos, 2014) before. […] and Fossum and Abney (2005) use word-aligned parallel translations of the Bible to project the predictions of POS taggers for several language pairs, including English, German, and Spanish to Czech and French.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Performance of distantly supervised methods such as annotation projection or cross-lingual tool adaptation depends on the diachronic relatedness between the source and the target languages. For example, annotation projection from modern English into middle English gives better results than into old English because middle English grammatically and lexically resembles modern English much more than Old English (Sukhareva and Chiarcos, 2014). Annotation projection is thus typically applied to related languages (Tiedemann and Agic, 2016).…”
Section: Introduction (citation type: mentioning)
confidence: 99%