2016
DOI: 10.48550/arxiv.1605.04278
Preprint

Universal Dependencies for Learner English

Abstract: We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error-corrected versions of each sentence. Further on, we delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English.
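Because the TLE pairs each learner sentence with its error-corrected counterpart, the two annotation layers can be read side by side. Below is a minimal Python sketch of that idea, assuming the standard 10-column CoNLL-U format used by UD treebanks; the file names tle_original.conllu and tle_corrected.conllu are illustrative assumptions, not the corpus's actual distribution layout.

    def read_conllu(path):
        """Yield sentences as lists of token dicts from a CoNLL-U file."""
        fields = ["id", "form", "lemma", "upos", "xpos",
                  "feats", "head", "deprel", "deps", "misc"]
        sentence = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line.startswith("#"):
                    continue            # skip comment/metadata lines
                if not line:
                    if sentence:
                        yield sentence  # a blank line ends a sentence
                        sentence = []
                    continue
                sentence.append(dict(zip(fields, line.split("\t"))))
        if sentence:
            yield sentence

    # Hypothetical usage: flag sentences whose POS analysis changes
    # between the original and the error-corrected version.
    for orig, corr in zip(read_conllu("tle_original.conllu"),
                          read_conllu("tle_corrected.conllu")):
        orig_tags = [(t["form"], t["upos"]) for t in orig]
        corr_tags = [(t["form"], t["upos"]) for t in corr]
        if orig_tags != corr_tags:
            print(orig_tags, "->", corr_tags)

Comparing the two layers this way is one simple route to the kind of analysis the abstract describes, e.g. measuring how grammatical errors affect tagging and parsing.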

Cited by 1 publication (1 citation statement)
References 2 publications
“…Although previous studies have consistently shown that fine-tuning L1 pre-trained models with L2 data improves the accuracy of both tokenization and POS tagging for L2 data (Berzak et al., 2016; Kyle et al., 2022; Sung and Shin, 2023), there are two key questions unresolved with respect to developing L2 domain-specific models. First, it is unclear how the models perform in zero-shot scenarios with unseen L2 data (i.e., L2 test sets not sourced from the same origin as the L2 training data), which is a crucial factor for enhancing the model's reliability and robustness (Choi and Palmer, 2012).…”
Section: L2 Domain-specific Model Development
confidence: 99%