Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1070

Universal Dependencies for Learner English

Abstract: We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error corrected versions of each sentence. Further on, we delineat…

Citations: cited by 31 publications (53 citation statements)
References: 10 publications (9 reference statements)
“…Berzak et al [27] show that a parser developed for native English performs slightly worse on learner English than on native English (label attachment score for the former and the latter are 86.3% and 88.3%, respectively). Tetreault et al [7] show that it can reliably extract parse features for grammatical error correction from learner English.…”
Section: Discussion (mentioning)
confidence: 99%
“…Note that most POS-tagging guidelines for learner English such as Ragheb and Dickinson (2012), , and Berzak et al (2016) stipulate that a token with an orthographic error should receive the POS label that is given to the corresponding correct spelling. Accordingly, it is preferable that POS taggers for learner English should do the same.…”
Section: Potential Causes of POS-Tagging Errors in Learner English (mentioning)
confidence: 99%
“…5 POS labels based on distributional information can also be included by using the multiple layer scheme (Díaz-Negrillo et al, 2009; Dickinson and Ragheb, 2009; Berzak et al, 2016). It depends on the user which layer to use.…”
Section: Potential Causes of POS-Tagging Errors in Learner English (mentioning)
confidence: 99%
“…Table 2 shows some basic statistics on the two learner corpora. We note that the annotations are applied to original sentences as in Berzak et al (2016). Because the KJ corpus contains learner errors and the respective corrections, it allows us to use them for the phrase structure annotation.…”
Section: Dataset (mentioning)
confidence: 99%
“…We also show that a model trained on it can improve parsing performance (0.878 in F-measure) in Section 4. Note that there have been a very limited number of publicly available learner corpora that are syntactically annotated; to the best of our knowledge, Berzak et al (2016) recently released a publicly available learner corpus annotated with dependency (but not phrase structure). This inhibits effective investigations and applications using learner language corpora such as automated grammatical error correction, automated scoring and native language identification.…”
Section: Introduction (mentioning)
confidence: 99%