Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.680

Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses

Abstract: Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a cha…

Cited by 12 publications (21 citation statements)
References 18 publications

“…It uses articles or blogs (e.g., Wiki, Yahoo) written by native English speakers to explore grammatical error phenomena in different domains. CWEB (Flachs et al., 2020) also uses website texts in English, such as blogs. The difference between CWEB and CMEG is that the percentage of erroneous tokens in the former is smaller than the latter as the purpose of CWEB is to study grammatical error correction in low error density domains.…”
Section: Grammatical Error Correction Datasets
Citation type: mentioning
confidence: 99%
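
The "percentage of erroneous tokens" that this statement compares can be illustrated with a minimal sketch. It assumes each sentence is given as a source token count plus a list of edit spans over source tokens (end-exclusive); the function name `erroneous_token_rate` and all numbers are hypothetical and are not taken from CWEB, CMEG, or either paper's tooling.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) source-token offsets, end exclusive


def erroneous_token_rate(sentences: List[Tuple[int, List[Span]]]) -> float:
    """Fraction of source tokens covered by at least one edit span.

    Assumption: zero-width spans (pure insertions) cover no source token,
    so they do not raise the rate under this particular definition.
    """
    total = 0
    flagged = 0
    for n_tokens, spans in sentences:
        covered = set()
        for start, end in spans:
            # Clamp to the sentence bounds; overlapping spans count once.
            covered.update(range(max(0, start), min(end, n_tokens)))
        total += n_tokens
        flagged += len(covered)
    return flagged / total if total else 0.0


# Made-up toy inputs: a denser "learner-like" sample and a sparser "web-like" one.
learner_like = [(10, [(1, 2), (4, 6)]), (8, [(0, 1)])]
web_like = [(10, []), (8, [(3, 4)])]

print(f"learner-like: {erroneous_token_rate(learner_like):.3f}")
print(f"web-like:     {erroneous_token_rate(web_like):.3f}")
```

Other definitions are possible, for example counting edits per sentence or counting insertions as one erroneous token, so rates computed this way are only comparable across datasets measured with the same convention.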
“…Such errors include not only lexical collocation errors but also long-distance syntactic constituency combination errors (e.g., inappropriate subject-object combination). This error type is similar to "replacing" error in some GEC datasets (e.g., CWEB (Flachs et al., 2020)) as one element of an inappropriate combination should be usually replaced with other expressions. As we want to find text spans associated with erroneous words/phrases, we term this error type as "inappropriate combination".…”
Section: Error Taxonomy
Citation type: mentioning
confidence: 99%