Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.581
Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction

Abstract: We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC). ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. Then, ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and output only the corrected text for these spans. Experiments show our approach performs comparably t…
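The abstract outlines a two-stage pipeline: a lightweight sequence-tagging model first marks erroneous spans (ESD), and a seq2seq model then rewrites only those spans (ESC) instead of regenerating the whole sentence. The sketch below illustrates that control flow only; detect_erroneous_spans and correct_spans are hypothetical toy stand-ins (a confusion list and a lookup table), not the paper's trained models.

```python
# Minimal sketch of the ESD -> ESC pipeline described in the abstract.
# Both stages are toy stand-ins so the control flow is runnable end to end;
# the paper uses an efficient neural sequence tagger and a seq2seq corrector.

from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices of an erroneous span


def detect_erroneous_spans(tokens: List[str]) -> List[Span]:
    """ESD step: flag suspicious tokens (toy confusion list, for illustration)."""
    suspicious = {"goed", "recieve"}  # assumption: tiny hand-made error list
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok.lower() in suspicious]


def correct_spans(tokens: List[str], spans: List[Span]) -> List[str]:
    """ESC step: rewrite only the annotated spans, leaving other tokens untouched."""
    corrections = {"goed": "went", "recieve": "receive"}  # hypothetical corrector output
    out = list(tokens)
    for start, end in spans:
        out[start:end] = [corrections.get(t.lower(), t) for t in tokens[start:end]]
    return out


if __name__ == "__main__":
    sentence = "She goed to the store to recieve her package".split()
    spans = detect_erroneous_spans(sentence)    # ESD: locate erroneous spans
    corrected = correct_spans(sentence, spans)  # ESC: correct only those spans
    print(" ".join(corrected))  # -> "She went to the store to receive her package"
```

Because the expensive seq2seq decoding is restricted to the annotated spans rather than the full sentence, most tokens are copied unchanged, which is where the efficiency gain described in the abstract comes from.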

Cited by 32 publications (24 citation statements)
References 41 publications
“…For example, GECToR outperforms baseline models on English GEC datasets, but its performance severely degenerates on the Chinese GEC task, as shown in Table 3. There also exist methods that integrate seq2seq models with sequence tagging methods into a pipeline system to improve the performance or efficiency (Chen et al. 2020; Hinson, Huang, and Chen 2020), which cannot be optimized end-to-end.…”
Section: GEC by Generating Edits
confidence: 99%
“…We follow recent work in English GEC to conduct experiments in the restricted training setting of the BEA-2019 GEC shared task (Bryant et al., 2019): We use the Lang-8 Corpus of Learner English (Mizumoto et al., 2011), NUCLE (Dahlmeier et al., 2013), FCE (Yannakoudakis et al., 2011) and W&I+LOCNESS (Granger; Bryant et al., 2019) as our GEC training data. For facilitating fair comparison in the efficiency evaluation, we follow the previous studies (Omelianchuk et al., 2020; Chen et al., 2020) which conduct GEC efficiency evaluation to use the CoNLL-2014 (Ng et al., 2014) dataset that contains 1,312 sentences as our main test set, and evaluate the speedup as well as Max-Match (Dahlmeier and Ng, 2012) precision, recall and F0.5 using their official evaluation scripts. For validation, we use CoNLL-2013, which contains 1,381 sentences, as our validation set.…”
Section: Data and Model Configuration
confidence: 99%
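The Max-Match (M2) evaluation mentioned in the excerpt above scores proposed edits with precision, recall, and F0.5, where F0.5 weights precision more heavily than recall. A minimal sketch of that computation follows; the edit counts are made-up placeholders for illustration, not numbers from the paper or any cited work.

```python
# Minimal sketch of the F_beta score used by Max-Match scoring (beta = 0.5,
# which emphasizes precision over recall). The counts below are placeholders.

def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F_beta score; 0.0 if both precision and recall are zero."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical edit-level counts from comparing system edits to gold edits.
    tp, fp, fn = 40, 20, 35
    p = tp / (tp + fp)  # precision over proposed edits
    r = tp / (tp + fn)  # recall over gold edits
    print(f"P={p:.3f} R={r:.3f} F0.5={f_beta(p, r):.3f}")
```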
“…Table 5: The performance and online inference efficiency evaluation of efficient GEC models on CoNLL-14. For the models with , their performance and speedup numbers are from Chen et al. (2020), who evaluate the online efficiency in the same runtime setting (e.g., GPU and runtime libraries) as ours. The underlines indicate that the speedup numbers of the models are evaluated with TensorFlow based on their released code, which are not strictly comparable here.…”
Section: Evaluation for Aggressive Decoding
confidence: 99%