Grammatical Error Correction with Contrastive Learning in Low Error Density Domains

Cao, Hannan; Yang, Wenmian; Ng, Hwee Tou

doi:10.18653/v1/2021.findings-emnlp.419

Cited by 9 publications

(6 citation statements)

References 16 publications


“…Our structure realizes the best F1 scores at the detection level and identification level by a balanced precision and recall among all teams participating in the CGED 2020 task. At the detection level, we improved the F1 value by 0.47% over the state-of-the-art [27][28], and this is because we added syntactic information of the sentences, which is much richer than the POS Score and PMI Score used by the state-of-the-art method. At the identification level, we improved the F1 value by 1.23% over the state-of-the-art [23], and we think this is because the state-of-the-art method only adds ResNet on top of BERT, but we not only add rich information: syntactic information, contextual embeddings and lexical information, but also add CRF layer to improve the performance, so we can get better F1 value.…”

Section: Testing Results (mentioning)


Combining GCN and Transformer for Chinese Grammatical Error Detection

Zhang

2022

Journal of Internet Technology


This paper describes our system at a task: Chinese Grammatical Error Diagnosis (CGED). The task is held by the Natural Language Processing Techniques for Educational Applications (NLP-TEA) to encourage the development of automatic grammatical error diagnosis in Chinese learning since 2014. The goal of CGED is to diagnose four types of grammatical errors: word selection (S), redundant words (R), missing words (M), and disordered words (W). The automatic CGED system contains two parts including error detection and error correction and our system is designed to solve the error detection problem. Our system is built on three models: 1) a BERT-based model leveraging syntactic information; 2) a BERT-based model leveraging contextual embeddings; 3) a lexicon-based graph neural network leveraging lexical information. We also design an ensemble mechanism to improve the single model’s performance. Finally, our system achieves the highest F1 scores at detection level and identification level among all teams participating in the CGED 2020 task.
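The abstract reports detection-level and identification-level F1 without defining them. Below is a minimal Python sketch under a simplified reading of the CGED evaluation: at detection level a sentence counts as positive if it contains any error, and at identification level each (sentence, error type) pair is scored against the gold annotation. The function names and the dictionary input format are hypothetical, and the official NLP-TEA scorer may differ in its details.

```python
from typing import Dict, Set, Tuple

# Hypothetical input format: sentence id -> set of error types (S, R, M, W).
# An empty set means the sentence is judged correct.
Annotation = Dict[str, Set[str]]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1; 0.0 whenever a denominator is zero."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def detection_prf(gold: Annotation, pred: Annotation) -> Tuple[float, float, float]:
    """Sentence-level binary decision: does the sentence contain any error?"""
    tp = fp = fn = 0
    for sid, gold_types in gold.items():
        has_gold = bool(gold_types)
        has_pred = bool(pred.get(sid, set()))
        if has_gold and has_pred:
            tp += 1
        elif has_pred:   # predicted an error in a correct sentence
            fp += 1
        elif has_gold:   # missed an erroneous sentence
            fn += 1
    return prf(tp, fp, fn)


def identification_prf(gold: Annotation, pred: Annotation) -> Tuple[float, float, float]:
    """Score each (sentence, error type) pair as a separate decision."""
    tp = fp = fn = 0
    for sid, gold_types in gold.items():
        pred_types = pred.get(sid, set())
        tp += len(gold_types & pred_types)
        fp += len(pred_types - gold_types)
        fn += len(gold_types - pred_types)
    return prf(tp, fp, fn)


gold = {"s1": {"S"}, "s2": set(), "s3": {"M", "W"}}
pred = {"s1": {"S"}, "s2": {"R"}, "s3": {"M"}}
print(detection_prf(gold, pred))       # F1 is about 0.8
print(identification_prf(gold, pred))  # F1 is about 0.667
```

The two levels diverge on sentence s3: detection credits it fully because some error was flagged, while identification penalizes the missed W error.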


“…We evaluate the performance of our Chinese GEC system on the NLPCC-2018 test set with the MaxMatch scorer. Following Cao et al (2021), we use the one-tailed sign test with bootstrap resampling to carry out statistical significance tests.…”
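As an illustration of bootstrap-based significance testing on a GEC test set, here is a minimal paired bootstrap sketch. It assumes per-sentence (TP, FP, FN) edit counts have already been produced by a scorer such as MaxMatch, that systems are compared on corpus-level F0.5, and that the one-sided p-value is the fraction of resamples in which the new system fails to beat the baseline. This is one common formulation, not necessarily the exact procedure of Cao et al. (2021), and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# (tp, fp, fn) edit counts for one test sentence, e.g. taken from a
# MaxMatch (M2) run; how they are extracted is outside this sketch.
Counts = Tuple[int, int, int]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta (F0.5 by default); 0.0 when undefined, a convention chosen here."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0


def corpus_score(counts: Sequence[Counts], idx: List[int]) -> float:
    """Sum counts over the (resampled) sentence indices, then score once."""
    tp = sum(counts[i][0] for i in idx)
    fp = sum(counts[i][1] for i in idx)
    fn = sum(counts[i][2] for i in idx)
    return f_beta(tp, fp, fn)


def paired_bootstrap_p(system: Sequence[Counts], baseline: Sequence[Counts],
                       n_samples: int = 1000, seed: int = 0) -> float:
    """One-sided p-value: share of resamples where system does not beat baseline."""
    if len(system) != len(baseline):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(system)
    not_better = 0
    for _ in range(n_samples):
        # Draw n sentence indices with replacement; the same sample is
        # scored for both systems, which is what makes the test paired.
        idx = [rng.randrange(n) for _ in range(n)]
        if corpus_score(system, idx) <= corpus_score(baseline, idx):
            not_better += 1
    return not_better / n_samples
```

A small p-value (for instance below 0.05) would suggest the improvement is unlikely to be an artifact of which sentences happen to be in the test set.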

Section: Data and Model Configuration (mentioning)


Unsupervised Grammatical Error Correction Rivaling Supervised Methods

Cao, Yuan, Zhang, et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing


State-of-the-art grammatical error correction (GEC) systems rely on parallel training data (ungrammatical sentences and their manually corrected counterparts), which are expensive to construct. In this paper, we employ the Break-It-Fix-It (BIFI) method to build an unsupervised GEC system. The BIFI framework generates parallel data from unlabeled text using a fixer to transform ungrammatical sentences into grammatical ones, and a critic to predict sentence grammaticality. We present an unsupervised approach to build the fixer and the critic, and an algorithm that allows them to iteratively improve each other. We evaluate our unsupervised GEC system on English and Chinese GEC. Empirical results show that our GEC system outperforms previous unsupervised GEC systems, and achieves performance comparable to supervised GEC systems without ensemble. Furthermore, when combined with labeled training data, our system achieves new state-of-the-art results on the CoNLL-2014 and NLPCC-2018 test sets. (Work done during Cao's internship at ByteDance.)
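The fixer/critic interaction described in this abstract builds on BIFI. A minimal sketch of one data-generation round follows, assuming the fixer, the critic, and a breaker (the reverse model from the original BIFI formulation, which this abstract does not mention) are plain callables. Retraining of the models between steps is omitted, toy string functions stand in for the neural models, and every name here is hypothetical.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (ungrammatical source, grammatical target)


def bifi_round(bad: List[str], good: List[str],
               fixer: Callable[[str], str],
               breaker: Callable[[str], str],
               critic: Callable[[str], bool]) -> List[Pair]:
    """One BIFI-style data-generation round (model retraining omitted).

    critic(s) returns True when s is judged grammatical.
    """
    pairs: List[Pair] = []
    # 1) Fix real ungrammatical text; keep a pair only if the critic
    #    accepts the fixer's output as grammatical.
    for x in bad:
        y = fixer(x)
        if critic(y):
            pairs.append((x, y))
    # 2) Corrupt real grammatical text; keep a pair only if the critic
    #    judges the breaker's output ungrammatical. (In BIFI the breaker
    #    is first retrained on the pairs from step 1; omitted here.)
    for y in good:
        x = breaker(y)
        if not critic(x):
            pairs.append((x, y))
    return pairs


# Toy stand-ins for the neural models, purely for illustration.
def toy_critic(s: str) -> bool:
    return "a apple" not in s and "is go" not in s


def toy_fixer(s: str) -> str:
    return s.replace("a apple", "an apple")


def toy_breaker(s: str) -> str:
    return s.replace("an apple", "a apple")


print(bifi_round(["I ate a apple", "He is go"],
                 ["She has an apple", "It rains"],
                 toy_fixer, toy_breaker, toy_critic))
# [('I ate a apple', 'I ate an apple'), ('She has a apple', 'She has an apple')]
```

The critic acts as a filter in both directions: "He is go" is dropped because the fixer fails to repair it, and "It rains" is dropped because the breaker fails to corrupt it.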


“…They propose additional training stages that make the model consider edit type interdependence when predicting the corrections. Cao, Yang, and Ng (2021) aim to enhance model performance in low-error density domains. The augmented sentences are generated by beam search to capture wrong corrections that the model tends to make.…”
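To make the idea of beam-search negatives concrete, here is a hedged sketch of a generic margin-based contrastive objective: beam hypotheses that differ from the reference are treated as wrong corrections, and the loss pushes the reference's sequence log-probability above each of theirs by a margin. The hinge form, the margin value, and all names are assumptions for illustration; the exact loss used by Cao, Yang, and Ng (2021) may differ. Log-probabilities are plain floats here, whereas in a real system they would come from the seq2seq model and be differentiated.

```python
from typing import List


def select_negatives(beam_hypotheses: List[str], reference: str) -> List[str]:
    """Keep beam outputs that differ from the reference (wrong corrections)."""
    return [h for h in beam_hypotheses if h != reference]


def margin_contrastive_loss(ref_logprob: float, neg_logprobs: List[float],
                            margin: float = 1.0) -> float:
    """Mean hinge loss asking log p(reference) to exceed each negative's
    log-probability by at least `margin`; 0.0 if there are no negatives."""
    if not neg_logprobs:
        return 0.0
    losses = [max(0.0, margin - (ref_logprob - neg)) for neg in neg_logprobs]
    return sum(losses) / len(losses)


beam = ["He goes to school .", "He go to school .", "He goes school ."]
print(select_negatives(beam, "He goes to school ."))
# ['He go to school .', 'He goes school .']
print(margin_contrastive_loss(-1.0, [-3.0, -1.5]))
# 0.25
```

Only the second negative contributes: the reference already beats the first by 2.0, which exceeds the margin, but beats the second by just 0.5.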

Section: Augmenting Official Datasets (mentioning)


“…Other systems include Katsumata and Komachi (2020) and Rothe et al (2021), who respectively explored the effectiveness of using pre-trained BART (Lewis et al 2020) and T5 (Raffel et al 2020) as the base model for GEC; Cao, Yang, and Ng (2021) subsequently extended Katsumata and Komachi (2020) using contrastive learning (Section 5.2). Chen et al (2020a) and meanwhile both combined detection with error correction by respectively constraining the output of a GEC system based on a separate GED system and jointly training GED as an auxiliary task (Section 4.3).…”

Section: Table (mentioning)


Grammatical Error Correction: A Survey of the State of the Art

Bryant, Zheng, Qorib, et al. 2022

Preprint


Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.

