Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications 2018
DOI: 10.18653/v1/w18-3707

Chinese Grammatical Error Diagnosis using Statistical and Prior Knowledge driven Features with Probabilistic Ensemble Enhancement

Abstract: This paper describes our system for NLPTEA-2018 Task #1: Chinese Grammatical Error Diagnosis. Grammatical error diagnosis is one of the most challenging NLP tasks: it must both locate grammatical errors and identify their types. Our system is built on a bidirectional Long Short-Term Memory model with a conditional random field layer (BiLSTM-CRF), extended in several ways. First, richer features are incorporated into the BiLSTM-CRF model; second, a probabilistic ensemble approach is adopted; third, Templat…
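The abstract mentions a probabilistic ensemble step. A minimal sketch of what such a step could look like, assuming each component tagger emits a per-token probability distribution over error labels that the ensemble averages before taking the argmax (the function name, data layout, and toy inputs below are illustrative, not from the paper; label names follow the standard CGED error types):

```python
from collections import defaultdict

# O = correct, R = redundant, M = missing, S = selection, W = word order
LABELS = ["O", "R", "M", "S", "W"]

def ensemble_tags(model_outputs):
    """Average per-token label distributions from several taggers.

    model_outputs: one sequence per model; each sequence is a list of
    dicts mapping label -> probability for each token.
    Returns the argmax label per token after averaging.
    """
    n_models = len(model_outputs)
    n_tokens = len(model_outputs[0])
    tags = []
    for t in range(n_tokens):
        avg = defaultdict(float)
        for out in model_outputs:
            for label, p in out[t].items():
                avg[label] += p / n_models
        tags.append(max(avg, key=avg.get))
    return tags

# Two toy taggers disagree on token 1; averaging settles the vote.
m1 = [{"O": 0.9, "R": 0.1}, {"O": 0.4, "R": 0.6}]
m2 = [{"O": 0.8, "R": 0.2}, {"O": 0.7, "R": 0.3}]
print(ensemble_tags([m1, m2]))  # ['O', 'O']
```

Averaging probabilities (rather than majority-voting hard labels) lets a confident model outvote two lukewarm ones, which is the usual motivation for probabilistic ensembling in sequence tagging.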

Cited by 13 publications (6 citation statements) · References 5 publications
“…Most studies regard it as a sequence tagging task, where each token will be given a correct label or an error-type. Sequence labeling methods are widely used for CGED, such as feature-based statistical models (Chang et al, 2012), and neural models (Fu et al, 2018). Due to the effectiveness of BERT (Devlin et al, 2019) in many other NLP applications, recent studies adopt BERT as the basic architecture of CGED models (Fang et al, 2020;Wang et al, 2020b;Li and Shi, 2021).…”
Section: Related Work (mentioning; confidence: 99%)
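The excerpt above frames CGED as a sequence-tagging task in which each token receives either a correct label or an error type. As an illustrative sketch (not from the paper), a BIO tagging over the standard CGED error types (R = redundant, M = missing, S = selection, W = word order) and a small helper to recover error spans might look like:

```python
def error_spans(labels):
    """Collect (error_type, start, end) spans from BIO-style labels."""
    spans = []
    for i, lab in enumerate(labels):
        if lab.startswith("B-"):
            j = i + 1
            while j < len(labels) and labels[j] == "I-" + lab[2:]:
                j += 1
            spans.append((lab[2:], i, j - 1))
    return spans

# "在" appears twice; the duplicate character is tagged as redundant (R).
tokens = ["我", "在", "在", "家"]
labels = ["O", "O", "B-R", "O"]
print(error_spans(labels))  # [('R', 2, 2)]
```

A CRF layer on top of a BiLSTM, as in the paper's model, scores whole label sequences, which keeps transitions like `I-R` following `B-M` from being predicted.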
“…Many researchers have made outstanding achievements on CSC (Zhang et al, 2020; and CGED (Fu et al, 2018). Existing CSC and CGED models cannot achieve good results for CSER because semantic errors are often difficult compared to other errors.…”
Section: Text Error Detection (mentioning; confidence: 99%)
“…Previous research [2,5] put a lot of effort into feature engineering, such as pretrained and parsing features. The most significant parsing characteristics are part-of-speech tagging (POS) and dependency information, indicating that the job is strongly related to the structure of the sentence syntactic dependence.…”
Section: GCN (mentioning; confidence: 99%)