As a fundamental task in natural language processing (NLP), Chinese Grammatical Error Correction (CGEC) [1–3] has received growing attention and become a research hotspot. However, an obvious deficiency of existing CGEC evaluation systems is that the scores of the same error correction model are significantly influenced by the Chinese word segmentation (CWS) results or by the choice of language model. For a fair evaluation, these metrics should be independent of both the CWS results and the language models. To this end, we propose three novel evaluation metrics for CGEC in two dimensions: reference-based and reference-less. Furthermore, based on these three metrics, we build a new evaluation metric that comprehensively evaluates a CGEC model from multiple dimensions. We evaluate and analyze in depth the reasonableness and validity of the proposed metrics, and we expect them to become a new standard for CGEC.
Grammatical Error Correction (GEC) is a challenging task in Natural Language Processing research. Although many researchers have focused on GEC in widely used languages such as English and Chinese, few studies address Indonesian, which is a low-resource language. In this article, we propose a GEC framework that has the potential to serve as a baseline method for Indonesian GEC tasks. The framework treats GEC as a multi-class classification task. It integrates different language embedding models and deep learning models to correct 10 types of Part of Speech (POS) errors in Indonesian text. In addition, we constructed an Indonesian corpus that can be used as an evaluation dataset for Indonesian GEC research, and we evaluated our framework on this dataset. Results showed that the Long Short-Term Memory model based on word embeddings achieved the best performance: its overall macro-average F0.5 in correcting the 10 POS error types reached 0.551. Results also showed that the framework can be trained on a low-resource dataset.
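For readers unfamiliar with the reported score, the following is a minimal sketch of how a macro-averaged F0.5 is conventionally computed: an F-beta score with beta = 0.5 (weighting precision more heavily than recall) is calculated per error type, and the per-type scores are then averaged with equal weight. The abstract does not describe the authors' exact evaluation procedure, so the function names `f_beta` and `macro_f_beta`, the error-type labels, and the counts below are all hypothetical and serve only as illustration.

```python
from typing import Dict, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from raw counts; returns 0.0 when it is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # Standard definition: (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(counts: Dict[str, Tuple[int, int, int]], beta: float = 0.5) -> float:
    """Unweighted mean of per-class F-beta scores (macro average)."""
    if not counts:
        return 0.0
    scores = [f_beta(tp, fp, fn, beta) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts per error type; not taken from the paper.
toy_counts = {
    "noun": (8, 2, 2),
    "verb": (5, 5, 0),
    "preposition": (0, 0, 4),
}
macro = macro_f_beta(toy_counts)
print(f"macro-average F0.5 = {macro:.3f}")
```

Because every error type contributes equally to the macro average, a rare error type that the model never corrects (such as the hypothetical "preposition" row) lowers the overall score as much as a frequent one would.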