Developing learner corpus annotation for Chinese grammatical errors

Lee, Lung-Hao; Chang, Liping; Tseng, Yuen-Hsien

doi:10.1109/ialp.2016.7875980

Cited by 14 publications

(10 citation statements)

References 22 publications


“…The learner corpora used in our shared task were taken from two sources: the writing section of the computer-based Test Of Chinese as a Foreign Language (TOCFL) (Lee et al, 2016) and the writing section of the Hanyu Shuiping Kaoshi (HSK, Test of Chinese Level) (Cui et al, 2011; Zhang et al, 2013). Native Chinese speakers were trained to manually annotate grammatical errors and provide corrections corresponding to each error.…”

Section: Datasets (mentioning)

confidence: 99%

Overview of the NLP-TEA 2015 Shared Task for Chinese Grammatical Error Diagnosis

Lee

Chang

2015

Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

Self Cite



“…Concerning language learning, there have been several research efforts that present the error diagnosis process can diagnose, among others, grammatical, syntactic, vocabulary mistakes by using techniques, such as approximate string matching, convolutional sequence to sequence modeling, context representation, etc. [13][14][15][16][17][18][19]. For example, the work of [19] proposes a sequence-to-sequence learning approach using recurrent neural networks for conducting error analysis and diagnosis.…”

Section: Introduction (mentioning)

confidence: 99%

“…In the work of [14], the authors used the Clause Complex model to analyze the learners' errors emerging from grammatical differences in language learning. The work of [15] proposes a framework of hierarchical tagging sets to perform annotation of grammatical mistakes in language learning. Finally, the authors of [16] performed classification on spelling mistakes in two categories, i.e., orthographic and phonological errors.…”

Section: Introduction (mentioning)

confidence: 99%

A Cognitive Diagnostic Module Based on the Repair Theory for a Personalized User Experience in E-Learning Software

2021


This paper presents a novel cognitive diagnostic module which is incorporated in e-learning software for the tutoring of the markup language HTML. The system is responsible for detecting the learners’ cognitive bugs and delivering personalized guidance. The novelty of this approach is that it is based on the Repair theory that incorporates additional features, such as student negligence and test completion times, in its diagnostic mechanism; also, it employs a recommender module that suggests students optimal learning paths based on their misconceptions using descriptive test feedback and adaptability of learning content. Considering the Repair theory, the diagnostic mechanism uses a library of error correction rules to explain the cause of errors observed by the student during the assessment. This library covers common errors, creating a hypothesis space in that way. Therefore, the test items are expanded, so that they belong to the hypothesis space. Both the system and the cognitive diagnostic tool were evaluated with promising results, showing that they offer a personalized experience to learners.


“…For over a decade, user generated content (UGC) has been an important target of NLP technology. It is characterized by phenomena not found in standard texts, such as word lengthening (Brody and Diakopoulos, 2011), dialectal variations (Saito et al, 2017; Blodgett et al, 2016), unknown onomatopoeias (Sasano et al, 2013), grammatical errors (Mizumoto et al, 2011; Lee et al, 2018), and mother tongue interference in non-native writing (Goldin et al, 2018). Typographical errors (typos) also occur often in UGC.…”

Section: Introduction (mentioning)

confidence: 99%

Building a Japanese Typo Dataset from Wikipedia’s Revision History

Tanaka

Murawaki

Kawahara

et al. 2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop


User generated texts contain many typos for which correction is necessary for NLP systems to work. Although a large number of typo-correction pairs are needed to develop a data-driven typo correction system, no such dataset is available for Japanese. In this paper, we extract over half a million Japanese typo-correction pairs from Wikipedia's revision history. Unlike other languages, Japanese poses unique challenges: (1) Japanese texts are unsegmented so that we cannot simply apply a spelling checker, and (2) the way people inputting kanji logographs results in typos with drastically different surface forms from correct ones. We address them by combining character-based extraction rules, morphological analyzers to guess readings, and various filtering methods. We evaluate the dataset using crowdsourcing and run a baseline seq2seq model for typo correction.

