Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications 2015
DOI: 10.3115/v1/w15-0614

The Jinan Chinese Learner Corpus

Abstract: We present the Jinan Chinese Learner Corpus, a large collection of L2 Chinese texts produced by learners that can be used for educational tasks. The present work introduces the data and provides a detailed description. Currently, the corpus contains approximately 6 million Chinese characters written by students from over 50 different L1 backgrounds. This is a large-scale corpus of learner Chinese texts which is freely available to researchers either through a web interface or as a set of raw texts. The data ca…

Cited by 22 publications (17 citation statements)
References 20 publications (11 reference statements)
“…Growing interest has led to the recent development of the Jinan Chinese Learner Corpus (JCLC) (Wang, Malmasi and Huang 2015), the first large-scale corpus of L2 Chinese consisting of university student essays. Learners from fifty-nine countries are represented and proficiency levels are sampled representatively across beginner, intermediate and advanced levels.…”
Section: Chinese (mentioning)
confidence: 99%
“…The best result under crossvalidation on the TOEFL dataset, which includes 11 native languages (with a rather diverse distribution of language families), was 85.2% accuracy. Applying these methods to different datasets (the ASK corpus of learners of Norwegian (Tenfjord et al, 2006) and the Jinan Chinese Learner Corpus (Wang et al, 2015), 10-11 native languages in each) resulted in 76.5% accuracy for the Chinese data and 81.8% for the Norwegian data, with LDA-based classification yielding top results.…”
Section: Related Work (mentioning)
confidence: 99%
“…Even though most NLI research has been carried out on English data, an important research trend in recent years has been the application of NLI methods to other languages, as discussed in Malmasi and Dras (2015). Recent NLI studies on languages other than English include Arabic (Malmasi and Dras, 2014a) and Chinese (Malmasi and Dras, 2014b;Wang et al, 2015). To the best of our knowledge, no study has been published on Portuguese and the NLI-PT dataset opens new possibilities of research for Portuguese.…”
Section: Related Workmentioning
confidence: 99%