Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications 2014
DOI: 10.3115/v1/w14-1821

Rule-based and machine learning approaches for second language sentence-level readability

Abstract: We present approaches for the identification of sentences understandable by second language learners of Swedish, which can be used in automatically generated exercises based on corpora. In this work we merged methods and knowledge from machine learning-based readability research, from rule-based studies of Good Dictionary Examples and from second language learning syllabuses. The proposed selection methods have also been implemented as a module in a free web-based language learning platform. Users can use diff…

Cited by 31 publications (23 citation statements)
References 8 publications
“…Historically, work on readability is closely tied to (3), and started with finding the most frequent words (Thorndike, 1921, 1931; Thorndike and Lorge, 1944) with an express pedagogical purpose, both for L1 and L2 learning. While the key assumption behind this work, that learning one word is about as hard as learning another, has stood the test of time, learnability has mushroomed into a large field of research, and even a brief overview is beyond the scope of this paper; see Klare (1974) and Paasche-Orlow et al (2003) for informed but somewhat dated summaries, and for the more contemporary approach of bringing machine learning techniques to the task, see e.g., Pilán et al, 2014; Morato et al, 2021. Here we take the central idea to mean simply that effort is best spent on the words that will cover the overall distribution best, i.e., on the most common ones. Remarkably, this means that serious effort needs to be spent on function words, because these are disproportionately present in the high frequency range.…”
Section: Counting (mentioning)
confidence: 99%
“…Another related field of research in computer-assisted language learning is readability assessment and, subsequently, text simplification. There exists ample research on predicting the reading difficulty for various learner groups (Hancke et al, 2012; Collins-Thompson, 2014; Pilán et al, 2014). A specific line of research focuses on reducing the reading difficulty by text simplification (Chandrasekar et al, 1996).…”
Section: Related Work (mentioning)
confidence: 99%
“…Step 1: Per-sentence reading difficulty estimation. The precise estimation of sentence-level readability is a hard problem and has recently attracted the attention of many researchers (Pilán, Volodina, & Johansson, 2014; Schumacher, Eskenazi, Frishkoff, & Collins-Thompson, 2016; Vajjala & Meurers, 2014). For efficiency, we use heuristic functions to make a rough estimation.…”
Section: The Coupled Bag-of-words Model (mentioning)
confidence: 99%