2016
DOI: 10.1162/coli_a_00255

All Mixed Up? Finding the Optimal Feature Set for General Readability Prediction and Its Application to English and Dutch

Abstract: Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, although NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for English and Dutch generic text using supervis…

Cited by 37 publications (32 citation statements)
References 41 publications
“…Word prevalence is also likely to be of interest to natural language processing researchers writing algorithms to gauge the difficulty of texts. At present, word frequency is used as a proxy of word difficulty (e.g., Benjamin, 2012; De Clercq & Hoste, 2016; Hancke, Vajjala, & Meurers, 2012). Word prevalence is likely to be a better measure, given that it does not completely reduce to differences in word frequency.…”
Section: Uses of the Word Prevalence Measure (mentioning)
confidence: 99%
“…These approaches ignore the sequential or structural information on how sentences construct articles. Efforts have also been made to select optimal features from current hundreds of features [15]. Some computational linguistic methods have been developed to extract higher-level language features.…”
Section: Related Work (mentioning)
confidence: 99%
“…We observed that little research has been done regarding multilingualism. To the best of our knowledge, the study carried out by De Clercq and Hoste () is the only one that handles more than one language. In this case, readability assessment techniques were analyzed for both Dutch and English, and a readability level prediction tool was developed for each language.…”
Section: Related Work (mentioning)
confidence: 99%