2020
DOI: 10.3390/app10207285

Automatic Classification of Text Complexity

Abstract: This work introduces an automatic classification system for measuring the complexity level of a given Italian text from a linguistic point of view. The task of measuring the complexity of a text is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by linguistic experts for second language teaching and assessment purposes. The commonly adopted Common European Framework of Reference for Languages (CEFR) levels were used as target classification classes, texts were e…

Citations: cited by 19 publications (10 citation statements)
References: 48 publications
“…The results have shown the effectiveness of these features and the SVM classifier. Similar results can be found in the research articles by Szügyi et al (2019) for texts in the German language and Santucci et al (2020) where the authors achieved the best results for the Italian language using a set of linguistic features in conjunction with the Random Forest classifier. Lyashevskaya et al (2021) showed the effectiveness of linguistic features for the task of complexity assessment of the texts written by Russian learners of English.…”
Section: Related Work
Citation type: supporting
confidence: 87%
“…Somewhat more rarely than SVM, decision trees and random forests are used to classify texts; the essence of the latter method is the use of a large number of decision trees, which together have good predictive power. In Kauchak et al (2014) and Santucci et al (2020), random forests perform better than other models.…”
Section: Machine Learning and Natural Language Processing Methods For Assessing The Readability Of Texts
Citation type: mentioning
confidence: 95%
“…The task of CEFR classification itself however, seems to have received fewer attention. Among the studies that address this problem for various languages are Santucci et al (2020) (Italian), Hancke and Meurers (2013) (German), Vajjala and Lõo (2014) (Estonian) and Volodina et al (2016) (Swedish). Earlier work on English (our language of interest) is represented by Tack et al (2017), who create their own annotated corpus and experiment with automated classification using several classification algorithms.…”
Section: Related Work
Citation type: mentioning
confidence: 99%