2020
DOI: 10.1007/978-3-030-52237-7_4

Introducing a Framework to Assess Newly Created Questions with Natural Language Processing

Abstract: Statistical models such as those derived from Item Response Theory (IRT) enable the assessment of students on a specific subject, which can be useful for several purposes (e.g., learning path customization, drop-out prediction). However, the questions have to be assessed as well and, although it is possible to estimate with IRT the characteristics of questions that have already been answered by several students, this technique cannot be used on newly generated questions. In this paper, we propose a framework t…
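
To make the abstract's reference to IRT concrete, here is a minimal sketch of an item response function. It assumes the two-parameter logistic (2PL) form; the abstract does not say which IRT variant the paper uses, and the names `p_correct`, `theta`, `difficulty` and `discrimination` are hypothetical, chosen only for illustration.

```python
import math


def p_correct(theta: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Probability that a student answers a question correctly under a 2PL model.

    theta          -- student skill level (assumed latent trait)
    difficulty     -- question difficulty, on the same scale as theta
    discrimination -- how sharply the question separates skill levels
    """
    # Logistic curve centred on the question's difficulty.
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


if __name__ == "__main__":
    # A student whose skill equals the question's difficulty has an even chance.
    print(p_correct(theta=0.0, difficulty=0.0))
    # The same student is less likely to answer a harder question correctly.
    print(p_correct(theta=0.0, difficulty=1.5))
```

In this formulation, difficulty and discrimination are normally fitted from many students' recorded answers, which is why, as the abstract notes, they are not available for a question nobody has answered yet.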

Cited by 15 publications (14 citation statements)
References 24 publications

“…4 Most common prediction models (as a percentage of publications studied) interesting is that all the studies that selected RF as a prediction model conducted an algorithm selection experiment to choose the algorithm that performs better on the question difficulty prediction task. It was consistently found across these studies that the RF regressor outperformed other ML algorithms such as Linear regression, SVM, Gaussian processes, fully connected neural networks and Decision Trees (Benedetto et al, 2020a; Xu et al, 2022; Yaneva et al, 2020).…”
Section: Input (mentioning)
confidence: 92%
“…Paper Citation TF-IDF (Benedetto et al, 2020a, 2020b) (Lin et al, 2015) Readability measures (Benedetto et al, 2020a; Choi & Moon, 2020) (Susanti et al, 2017; Yaneva et al, 2020) (Yaneva et al, 2019) Corpus analysis software (Choi & Moon, 2020; Pandarova et al, 2019) (El Masri et al, 2017; Lee et al, 2019) (Beinborn et al, 2014, 2015) (Loukina et al, 2016; Sano, 2015) Word embedding (Benedetto et al, 2021; Xu et al, 2022) (Bi et al, 2021; Loginova et al, 2021) (Susanti et al, 2020; Xue et al, 2020) (Yaneva et al, 2020; Zhou & Tao, 2020) (Yaneva et al, 2019; Yeung et al, 2019) (Cheng et al, 2019; Hsu et al, 2018) (Huang et al, 2017) Ontology-based metrics (Kurdi et al, 2021; Vinu & Kumar, 2020) (Faizan & Lohmann, 2018; Seyler et al, 2017) (Vinu et al, 2016; Vinu & Kumar, 2017) (Alsubait et al, 2016; Vinu & Kumar, 2015) LSTM/BiLSTM (Lin et al, 2019; Qiu et al, 2019) (Cheng et al, 2019; Gao et al, 2018) Syntax-level Feature Extraction When investigating sources of difficulty in textual questions, textual co...…”
Section: Feature Extraction Methods (mentioning)
confidence: 99%
“…Some methods rely on expert judgement (Beinborn, Zesch, & Gurevych, 2014; Choi & Moon, 2020; Loukina et al, 2016; Settles, LaFlair & Hagiwara, 2020), but these subjective approaches can suffer from poor inter-rater-reliability (i.e., consistency between multiple judges) and replicate-ability (AlKhuzaey et al, 2023; Conejo et al, 2020). Others relied on machine-driven natural language processing (NLP) techniques to predict item difficulty and/or discrimination (Benedetto et al, 2020a, 2020b; Benedetto et al, 2021; Yaneva et al, 2019; Zhou & Tao, 2020). However, their level of prediction accuracy is limited, and a simple estimation of item difficulty and discrimination does not capture the comprehensive nature of traditional field-testing.…”
Section: Introduction (mentioning)
confidence: 99%