2023
DOI: 10.1145/3556538
A Survey on Recent Approaches to Question Difficulty Estimation from Text

Abstract: Question Difficulty Estimation from Text (QDET) is the application of Natural Language Processing techniques to the estimation of a value, either numerical or categorical, which represents the difficulty of questions in educational settings. We give an introduction to the field, build a taxonomy based on question characteristics, and present the various approaches that have been proposed in recent years, outlining opportunities for further research. This survey provides an introduction for researchers and prac…

Cited by 13 publications (21 citation statements) · References 84 publications
“…Feature extraction method | Paper citations
- TF-IDF: Benedetto et al. (2020a, 2020b); Lin et al. (2015)
- Readability measures: Benedetto et al. (2020a); Choi & Moon (2020); Susanti et al. (2017); Yaneva et al. (2019, 2020)
- Corpus analysis software: Choi & Moon (2020); Pandarova et al. (2019); El Masri et al. (2017); Lee et al. (2019); Beinborn et al. (2014, 2015); Loukina et al. (2016); Sano (2015)
- Word embedding: Benedetto et al. (2021); Xu et al. (2022); Bi et al. (2021); Loginova et al. (2021); Susanti et al. (2020); Xue et al. (2020); Yaneva et al. (2019, 2020); Zhou & Tao (2020); Yeung et al. (2019); Cheng et al. (2019); Hsu et al. (2018); Huang et al. (2017)
- Ontology-based metrics: Kurdi et al. (2021); Vinu & Kumar (2015, 2017, 2020); Faizan & Lohmann (2018); Seyler et al. (2017); Vinu et al. (2016); Alsubait et al. (2016)
- LSTM/BiLSTM: Lin et al. (2019); Qiu et al. (2019); Cheng et al. (2019); Gao et al. (2018)

Syntax-level Feature Extraction: When investigating sources of difficulty in textual questions, textual co...…”
Section: Feature Extraction Methods
confidence: 99%
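The TF-IDF family listed above treats difficulty estimation as supervised regression over bag-of-words features. A minimal sketch of that pipeline, assuming scikit-learn is available; the toy questions and difficulty values are invented for illustration, not from any surveyed dataset:

```python
# Sketch of a TF-IDF-based difficulty regressor, in the spirit of the
# feature-extraction approaches tabulated above (e.g. TF-IDF + a linear model).
# All data below is fabricated purely to show the shape of the pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

questions = [
    "What is the capital of France?",
    "Explain the second law of thermodynamics.",
    "Derive the time complexity of merge sort.",
    "Name the largest planet in the solar system.",
]
difficulty = [0.2, 0.7, 0.8, 0.3]  # e.g. calibrated difficulty, rescaled to [0, 1]

vec = TfidfVectorizer(lowercase=True, stop_words="english")
X = vec.fit_transform(questions)           # sparse question-term matrix
model = Ridge(alpha=1.0).fit(X, difficulty)

# Predict a difficulty value for an unseen question.
new_q = ["Explain the complexity of quicksort."]
pred = model.predict(vec.transform(new_q))
print(round(float(pred[0]), 3))
```

Real systems in the survey pair such features with stronger regressors and evaluate against IRT-calibrated difficulties; this sketch only illustrates the feature-to-estimate mapping.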
“…Three types of baselines were found to be used for performance comparison: 1) comparison with an existing difficulty prediction model; 2) comparison with another feature extraction technique; or 3) comparison with one or more variants of the same model. Out of the 55 studies surveyed, only 8 papers compared their proposed model to an existing one (Benedetto et al., 2020a, 2020b, 2021; Qiu et al., 2019; Xu et al., 2022). This was mostly carried out using a different dataset and after making some modifications to the previous model.…”
Section: Evaluation Methods
confidence: 99%
“…In attempts to bypass field-testing, researchers have developed models to predict item difficulty from various item text features, including semantic and syntactic complexity, word and sentence lengths and counts, word embeddings, or readability indices (see AlKhuzaey et al., 2023; Benedetto et al., 2023). Some methods rely on expert judgement (Beinborn, Zesch, & Gurevych, 2014; Choi & Moon, 2020; Loukina et al., 2016; Settles, LaFlair, & Hagiwara, 2020), but these subjective approaches can suffer from poor inter-rater reliability (i.e., consistency between multiple judges) and replicability (AlKhuzaey et al., 2023; Conejo et al., 2020).…”
Section: Introduction
confidence: 99%
“…Some methods rely on expert judgement (Beinborn, Zesch, & Gurevych, 2014; Choi & Moon, 2020; Loukina et al., 2016; Settles, LaFlair, & Hagiwara, 2020), but these subjective approaches can suffer from poor inter-rater reliability (i.e., consistency between multiple judges) and replicability (AlKhuzaey et al., 2023; Conejo et al., 2020). Others rely on machine-driven natural language processing (NLP) techniques to predict item difficulty and/or discrimination (Benedetto et al., 2020a, 2020b, 2021; Yaneva et al., 2019; Zhou & Tao, 2020). However, their level of prediction accuracy is limited, and a simple estimation of item difficulty and discrimination does not capture the comprehensive nature of traditional field-testing.…”
Section: Introduction
confidence: 99%
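Readability indices, one of the text-feature families named above, reduce an item's text to a single interpretable score. A minimal sketch of the standard Flesch Reading Ease formula; the vowel-group syllable counter is a crude heuristic assumed here for self-containment, not a dictionary-based one:

```python
# Flesch Reading Ease as a single readability feature for an item.
# Score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
# Syllables are approximated by counting runs of vowels (a crude heuristic).
import re

def count_syllables(word: str) -> int:
    # Count runs of vowels as syllables; every word gets at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

item = "Photosynthesis converts carbon dioxide and water into glucose."
print(round(flesch_reading_ease(item), 1))
```

Higher scores indicate easier text (short simple sentences can exceed 100; dense technical prose can go negative), which is why such indices appear as difficulty proxies in the feature-based approaches above.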