2020
DOI: 10.1111/jedm.12264

Using Natural Language Processing to Predict Item Response Times and Improve Test Construction

Abstract: In this article, it is shown how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information-retrieval-based automatic question-answering system finds an item challenging, and (c) dense word representations (word embeddings). Using a random forests algorithm, these data are then used to train a prediction model for item response times, and the predicted response times are then used to assemble test forms. Using emp…
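The abstract outlines a feature-based pipeline: represent each item's text numerically, fit a random forests model to observed response times, and use the predicted times when assembling test forms. The sketch below is a minimal illustration of that idea, not the authors' implementation; the toy feature set, the sample items and timings, and the hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch (assumed, not the article's code) of predicting item response
# times from text features with a random forest, as described in the abstract.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def linguistic_features(item_text: str) -> list:
    """Toy stand-ins for the 113 linguistic features (length and punctuation counts)."""
    words = item_text.split()
    return [
        len(item_text),                    # character count
        len(words),                        # word count
        np.mean([len(w) for w in words]),  # mean word length
        item_text.count(","),              # crude clause/punctuation proxy
    ]

# Hypothetical training data: item texts with observed mean response times (seconds).
items = [
    "Which of the following is a prime number?",
    "A train travels 60 miles in 1.5 hours. What is its average speed?",
    "Select the sentence that contains a grammatical error.",
]
response_times = [35.0, 80.0, 55.0]

X = np.array([linguistic_features(t) for t in items])
y = np.array(response_times)

# Random forests regressor, as named in the abstract; settings are illustrative only.
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X, y)

# Predicted times for new items could then feed a test-assembly step that keeps
# the total expected testing time of a form within a target limit.
new_item = "Estimate the area of a circle with radius 3 cm."
predicted_time = model.predict([linguistic_features(new_item)])[0]
print(f"Predicted response time: {predicted_time:.1f} s")
```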

Cited by 10 publications (5 citation statements) | References 35 publications
“…Presumably, these features impact primarily the time it takes respondents to interpret an item and to select a response, whereas they may have limited relevance for other aspects of the response process (Tourangeau et al. 2000), such as the extent to which an item requires retrieval of contents from memory or how easily accessible these contents are (Johnson 2004). Extracting these and other characteristics of items efficiently and reliably with human coders or computer algorithms is an active area of research (Bais et al. 2019; Baldwin et al. 2021). Arguably, the accessibility of memory contents may play a greater role in predicting the response times for attitudinal survey questions, for which we found the predictive accuracy to be slightly weaker compared to factual and knowledge questions.…”
Section: Discussion (citation type: mentioning; confidence: 99%)
“…This article is motivated by the idea that having a method for estimating the expected response time for any given survey item in a reference population could provide a useful tool for survey researchers. We hypothesize that the attributes of questions, based on their stems and response options, can robustly predict item response times, and that such predictions could be applied to new sets of items (Baldwin et al. 2021). Prior research has demonstrated that basic features of survey questions (item length, response format) explain nontrivial amounts of response-time variance (Yan and Tourangeau 2008).…”
Section: Introduction (citation type: mentioning; confidence: 99%)
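The excerpt above points to the finding that even basic question features (item length, response format) explain a nontrivial share of response-time variance. The following sketch illustrates that kind of analysis in a hedged way; the two features, the data values, and the variable names are illustrative assumptions, not figures from any of the cited studies.

```python
# Hedged illustration: how much response-time variance do two basic question
# features capture? Data below are fabricated for demonstration only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: item word counts, response format (0 = yes/no, 1 = rating scale),
# and mean observed response times in seconds.
word_count = np.array([8, 15, 22, 10, 30, 18])
rating_scale = np.array([0, 1, 1, 0, 1, 0])
rt_seconds = np.array([4.2, 7.5, 9.1, 4.8, 11.6, 6.9])

X = np.column_stack([word_count, rating_scale])
model = LinearRegression().fit(X, rt_seconds)

# R^2 indicates the share of response-time variance explained by these features.
print("R^2:", round(model.score(X, rt_seconds), 2))
print("Seconds added per extra word:", round(model.coef_[0], 2))
```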
“…As with examinee covariates, interest in the relationship between item characteristics and various ancillary data extends beyond the domain of score comparability. Applications include difficulty or item parameter prediction (e.g., Baldwin et al., 2004; Collis et al., 1995; Hall & Ansley, 2008; Irvine et al., 1990; Mislevy, 1988; Nungester & Vass, 1985; Scheuneman et al., 1991; Stowe, 2002; Swaminathan et al., 2003; Wang & Jiao, 2011; Xie, 2019); response time prediction (e.g., Halkitis et al., 1996; Parshall et al., 1994; Smith, 2000; Swanson et al., 2001; Baldwin et al., 2021); evaluation of automatically generated items (Leo et al., 2019; Kurdi, 2020; Benedetto et al., 2020); item pretest survival prediction (Ha et al., 2019; Yaneva et al., 2020); response process complexity estimation (Yaneva et al., 2021); and differential item functioning detection (Sinharay et al., 2009). Nevertheless, while researchers have sought to capitalize on item covariates to improve a broad range of activities, they have not been widely used to facilitate score comparability.…”
Section: Connectives (citation type: mentioning; confidence: 99%)
“…Response time (RT) information has proven useful for test design (Baldwin et al., 2021; Choe et al., 2018; van der Linden & Xiong, 2013), motivation evaluation (Goldhammer, 2015; Lafit et al., 2019; Wise & Kuhfeld, 2021), enhancement of measurement accuracy (Bolsinova & Molenaar, 2018; van der Linden, 2009), defining "feeling of knowing" as pace (a person's rank of response time by question) (Thompson et al., 2009, 2013), and test accommodations (Margolis & Feinberg, 2020; Rios et al., 2020; Sireci et al., 2005). Precision teaching (Kubina, 2012) and curriculum-based measurement (Cummings & Petscher, 2016) use fluency (rate, or the speed-accuracy tradeoff) as the primary metric to evaluate student progress and determine competence in elementary and special education school settings.…”
Section: Introduction (citation type: mentioning; confidence: 99%)