2022
DOI: 10.1007/s10459-022-10092-z

Feasibility assurance: a review of automatic item generation in medical assessment

Abstract: Background Current demand for multiple-choice questions (MCQs) in medical assessment is greater than the supply. Consequently, new item-development methods are urgently needed. Automatic Item Generation (AIG) promises to overcome this burden by generating calibrated items through computer algorithms. Despite this promising scenario, there is still no evidence to encourage the general application of AIG in medical assessment. It is therefore important to evaluate AIG regarding its feasibi…
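The abstract does not spell out how items are generated, so a brief illustration may help. The sketch below assumes the template-based approach common in the AIG literature (a cognitive model feeding an item model whose variable slots are filled combinatorially); all clinical content, variable names, and option sets are hypothetical placeholders, not examples drawn from the reviewed studies.

```python
import itertools
import random

# A minimal, self-contained sketch of template-based Automatic Item
# Generation (AIG): an item model defines a stem template, variable
# slots, and option sets; generation enumerates the slot combinations.
# All clinical content here is a hypothetical placeholder.

ITEM_MODEL = {
    "stem": ("A {age}-year-old patient presents with {finding}. "
             "What is the most appropriate next step?"),
    "variables": {
        "age": [25, 45, 70],
        "finding": ["acute chest pain", "sudden dyspnea"],
    },
    # (key, distractors) per finding; in real AIG these would be
    # derived from a cognitive model authored by content experts.
    "options": {
        "acute chest pain": ("Obtain an ECG",
                             ["Order a chest CT", "Discharge home"]),
        "sudden dyspnea": ("Measure oxygen saturation",
                           ["Start antibiotics", "Order an echocardiogram"]),
    },
}

def generate_items(model, seed=0):
    """Yield one MCQ dict per combination of variable values."""
    rng = random.Random(seed)
    names = list(model["variables"])
    for values in itertools.product(*(model["variables"][n] for n in names)):
        slots = dict(zip(names, values))
        key, distractors = model["options"][slots["finding"]]
        options = [key, *distractors]
        rng.shuffle(options)
        yield {"stem": model["stem"].format(**slots),
               "options": options,
               "answer": key}

for item in generate_items(ITEM_MODEL):
    print(item["stem"], "| answer:", item["answer"])
```

In any real deployment the generated pool would still require psychometric calibration (difficulty and discrimination estimates), which is precisely the feasibility evidence the review examines.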

Cited by 13 publications (4 citation statements)
References 33 publications (95 reference statements)
“…A fundamental issue for AIG research concerns the validation of its processes (Gierl et al., 2022b; Shin, 2021). To date, little research has been devoted to collecting evidence to support the validity of AIG (Falcão et al., 2022; Gierl et al., 2022b; Rafatbakhsh et al., 2020). Since incorporating cognitive models into test design and development is required to support validity arguments for test-based inferences, we have reason to suppose that AIG incorporates validity evidence in its methods (Gierl et al., 2022b; Leighton & Gierl, 2011).…”
Section: AIG Versus Manual Item Writing
confidence: 99%
“…In his seminal work, Falcão clearly delineates scoring as the procedures used to develop AIG, generalization as the difficulty measured in the test, and extrapolation as the discrimination of the items. Therefore, we examine previous efforts to gather validity inferences on chatbot-developed MCQs through the lens of Kane's framework as proposed by Cook et al. (2015) and Falcão et al. (2022). A cross-sectional study used ChatGPT, Google Bard and Microsoft Bing to develop MCQs for a physiology course; in that study, a careful blueprint was mapped by two content experts, and inferences of scoring and generalization were collected.…”
Section: Literature Review
confidence: 99%
“…Kane proposes four inferences: 1) Scoring, marked by the construction of an item in terms of its administration, ranging from the format of the test (i.e., multiple-choice questions, skills evaluation) to the procedures planned to administer it (i.e., training of raters, facilities needed); 2) Generalization, which refers to the degree to which what is assessed (i.e., ten multiple-choice questions based on the cardiology module) represents what should be assessed (i.e., the material of the cardiology module), a process that may be aided by a test blueprint or by reliability indices; 3) Extrapolation, the relation between test performance and real-world performance, an inference requiring that the test reflect real-world performance either theoretically (i.e., evaluating the test with content experts) or empirically (i.e., identifying the correlation between the test and workplace assessments); and 4) Implications, which measures the real-world impact of the assessment using a cost-effectiveness approach. To further understand the application of Kane's validity framework, a recent review on the use of Automatic Item Generation (AIG) may be instructive (Falcão et al., 2022). In his seminal work, Falcão clearly delineates scoring as the procedures used to develop AIG, generalization as the difficulty measured in the test, and extrapolation as the discrimination of the items.…”
Section: Literature Review
confidence: 99%
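Since both citation statements above hinge on Kane's four-inference chain, a compact sketch of that chain as a data structure may clarify how a validation plan is assembled link by link. The evidence entries below are illustrative paraphrases of the paragraph above, not items prescribed by Kane, Cook, or Falcão, and the helper name is hypothetical.

```python
from dataclasses import dataclass, field

# A sketch of Kane's four validity inferences as a checklist structure.
# Evidence strings are illustrative placeholders paraphrased from the
# citation statement above.

@dataclass
class Inference:
    name: str
    question: str
    evidence: list[str] = field(default_factory=list)

KANE_CHAIN = [
    Inference("scoring", "Were items built and administered soundly?",
              ["item format and administration procedures", "rater training"]),
    Inference("generalization", "Does the sampled content represent the domain?",
              ["test blueprint coverage", "reliability indices"]),
    Inference("extrapolation", "Does test performance track real-world performance?",
              ["content-expert review", "correlation with workplace assessments"]),
    Inference("implications", "Is the assessment's real-world impact worth its cost?",
              ["cost-effectiveness analysis"]),
]

def validity_gaps(chain):
    """Return, in order, the inferences still lacking any evidence."""
    return [inf.name for inf in chain if not inf.evidence]

print(validity_gaps(KANE_CHAIN))  # [] once every link has evidence
```

The point of keeping the chain ordered is that the argument is sequential: a gap at scoring undermines every inference downstream, which is why a validation plan fills the links in order.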