2023
DOI: 10.1177/23821205231204178
Examining the Threat of ChatGPT to the Validity of Short Answer Assessments in an Undergraduate Medical Program

Leo Morjaria,
Levi Burns,
Keyna Bracken
et al.

Abstract: OBJECTIVES ChatGPT is an artificial intelligence model that can interpret free-text prompts and return detailed, human-like responses across a wide domain of subjects. This study evaluated the extent of the threat posed by ChatGPT to the validity of short-answer assessment problems used to examine pre-clerkship medical students in our undergraduate medical education program. METHODS Forty problems used in prior student assessments were retrieved and stratified by levels of Bloom's Taxonomy. Thirty of these pro…

Cited by 7 publications (6 citation statements)
References 30 publications
“…The frequency of changes between scoring categories (34-57%) suggests that relying solely on AI-based grading could sometimes overlook nuances that would be critical in a medical educational context. However, while not formally tested in this study, there also exists some level of inter-rater variability with independent human tutors; our group formally investigated in a previous work and found a Cronbach alpha value of 0.816 for a team of six human assessors on past student-generated CAE responses [29].…”
Section: Discussion
confidence: 85%
“…This poses comprehension challenges for students with low abilities (Guo & Wang, 2023). In addition, some teachers noticed that ChatGPT might use different evaluation criteria from their own, and its lack of specific knowledge about the class and students could lead to inappropriate feedback (Morjaria et al, 2023). These limitations indicated that although ChatGPT seemed to be powerful, it could not replace teacher feedback.…”
Section: Teacher Beliefs About Assessment
confidence: 99%
“…For example, GenAI can be used to advance writing such as proofreading, critique, and editing (Currie et al, 2023), create personalized assessments, and simulate conversations (Cheung et al, 2023;Currie et al, 2023). Therefore, the teachers were encouraged to use alternative grading practices, like incorporating nontraditional, authentic assessments that are difficult for AI to replicate without prompting (Chaudhry et al, 2023;Fuchs et al, 2023;Morjaria et al, 2023;Overono & Ditta, 2023;Perkins, 2023). In other words, the students' language can be self-assessed through GenAI, while the students' ideas and logical thinking can be assessed by teachers.…”
Section: Balancing GenAI and Human Assessment
confidence: 99%
“…ChatGPT has been compared to human raters in terms of grading short-answer pre-clerkship medical questions. The ChatGPT-human Spearman correlations for a single assessor ranged from 0.6 to 0.7 [12].…”
Section: Introduction
confidence: 99%