2015
DOI: 10.1007/978-3-319-19773-9_38

Distractor Quality Evaluation in Multiple Choice Questions

Abstract: Multiple choice questions represent a widely used evaluation mode; yet writing items that properly evaluate student learning is a complex task. Guidelines were developed for manual item creation, but automatic item quality evaluation would constitute a helpful tool for teachers. In this paper, we present a method for evaluating distractor (i.e. incorrect option) quality that combines syntactic and semantic homogeneity criteria, based on Natural Language Processing methods. We perform an evaluation of…
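
The abstract names syntactic and semantic homogeneity criteria but is truncated before their details. As a purely illustrative toy, and not the authors' method, the sketch below scores a distractor against the correct answer with two crude stand-ins: token-count similarity as the syntactic criterion and Jaccard token overlap as the semantic one. The example options and the blending weight are invented.

```python
# Illustrative sketch only -- NOT the method of Pho et al. (2015).
# Scores how "homogeneous" a distractor is with the correct answer
# using two crude proxies; real systems would use parsers and embeddings.

def syntactic_homogeneity(option_a: str, option_b: str) -> float:
    """Proxy: options with similar token counts score near 1."""
    la, lb = len(option_a.split()), len(option_b.split())
    return min(la, lb) / max(la, lb)

def semantic_homogeneity(option_a: str, option_b: str) -> float:
    """Proxy: Jaccard overlap of lowercased token sets."""
    sa, sb = set(option_a.lower().split()), set(option_b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def distractor_quality(distractor: str, answer: str, w: float = 0.5) -> float:
    """Blend the two criteria; the weight w is an arbitrary choice here."""
    return (w * syntactic_homogeneity(distractor, answer)
            + (1 - w) * semantic_homogeneity(distractor, answer))

answer = "the mitochondrion produces ATP"
for d in ["the ribosome produces proteins", "blue"]:
    # The structurally parallel distractor scores far higher than
    # the unrelated one-word option.
    print(d, "->", round(distractor_quality(d, answer), 2))
```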

Cited by 3 publications (2 citation statements) · References 12 publications

“…Previous automatic distractor assessment methods proposed to compare the similarity of generated distractors with the ground-truth distractors present in the dataset (Gao et al., 2019) or to consider rule-based approaches (Pho et al., 2015). Following standard reference-based evaluation, n-gram overlap metrics such as BLEU (Papineni et al., 2002), ROUGE (Lin, 2004) and METEOR (Banerjee and Lavie, 2005) have been considered; these metrics measure the overlap between generated distractors and the distractors from a set of human-annotated ground-truth sequences.…”
Section: Related Work
confidence: 99%
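
The reference-based evaluation described in this statement is straightforward to illustrate. Below is a minimal Python sketch using NLTK's sentence-level BLEU to score a generated distractor against the ground-truth distractors for the same item; the bigram weights, smoothing method, and example options are assumptions for illustration and do not come from the cited papers.

```python
# Illustrative sketch: reference-based distractor scoring with BLEU.
# Requires NLTK (pip install nltk); the example data is invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def distractor_bleu(generated: str, references: list[str]) -> float:
    """Score one generated distractor against human-written distractors."""
    hypothesis = generated.lower().split()
    refs = [r.lower().split() for r in references]
    # Bigram BLEU suits short option texts (an arbitrary choice here);
    # smoothing avoids zero scores when higher-order n-grams never match.
    smooth = SmoothingFunction().method1
    return sentence_bleu(refs, hypothesis, weights=(0.5, 0.5),
                         smoothing_function=smooth)

gold = ["the French Revolution of 1789",
        "the Industrial Revolution",
        "the American Civil War"]
print(distractor_bleu("the French Revolution", gold))          # high overlap
print(distractor_bleu("the quantum theory of gravity", gold))  # low overlap
```

ROUGE and METEOR would slot into the same loop, each substituting its own overlap statistic for BLEU's n-gram precision.
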
“…Moreover, many examinees can be assessed simultaneously and their answers can be graded by a computer system (Epstein, 2007). MCQs are also viewed as time-efficient and easy to grade, and, as long as they are well written, they can be an objective, trustworthy, and adequate means of assessment with the potential to also evaluate higher levels of thinking (Epstein, 2007; Pho et al., 2015; Tarrant & Ware, 2012).…”
Section: Introduction
confidence: 99%