2015
DOI: 10.1007/978-3-319-19773-9_38

Distractor Quality Evaluation in Multiple Choice Questions

Abstract: Multiple choice questions represent a widely used evaluation mode; yet writing items that properly evaluate student learning is a complex task. Guidelines were developed for manual item creation, but automatic item quality evaluation would constitute a helpful tool for teachers. In this paper, we present a method for evaluating distractor (i.e. incorrect option) quality that combines syntactic and semantic homogeneity criteria, based on Natural Language Processing methods. We perform an evaluation of…
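
The abstract names syntactic and semantic homogeneity criteria but is truncated before their details. As a purely illustrative toy, and not the authors' method, the sketch below scores a distractor against the correct answer with two crude stand-ins: token-count similarity as the syntactic criterion and Jaccard token overlap as the semantic one. The example options and the blending weight are invented.

```python
# Illustrative sketch only -- NOT the method of Pho et al. (2015).
# Scores how "homogeneous" a distractor is with the correct answer
# using two crude proxies; real systems would use parsers and embeddings.

def syntactic_homogeneity(option_a: str, option_b: str) -> float:
    """Proxy: options with similar token counts score near 1."""
    la, lb = len(option_a.split()), len(option_b.split())
    return min(la, lb) / max(la, lb)

def semantic_homogeneity(option_a: str, option_b: str) -> float:
    """Proxy: Jaccard overlap of lowercased token sets."""
    sa, sb = set(option_a.lower().split()), set(option_b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def distractor_quality(distractor: str, answer: str, w: float = 0.5) -> float:
    """Blend the two criteria; the weight w is an arbitrary choice here."""
    return (w * syntactic_homogeneity(distractor, answer)
            + (1 - w) * semantic_homogeneity(distractor, answer))

answer = "the mitochondrion produces ATP"
for d in ["the ribosome produces proteins", "blue"]:
    # The structurally parallel distractor scores far higher than
    # the unrelated one-word option.
    print(d, "->", round(distractor_quality(d, answer), 2))
```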

Cited by 3 publications (2 citation statements) · References 12 publications

“…Previous automatic distractor assessment methods proposed to compare the similarity of generated distractors with the ground-truth distractors present in the dataset (Gao et al., 2019) or to consider rule-based approaches (Pho et al., 2015). Following standard reference-based evaluation, n-gram overlap metrics such as BLEU (Papineni et al., 2002), ROUGE (Lin, 2004) and METEOR (Banerjee and Lavie, 2005) have been considered; these metrics measure the overlap between generated distractors and the distractors from a set of human-annotated ground-truth sequences.…”
Section: Related Work
confidence: 99%
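
The reference-based evaluation described in this statement is straightforward to illustrate. Below is a minimal Python sketch using NLTK's sentence-level BLEU to score a generated distractor against the ground-truth distractors for the same item; the bigram weights, smoothing method, and example options are assumptions for illustration and do not come from the cited papers.

```python
# Illustrative sketch: reference-based distractor scoring with BLEU.
# Requires NLTK (pip install nltk); the example data is invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def distractor_bleu(generated: str, references: list[str]) -> float:
    """Score one generated distractor against human-written distractors."""
    hypothesis = generated.lower().split()
    refs = [r.lower().split() for r in references]
    # Bigram BLEU suits short option texts (an arbitrary choice here);
    # smoothing avoids zero scores when higher-order n-grams never match.
    smooth = SmoothingFunction().method1
    return sentence_bleu(refs, hypothesis, weights=(0.5, 0.5),
                         smoothing_function=smooth)

gold = ["the French Revolution of 1789",
        "the Industrial Revolution",
        "the American Civil War"]
print(distractor_bleu("the French Revolution", gold))          # high overlap
print(distractor_bleu("the quantum theory of gravity", gold))  # low overlap
```

ROUGE and METEOR would slot into the same loop, each substituting its own overlap statistic for BLEU's n-gram precision.
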
“…Moreover, many examinees can be assessed simultaneously and their answers can be graded by a computer system (Epstein, 2007). MCQs are also viewed as time-efficient and easy to grade, and, as long as they are well written, they can be an objective, trustworthy, and adequate means of assessment with the potential to also evaluate higher levels of thinking (Epstein, 2007; Pho et al., 2015; Tarrant & Ware, 2012).…”
Section: Introduction
confidence: 99%