Making students' marks fair: standard setting, assessment items and post hoc item analysis

Tavakol, Mohsen; Doody, Gillian A.

doi:10.5116/ijme.54e8.86df

Cited by 5 publications

(5 citation statements)

References 4 publications

“…However, the method could also be used in high-stakes testing to correct values for individual items where the Facility is very different from the standard-set value, and its use is planned in this context for the UK General Medical Council's proposed national Medical Licensing Assessment [6], where it will be compared with Item Response Theory methods [7]. At the moment, items which perform very differently from their predicted Angoff value may be removed from the assessment, and candidates may therefore lose or gain a score point, depending on which approach is taken [8] (neither of which is particularly satisfactory).…”

Section: Discussion (mentioning)

confidence: 99%

Quantifying the Borderline Candidate in Standard Setting

McLachlan, Robertson, Weller et al. 2021

EDULEARN Proceedings


Background: Conceptualising the Borderline candidate is one of the most difficult tasks in standard setting. However, it is also central to the process. Here we describe a methodology by which the score of Borderline candidates can be retrospectively calculated from the Facility (the percentage of items answered correctly) of assessment items for the cohort as a whole.

Methods: We previously explored performance of candidates within an academic year in one UK medical school, covering 26 separate assessments. Each assessment had previously been standard set by either Angoff or Borderline Regression methods. In this study, we identified Borderline candidates by reviewing their performance within a particular test, not part of the previously published material. A student was classed as 'Borderline' if they were within 1 Standard Error of Measurement above or below the pass cut score. We plotted the item scores of the Borderline candidates as calculated by this method against Facility for the whole cohort and fitted a curve to the resulting distribution. In this paper, a simple method of repeating this process is described for any cohort of students.

Results: For an ideal cohort of candidates, Borderline candidate scores should intercept the self-plot of all candidate scores at two places: at a Facility of 100% and at a Facility of 20%. These correspond to all candidates getting the item correct and all candidates guessing the outcome. We observed a strong curvilinear distribution shown by Borderline candidates compared to the whole cohort. This relationship was well described by an exponential of the form y ≈ C·exp(F·x), where y is the Facility of Borderline candidates on that item, x is the observed item Facility of the whole cohort, and C and F are constants. In our previous study we had found that C and F had similar values under different conditions. Ideal values for C and F of 12.3 and 0.021 intercept the self-plot of item Facilities very close to 100% and 20%. In this study, we again observed values for C and F close to these ideal values: C = 10.06 and F = 0.0231. Differentiating the ideal curve of the difference between all candidates and Borderline candidates gives the item Facility at which the sensitivity of discrimination between the cohort and the Borderline candidates is at a maximum, which indicates where the assessment ought to be most sensitive. This value is approximately 64.5%.

Conclusions: This approach can be used to standard-set assessments in their entirety when they are low stakes or norm referenced, in preference to Cohen methods. While Cohen methods are based on the performance of one candidate (or a very small number of candidates), this exponential method is based on all candidates and all items and is therefore more robust. In high-stakes assessments, it can be used to correct values where the Facility is very different from the standard-set value, and its use is planned in this context for the UK General Medical Council's proposed national exam. It could also be used to stand...
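The arithmetic in the abstract above can be checked with a short sketch. It assumes the reported ideal constants C = 12.3 and F = 0.021, and it takes the gap between cohort and Borderline Facility to be x − C·exp(F·x). Setting the derivative of that gap to zero gives x = −ln(C·F)/F. The function names are hypothetical and are not taken from the cited paper.

```python
import math

# "Ideal" constants as reported in the abstract (percent scale for x and y).
C_IDEAL = 12.3
F_IDEAL = 0.021


def borderline_facility(x, c=C_IDEAL, f=F_IDEAL):
    """Predicted Borderline Facility y = C * exp(F * x) for cohort Facility x."""
    return c * math.exp(f * x)


def max_gap_facility(c=C_IDEAL, f=F_IDEAL):
    """Cohort Facility at which the gap x - C*exp(F*x) is largest.

    d/dx [x - C*exp(F*x)] = 1 - C*F*exp(F*x) = 0  =>  x = -ln(C*F) / F
    """
    return -math.log(c * f) / f


if __name__ == "__main__":
    for x in (20, 100):
        print(f"cohort Facility {x}% -> Borderline {borderline_facility(x):.1f}%")
    print(f"gap is largest near {max_gap_facility():.1f}% cohort Facility")
```

Under these assumptions the maximum falls close to the 64.5% quoted in the abstract, and the curve meets the line y = x near both ends of the range. Passing the observed constants (C = 10.06, F = 0.0231) to the same functions shows how far a particular cohort departs from the ideal curve.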

“…The assurance of sufficient quality and robust standard setting is central to the delivery of any successful competency-based assessment [26]. One of the most challenging aspects of clinical assessment is making pass/fail decisions for borderline grades allocated by examiners without adequate information to make these decisions [27].…”

Section: Discussion (mentioning)

confidence: 99%

A competency-based approach to pass/fail decisions in an objective structured clinical examination: An observational study

Alkhateeb, Al-Dabbagh, Mohammed et al. 2020

Preprint


Background: Any high-stakes assessment that leads to an important decision requires careful consideration in determining whether a student passes or fails. This observational study, conducted in Erbil, Iraq, in June 2018, proposes a defensible pass/fail decision based on the number of failed competencies. Methods: Results were obtained for 150 medical students on their final objective structured clinical examination. Cutoff scores and pass/fail decisions were calculated using the modified Angoff, borderline, borderline-regression and holistic methods. The results were compared with each other and with a new competency method using Cohen's kappa. Rasch analysis was used to compare the consistency of competency data with Rasch model estimates. Results: The competency method resulted in 40 (26.7%) students failing, compared with 76 (50.6%), 37 (24.6%), 35 (23.3%) and 13 (8%) for the modified Angoff, borderline, borderline-regression and holistic methods, respectively. The competency method demonstrated a sufficient degree of fit to the Rasch model (mean outfit and infit statistics of 0.961 and 0.960, respectively). Conclusions: The competency method was more stringent in determining pass/fail than the other standard-setting methods, except for the modified Angoff method. The fit of competency data to the Rasch model provides evidence for the validity and reliability of pass/fail decisions.


“…One well-established perspective is, undeniably, how the practice of assessment is typically one of the teachers' least favorite topics and tasks when it comes to their professions. Conversely, students also do not typically fully appreciate assessment and often feel uneasy about elements of the assessment process (Segers & Tillema, 2011; Tavakol & Doody, 2015). While some teachers embrace a more traditional form of classroom writing assessment (e.g., the task of essay writing), Kubiszyn and Borich (2013) proposed a more contemporary approach to classroom writing assessment practice (e.g., group writing assessment activities, research activities, and portfolio-based exercises).…”

Section: Literature Review (mentioning)

confidence: 99%

The Practice of Cross-Grading in Assessing Writing: The Case of EFL Teachers and Students in a Saudi Arabian Context

Alshakhi 2021

IJEL


This qualitative study combined multiple methods to investigate the efficacy and reliability of cross-grading when assessing the writing of tertiary-level English as a Foreign Language (EFL) learners. It further explored the perceptions of EFL teachers and learners regarding cross-grading practices, to provide a clearer understanding of this relatively unexplored line of research enquiry. It set out to answer the following research question: In what ways does cross-grading practice contribute to assessing EFL writing? The participants were selected by convenience sampling; the sample included four language instructors from different ethnic and cultural backgrounds, as well as four Saudi EFL learners. Semi-structured interviews were conducted individually with all eight participants. In addition, four one-on-one feedback sessions between language instructors and learners were observed to assess feedback effectiveness after the cross-grading sessions. The data analysis revealed that instructors had difficulty explaining the feedback on their learners' papers since they had not graded the papers themselves. Furthermore, students felt they did not benefit from the feedback sessions because they could not fully understand the external grader's markings, which inhibited their ability to improve and develop their writing. The study concluded with some pedagogical implications for the EFL writing assessment context.
