2017
DOI: 10.1177/0146621617730390

On the Performance of the Marginal Homogeneity Test to Detect Rater Drift

Abstract: When constructed-response items are administered repeatedly, "trend scoring" can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart's measure of marginal homogeneity for detecting rater drift when monitoring trend scoring. In the first study, data were generated based on trend scoring tables obtained from an operational assessment. The second study tightly controlled table margins to disen…
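Stuart's measure works on the square trend scoring table by comparing its row margins with its column margins. What follows is a minimal sketch of the usual Stuart-Maxwell form. It assumes that rows hold the scores from the previous administration and columns hold the trend rescores. The function names `stuart_maxwell_q` and `_solve` are hypothetical, and this is not the simulation code used in the studies. Let d_i = n_{i+} - n_{+i} for the first K - 1 score categories, and let S be the estimated covariance of d, with S_ii = n_{i+} + n_{+i} - 2 n_ii and S_ij = -(n_ij + n_ji). The statistic is then Q = d'S^{-1}d, referred to a chi-square distribution with K - 1 degrees of freedom.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's matrix is not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def stuart_maxwell_q(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Return (Q, df) for a K x K table of original score (rows) by rescore (columns)."""
    k = len(table)
    rows = [sum(table[i]) for i in range(k)]
    cols = [sum(table[r][i] for r in range(k)) for i in range(k)]
    # The K margin differences sum to zero, so drop the last category.
    d = [float(rows[i] - cols[i]) for i in range(k - 1)]
    s = [[0.0] * (k - 1) for _ in range(k - 1)]
    for i in range(k - 1):
        for j in range(k - 1):
            if i == j:
                s[i][j] = rows[i] + cols[i] - 2 * table[i][i]
            else:
                s[i][j] = -(table[i][j] + table[j][i])
    x = _solve(s, d)  # x = S^{-1} d
    q = sum(d[i] * x[i] for i in range(k - 1))
    return q, k - 1
```

A perfectly symmetric table gives Q = 0, meaning no shift in the score distribution between administrations. For a 2 x 2 table, Q reduces to McNemar's statistic (b - c)^2 / (b + c). If SciPy is available, a p-value can be obtained with `scipy.stats.chi2.sf(q, df)`.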

Cited by 4 publications (13 citation statements). References 22 publications.
“…For example, a rater's performance may deteriorate over time due to fatigue; that is, they become tired over the course of the scoring project. Other researchers refer to rater drift as changes in rater behavior across test administrations (Park, 2011; Sgammato & Donoghue, 2018). For example, raters might be drawn from a different pool of candidates on every test administration, and on each administration due to multiple factors such as different training personnel, it is not likely that the raters go through exactly the same training as the previous administration.…”
Section: Statement of the Problem (mentioning)
“…This study examines four trend-monitoring statistics: paired t-test and Stuart's (1955) Q for marginal homogeneity, and percent of exact agreement and Cohen's (1960) kappa for interrater agreement. The Q statistic is less well-known than the others being used [but has] been found to be more powerful than the t-test (Sgammato & Donoghue, 2018) to detect certain types of changes in rater behavior. The purpose of the present study is to examine the ability of these trend-monitoring statistics to detect rater effects in the context of trend scoring.…”
Section: Purpose of the Study (mentioning)
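The three better-known statistics named in this citation statement can be computed directly from paired scores. The sketch below uses their textbook definitions. It assumes two equal-length lists holding the original scores and the trend rescores of the same responses. The function names are hypothetical, and the citing study's own implementation may differ, for example in how the score difference is signed.

```python
import math
import statistics
from collections import Counter
from typing import Sequence, Tuple


def percent_exact_agreement(first: Sequence[int], second: Sequence[int]) -> float:
    """Percentage of responses that received the same score in both scorings."""
    if len(first) != len(second) or not first:
        raise ValueError("score lists must be non-empty and the same length")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)


def cohens_kappa(first: Sequence[int], second: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(first)
    p_o = percent_exact_agreement(first, second) / 100.0
    counts_1, counts_2 = Counter(first), Counter(second)
    # Chance agreement: sum over categories of the product of the two marginal proportions.
    p_e = sum(counts_1[k] * counts_2[k] for k in counts_1) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


def paired_t(first: Sequence[int], second: Sequence[int]) -> Tuple[float, int]:
    """Paired t statistic on the differences (second - first), with n - 1 df."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    if sd == 0:
        raise ValueError("t is undefined when every difference is identical")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1
```

The paired t-test responds only to a shift in the mean score. A change that pulls scores toward the middle of the scale while leaving the mean unchanged would therefore leave t near zero, yet still alter the table margins. This is one plausible reason a margin-based statistic such as Q can be more powerful for some kinds of drift.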