On the Performance of the Marginal Homogeneity Test to Detect Rater Drift

Sgammato, Adrienne; Donoghue, John R.

doi:10.1177/0146621617730390

Cited by 4 publications

(13 citation statements)

References 22 publications

“…For example, a rater's performance may deteriorate over time due to fatigue; that is, they become tired over the course of the scoring project. Other researchers refer to rater drift as changes in rater behavior across test administrations (Park, 2011; Sgammato & Donoghue, 2018). For example, raters might be drawn from a different pool of candidates on every test administration, and on each administration due to multiple factors such as different training personnel, it is not likely that the raters go through exactly the same training as the previous administration.…”

Section: Statement Of the Problem (mentioning)

confidence: 99%

“…This study examines four trend-monitoring statistics: paired t-test and Stuart's (1955) Q for marginal homogeneity, and percent of exact agreement and Cohen's (1960) kappa for interrater agreement. The Q statistic is less well-known than the others being used but has been found to be more powerful than the t-test (Sgammato & Donoghue, 2018) to detect certain types of changes in rater behavior. The purpose of the present study is to examine the ability of these trend-monitoring statistics to detect rater effects in the context of trend scoring.…”

Section: Purpose Of the Study (mentioning)

confidence: 99%

“…Examples include the many-faceted Rasch model (Linacre, 1989); FACETS model (Lunz, Wright, & Linacre, 1990); an IRT model for multiple raters (Verhelst & Verstralen, 2001); the rater bundle model; the hierarchical rater model (Patz, Junker, Johnson, & Mariano, 2002) and its signal detection theory version (DeCarlo, 2010; DeCarlo, Kim, and Johnson, 2011); and Yao's rater model (Wang & Yao, 2013). These models are most useful when all the CR items of an assessment have been scored and merged with the multiple choice items (Sgammato & Donoghue, 2018). However, in some testing programs, a scoring team consists of a group of 10 to 12 raters who are led by a supervisor and a trainer.…”

Section: Subgroup/feature Biases (mentioning)

confidence: 99%

“…Within-assessment reliability could be estimated by having some responses scored a second time by a different rater. The most often used statistics to evaluate agreement among raters at one particular time are percentage of exact agreement, correlations, Cohen's (1960) kappa, and the intraclass correlation (Sgammato & Donoghue, 2018).…”

Section: Trend Scoring (mentioning)

confidence: 99%
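
As a rough illustration of two of the agreement statistics named in the statement above, the following is a minimal sketch in Python of percent exact agreement and Cohen's (1960) kappa for one set of responses scored twice. The function names and the score vectors are hypothetical and are not taken from any of the cited works; this is the textbook definition of each statistic, not the procedure used in the cited studies.

```python
from collections import Counter


def exact_agreement(first, second):
    """Proportion of responses that received the same score both times."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first, second):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(first)
    p_observed = exact_agreement(first, second)
    # Chance agreement is computed from the two sets of marginal counts.
    margin_first = Counter(first)
    margin_second = Counter(second)
    p_chance = sum(margin_first[c] * margin_second[c] for c in margin_first) / (n * n)
    if p_chance == 1.0:
        # Both ratings use a single identical category; kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical scores on a 0-2 scale for ten responses (illustrative only).
first_scores = [0, 1, 2, 2, 1, 0, 2, 1, 1, 0]
second_scores = [0, 1, 2, 1, 1, 0, 2, 2, 1, 1]

print(exact_agreement(first_scores, second_scores))
print(cohens_kappa(first_scores, second_scores))
```

With these made-up scores, 7 of 10 responses agree exactly and the chance-agreement term is 0.35, so kappa is 0.35 / 0.65. Neither statistic tests marginal homogeneity; that is the role of the paired t-test and Stuart's Q discussed in the other statements.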

“…In an article that considers the properties of different interrater agreement statistics, Zwick (1988) recommends that assessment of agreement should consist of inspecting marginal homogeneity, specifically with Stuart's Q, and if that holds, then proceed with a measure of agreement. Sgammato and Donoghue (2018) studied two measures related to marginal homogeneity: the paired t-test and Stuart's (1955) Q statistic. They manipulated the sample size, number of score categories, interrater agreement, and symmetry/asymmetry of the score margins.…”

Section: Trend Scoring (mentioning)

confidence: 99%

Detecting rater effects in trend scoring

Abdalla

Acknowledgements: Of all the pages I had to write for this dissertation, these are the pages that I was most excited to write because I have so much I'm grateful for. First and foremost, to God: thank you for giving me so much ease during such a difficult process and for giving me so many blessings I can't even count. One of those blessings is all the amazing people you've put along my path.

Measuring and Visualizing Coders’ Reliability: New Approaches and Guidelines From Experimental Data

Lamprianou

2020

Sociological Methods & Research

This study investigates inter- and intracoder reliability, proposing a new approach based on social network analysis (SNA) and exponential random graph models (ERGM). During a recent exit poll, the responses of voters to two open-ended questions were recorded. A coding experiment was conducted where a group of coders coded a sample of text segments. Analyzing the data, we show that the proposed SNA/ERGM method extends significantly our analytical leverage, beyond what popular tools such as Krippendorff’s α and Fleiss’s κ have to offer. The reliability of coding for individual coders differed significantly for the two questions although they were very similar and the same codebook was used. We conclude that the main advantages of the proposed SNA/ERGM method are the intuitive visualizations and the nuanced measurements. Detailed guidelines are provided for practitioners who would like to use the proposed method in operational settings.

The Teacher’s Invisible Hand: A Meta-Analysis of the Relevance of Teacher–Student Relationship Quality for Peer Relationships and the Contribution of Student Behavior

Endedijk, Breeman, Lissa, et al.

2021

Review of Educational Research

The relationships that students have with teachers and peers are important for their academic, social, and behavioral development. How teachers relate to students may affect students’ peer relationships and thereby foster or hamper students’ development. To shed more light on the teacher’s role with respect to peer relationships, this meta-analysis assessed the association between the quality of teacher–student and peer relationships (n = 297 studies; n = 1,475 unique effect sizes). We took student behavior into account, as it is known to affect both types of relationship. In addition, design characteristics such as positive versus negative aspects of relationships, type of informants, and educational level were considered. Results showed that negative aspects of the teacher–student relationship in particular were predictive of peer relationships. Moreover, teacher–student relationship quality partially mediated the association between student behavior and peer relationships. For teachers, preventing or reducing negative aspects in their relationships with students who have behavioral problems can positively affect classroom peer relationships.
