2012
DOI: 10.1111/j.1745-3992.2012.00241.x

Application of Latent Trait Models to Identifying Substantively Interesting Raters

Abstract: Historically, research focusing on rater characteristics and rating contexts that enable the assignment of accurate ratings and research focusing on statistical indicators of accurate ratings have been conducted by separate communities of researchers. This study demonstrates how existing latent trait modeling procedures can identify groups of raters who may be of substantive interest to those studying the experiential, cognitive, and contextual aspects of ratings. We employ two data sources in our demonstration…
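The abstract omits the modeling details, but the sketch below illustrates the general approach it describes: estimating a latent-trait-style severity parameter and a fit statistic for each rater from a ratings matrix, then flagging raters whose values are unusual. It is a minimal sketch in Python with NumPy; the simulated data, the estimation shortcut (mean residuals rather than a full many-facet Rasch fit), and the flagging thresholds are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate an examinee-by-rater ratings matrix (hypothetical data) ---
n_examinees, n_raters = 200, 12
theta = rng.normal(0, 1, n_examinees)           # examinee proficiency
severity = rng.normal(0, 0.4, n_raters)         # rater severity (higher = harsher)
severity[0] += 1.5                              # plant one unusually harsh rater
noise = rng.normal(0, 0.7, (n_examinees, n_raters))
ratings = theta[:, None] - severity[None, :] + noise
ratings = np.clip(np.round(ratings + 3), 1, 6)  # map onto a 1-6 rating scale

# --- Estimate severity: mean residual after removing examinee effects ---
examinee_mean = ratings.mean(axis=1, keepdims=True)
residuals = ratings - examinee_mean             # what each rater adds beyond the examinee
est_severity = -residuals.mean(axis=0)          # harsh raters leave negative residuals

# --- Infit-style mean-square: observed vs. pooled residual variance ---
fit_ms = residuals.var(axis=0) / residuals.var()

# --- Flag "substantively interesting" raters with conventional heuristics ---
z = (est_severity - est_severity.mean()) / est_severity.std()
for r in range(n_raters):
    flag = abs(z[r]) > 2 or not (0.5 <= fit_ms[r] <= 1.5)
    print(f"rater {r:2d}: severity={est_severity[r]:+.2f} "
          f"(z={z[r]:+.2f}), infit-MS={fit_ms[r]:.2f}"
          + ("  <-- flagged" if flag else ""))
```

Raters flagged this way (unusually harsh, lenient, or misfitting) are the kind of cases the study proposes handing to researchers who examine the experiential, cognitive, and contextual sides of rating.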



Cited by 67 publications (88 citation statements). References 20 publications (25 reference statements).
“…Previously, Wolfe and McVay (2012) criticized much of the extant research regarding rater effects because it failed to jointly consider quantitatively-precise definitions and measures of rating quality with substantive variables that have been thoughtfully selected based on detailed models of the content (in our case, writing). That is, our study demonstrates one way that traditional psychometric methods can be coupled with automated scoring technologies and the associated deep substantive knowledge base upon which those technologies are based to produce results that are important from both quantitative and substantive perspectives.…”
Section: Discussion (mentioning)
confidence: 98%
“…In short, prior to the act of scoring itself, much preparation takes place, including the training of the raters that ideally leads each rater to a common mental rubric. In practice, a variety of factors can affect the mental rubric, such that the scoring behavior of raters is far from identical and can lead to a number of rater effects. Such rater effects are the overt manifestation of a lack of exchangeability among raters and can be detected with the appropriate psychometric tools (see Wolfe, 2012).…”
Section: Preamble to Scoring: Assessment Design and Scorer Training (mentioning)
confidence: 99%
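The excerpt above describes rater effects as the overt manifestation of non-exchangeability, detectable with psychometric tools. As a rough, hypothetical illustration (simple descriptive screens, not the model-based indices the cited work discusses), the sketch below computes three common indicators on an examinee-by-rater matrix: severity/leniency, centrality (restricted use of the scale), and agreement with the consensus score.

```python
import numpy as np

def rater_effect_indicators(ratings):
    """Descriptive screens for classic rater effects on an examinee-by-rater
    ratings matrix (rows = examinees, columns = raters). Heuristics only."""
    consensus = ratings.mean(axis=1)  # per-examinee consensus score
    return {
        # severity/leniency: how far a rater's mean sits from the overall mean
        "severity": ratings.mean(axis=0) - ratings.mean(),
        # centrality: a rater's spread relative to the typical spread
        # (values well below 1 suggest overuse of middle categories)
        "centrality": ratings.std(axis=0) / ratings.std(axis=0).mean(),
        # accuracy proxy: correlation of each rater with the consensus
        "consensus_r": np.array([
            np.corrcoef(ratings[:, r], consensus)[0, 1]
            for r in range(ratings.shape[1])
        ]),
    }

# Demo on simulated 1-6 ratings with one deliberately "central" rater.
rng = np.random.default_rng(1)
demo = np.clip(np.round(rng.normal(3.5, 1.0, (100, 8))), 1, 6)
demo[:, 3] = np.clip(np.round(demo[:, 3] * 0.3 + 2.4), 1, 6)
for name, vals in rater_effect_indicators(demo).items():
    print(name, np.round(vals, 2))
```

Note that the consensus here includes each rater's own scores; a model-based approach of the kind the paper demonstrates would separate rater parameters from examinee parameters rather than rely on raw means.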
“…For example, assessment instruments that are better aligned with content as well as the target context have been shown to increase reliability; however, these do not appear to adequately address examiner variation (Crossley et al. 2011). Examiner training has also been explored as a means of supporting examiners and increasing reliability (e.g., Green and Holmboe 2010; Wolfe and McVay 2012), but the effectiveness of such training has been inconsistent. Finally, frame of reference training has also been suggested as a means of increasing the reliability of scores and the validity of inferences with performance assessments (Kogan et al. 2011), though there remain debates as to what constitutes an appropriate frame of reference.…”
(mentioning)
confidence: 96%