Based on evidence that listeners may favor certain foreign accents over others (Gass & Varonis, 1984; Major, Fitzmaurice, Bunta, & Balasubramanian, 2002; Tauroza & Luk, 1997) and that language-test raters may better comprehend and/or rate the speech of test takers whose native languages (L1s) are more familiar on some level (Carey, Mannell, & Dunn, 2011; Fayer & Krasinski, 1987; Scales, Wennerstrom, Richard, & Wu, 2006), we investigated whether accent familiarity (defined as having learned the test takers’ L1) leads to rater bias. We examined 107 raters’ ratings of 432 TOEFL iBT™ speech samples from 72 test takers. The raters of interest were L2 speakers of Spanish, Chinese, or Korean, while the test takers comprised three L1 groups (24 each): Spanish, Chinese, and Korean. We analyzed the ratings using a multifaceted Rasch measurement approach. Results indicated that L2 Spanish raters were significantly more lenient with L1 Spanish test takers, as were L2 Chinese raters with L1 Chinese test takers. We conclude by concurring with Xi and Mollaun (2009, 2011) and Carey et al. (2011) that rater training should address raters’ linguistic background as a potential rater effect. Furthermore, we discuss the importance of recognizing rater L2 as a possible source of bias.
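For readers unfamiliar with the technique, a minimal sketch of the many-facet Rasch model extended with a bias (interaction) term is given below; the notation is illustrative, not the original study's.

    \log \frac{P_{njik}}{P_{nji(k-1)}} = \theta_n - \alpha_j - \delta_i - \tau_k + \phi_{g(n)j}

Here \theta_n is test taker n's proficiency, \alpha_j is rater j's severity, \delta_i is task difficulty, \tau_k is the threshold for scale category k, and \phi_{g(n)j} is the bias term for rater j interacting with test takers from L1 group g(n). A significantly positive \phi indicates leniency toward that group, the pattern reported above for the matched L2 Spanish and L2 Chinese raters.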
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition examination, employing a multifaceted Rasch approach to determine whether raters exhibited evidence of two types of differential rater functioning over time (i.e., changes in levels of accuracy or scale category use). Some raters showed statistically significant changes in their levels of accuracy as the scoring progressed, while other raters displayed evidence of differential scale category use over time.
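Drift of this kind is commonly operationalized, sketched here in our own notation rather than the authors', by adding a rating-period facet and a rater-by-period interaction to the model:

    \log \frac{P_{njtk}}{P_{njt(k-1)}} = \theta_n - \alpha_j - \gamma_t - \tau_k + \phi_{jt}

where \gamma_t is the overall severity of rating period t and \phi_{jt} is rater j's departure from his or her average severity during that period; statistically significant \phi_{jt} estimates flag differential rater functioning over time.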
The purpose of this study was to examine, describe, evaluate, and compare the rating behavior of faculty consultants who scored essays written for the Advanced Placement English Literature and Composition (AP® ELC) Exam. Data from the 1999 AP ELC Exam were analyzed using FACETS (Linacre, 1998) and SAS. The faculty consultants were not all interchangeable in terms of the level of severity they exercised. If students' ratings had been adjusted for severity differences, the AP grades of about 30 percent of the students would have been different from the grades they received, although almost all the differences were one grade or less. Adjusting ratings for faculty consultant severity differences would not have affected some student subgroups more than others.
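As a rough illustration of why such adjustment can move grades, the toy Python sketch below adds each rater's estimated severity back onto the scores that rater gave; the linear correction, the data, and every name in it are invented stand-ins for the model-based adjustment a FACETS analysis would actually produce.

    # Toy severity adjustment (invented data; not the study's procedure).
    # Essays are scored on the AP 1-9 scale by a single rater each.
    ratings = {                 # essay_id -> (rater_id, raw score)
        "e1": ("r1", 6),
        "e2": ("r1", 4),
        "e3": ("r2", 7),
        "e4": ("r2", 5),
    }

    # Severity: points harsher (+) or more lenient (-) than the average rater.
    # A real analysis would estimate these with a many-facet Rasch model.
    severity = {"r1": +0.8, "r2": -0.5}

    def adjusted(essay_id: str) -> float:
        """Severity-corrected score: give back what a harsh rater withheld."""
        rater, raw = ratings[essay_id]
        return raw + severity[rater]

    for essay in ratings:
        rater, raw = ratings[essay]
        print(f"{essay}: rater={rater} raw={raw} adjusted={adjusted(essay):.1f}")

Even a correction smaller than one point can change a final AP grade when a student sits near a grade boundary, which is consistent with the finding that almost all grade changes were one grade or less.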
An Objective Structured Clinical Examination (OSCE) is an effective method for evaluating competencies. However, scores obtained from an OSCE are vulnerable to many potential measurement errors that cases, items, or standardized patients (SPs) can introduce. Monitoring these sources of error is an important quality control mechanism for ensuring valid interpretations of the scores. We describe how one can use generalizability theory (GT) and many-faceted Rasch measurement (MFRM) approaches in quality control monitoring of an OSCE. We examined the communication-skills OSCE of 79 residents from one Midwestern university in the United States. Each resident performed six communication tasks with SPs, who rated the performance of each resident using 18 five-category rating-scale items. We analyzed the ratings with generalizability and MFRM studies. The generalizability study revealed that the largest source of error variance, besides the residual error variance, was SPs/cases. The MFRM study identified specific SPs/cases and items that introduced measurement error and suggested the nature of the errors. SPs/cases differed significantly in their levels of severity/difficulty. Two SPs gave inconsistent ratings, suggesting problems related to the ways they portrayed the case, their understanding of the rating scale, and/or the case content. SPs interpreted two of the items inconsistently, and the rating scales for two items did not function as five-category scales. We concluded that generalizability and MFRM analyses provided useful complementary information for monitoring and improving the quality of an OSCE.
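For orientation, the variance decomposition for a fully crossed persons × cases × items (p × c × i) design is sketched below; the study's actual design confounds SPs with cases, so treat this as an approximation in our notation, not the authors' exact model.

    \sigma^2(X_{pci}) = \sigma^2(p) + \sigma^2(c) + \sigma^2(i) + \sigma^2(pc) + \sigma^2(pi) + \sigma^2(ci) + \sigma^2(pci, e)

The dependability coefficient for absolute decisions over n_c cases and n_i items is then

    \Phi = \frac{\sigma^2(p)}{\sigma^2(p) + \frac{\sigma^2(c) + \sigma^2(pc)}{n_c} + \frac{\sigma^2(i) + \sigma^2(pi)}{n_i} + \frac{\sigma^2(ci) + \sigma^2(pci, e)}{n_c n_i}}

A large SP/case component of the kind reported above depresses \Phi unless more cases are sampled.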
The analyses highlighted the fact that quality control monitoring is essential to ensure fairness when ranking candidates according to scores obtained in a multiple mini-interview (MMI). The results can be used to identify examiners who need further training or who should not be invited back, as well as stations needing review. "Fair average" scores should be used for ranking the candidates.
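The "fair average" invoked here is, in many-facet Rasch terms, a candidate's expected raw score when every other facet is held at its mean level; a sketch of the definition, again in notation of our choosing:

    \text{FairAvg}_n = \sum_{k=0}^{K} k \, P_k(\theta_n \mid \bar{\alpha}, \bar{\delta})

where P_k is the model probability of category k given candidate ability \theta_n, with examiner severity and station difficulty fixed at their averages \bar{\alpha} and \bar{\delta}. Ranking on fair averages rather than raw means removes the luck of which examiners and stations a candidate happened to draw.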