Using Raters From India to Score a Large‐Scale Speaking Test

Xi, Xiaoming; Mollaun, Pam

doi:10.1111/j.1467-9922.2011.00667.x

Cited by 41 publications (55 citation statements)

References 27 publications

“…The substantial rater severity/leniency differences among raters, as was also found in some previous research (e.g., Attali, 2016; Bijani & Fahim, 2011; Xi & Mollaun, 2011), have important consequences for decision makers: in rater training, more attention and importance should be dedicated to within-rater consistency (intra-rater agreement) than to between-rater consistency (inter-rater agreement).…”

Section: Discussion (supporting)

confidence: 57%

Investigating the Effect of Training on Raters’ Bias toward Test Takers in Oral Proficiency Assessment : A FACETS Analysis

Bijani; Khabiri

2017

The Journal of AsiaTEFL

Typically, variability and bias among raters in scoring are mitigated through rater training. However, questions remain about whether training can affect raters' severity or leniency. Furthermore, few studies have examined the differences between trained and untrained raters in oral assessment. Oral test scores of 200 test takers, rated by 20 raters, were analyzed before and after a training program using multifaceted Rasch measurement (MFRM). The results demonstrated the positive effect of training in reducing raters' biases and increasing their consistency measures. Inexperienced raters benefited more from the training program than experienced raters and thus achieved higher consistency measures afterward. The study also found a higher degree of biased interaction for test takers at the extreme ends of the oral ability continuum. The findings indicate that it is almost impossible to eradicate rater variability completely, even through rater training. Therefore, rater training should be viewed as a procedure for establishing within-rater consistency rather than between-rater consistency. Since inexperienced raters rated even more reliably than experienced ones after training, there is no evidence to justify decision makers excluding inexperienced raters solely because of their lack of experience. Instead, decision makers should devote their budgets to establishing rater training programs for inexperienced raters.

“…Zhang and Elder (2011) compared Chinese native speakers' and English native speakers' ratings of Chinese students' oral proficiency in the national College English Test-Spoken English Test (CET-SET) in China. More recently, however, researchers have become more interested in speakers of English varieties from Outer Circle countries such as India (Carey et al., 2011; Hsu, 2012; Xi & Mollaun, 2011). Xi and Mollaun (2011) investigated to what extent certified and trained TOEFL iBT Speaking Test raters from India could rate as consistently and reliably as operational raters from the United States, and what effects a special training package, in which raters were exposed only to Indian test takers' responses, had on raters' scores and rater confidence.…”

Section: Introduction (mentioning)

confidence: 99%

Investigating Differences Between American and Indian Raters in Assessing TOEFL iBT Speaking Tasks

Wei

Llosa

2015

Language Assessment Quarterly

This article reports on an investigation of the role raters' language background plays in their assessment of test takers' speaking ability. Specifically, it examines differences between American and Indian raters in their scores and scoring processes when rating Indian test takers' responses to the Test of English as a Foreign Language™ Internet-Based Test (TOEFL iBT®) Speaking tasks. Three American and three Indian raters were asked to score 60 speech samples from 10 Indian test takers' responses to TOEFL iBT Speaking tasks and to perform think-aloud protocols while scoring. The data were analyzed with multifaceted Rasch measurement and verbal protocol analysis. Findings indicate that Indian raters were better than American raters at identifying and understanding features of Indian English in the test takers' responses. However, Indian and American raters did not differ in their use of scoring criteria, their attitudes toward Indian English, or the internal consistency and severity of their scores.

“…Groups of phonetically trained judges and untrained raters show high agreement in their overall ratings (e.g., Bongaerts et al., 1997; Hopp & Schmid, 2013), even though interrater reliability was found to be higher among trained raters in some studies (Thompson, 1991; though see Xi & Mollaun, 2011).…”

Section: Raters (mentioning)

confidence: 99%

“…Other studies on holistic oral production assessment find that rater familiarity with the particular language combinations of the speakers affects ratings among raters who are native (e.g., Carey, Mannell & Dunn, 2011; Winke, Gass & Myford, 2012) as well as non-native (Xi & Mollaun, 2011; Zhang & Elder, 2011) speakers of the language being rated. In addition, several studies report that familiarity with regional accents and dialects that may occur in the speech samples affects ratings of foreign accent (e.g.…”

Section: Raters (mentioning)

confidence: 99%

Comparing foreign accent in L1 attrition and L2 acquisition: Range and rater effects

Schmid

Hopp

2014

Language Testing

This study examines the methodology of global foreign accent ratings in studies of L2 speech production. In three experiments, we test how variation in raters, in the range of speech samples, and in instructions and procedure affects ratings of native and foreign accents in predominantly monolingual speakers of German, non-native speakers of German, and long-term emigrants from Germany (i.e., L1 attriters). The findings show that rater differences do not result in systematic changes in rating patterns. In contrast, range effects and effects of familiarity with accented speech lead to shifts in absolute and relative ratings. Including more strongly foreign-accented samples leads to better ratings for the entire group of bilinguals relative to natives. Similarly, lower familiarity with foreign accents results in more variable and more strongly foreign-accented judgments. We discuss the implications for research on L2 pronunciation, as well as for the interpretation of nativeness in L2 studies and language testing more generally.
