2008
DOI: 10.1002/j.2333-8504.2008.tb02087.x
ANALYTIC SCORING OF TOEFL® CBT ESSAYS: SCORES FROM HUMANS AND E‐RATER®

Abstract: The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and e‐rater® essay feature variables in the context of the TOEFL® computer‐based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic essay scores provided by human raters and essay feature variable scores computed by e‐rater (version 2.0) for two TOEFL CBT writing prompts. It was found that (a) all of the six …


Cited by 28 publications (33 citation statements); references 35 publications.
“…This study adds to the growing literature related to the validation and use of e-rater for TOEFL essays (e.g., Attali, 2007, 2008; Attali & Burstein, 2006; Chodorow & Burstein, 2004; Enright & Quinlan, 2008; Lee et al., 2008). In terms of the validity argument for the TOEFL outlined by Chapelle et al. (2008), the study provides evidence that supports the inferences of generalization (across tasks and raters) and extrapolation to other criteria of writing ability in academic contexts.…”
Section: Implications and Future Directions (supporting)
confidence: 58%
“…Another approach is to examine relationships between automated scores and external measures of the same ability (i.e., criterion-related validity evidence). The third approach is to investigate the scoring process and mental models represented by automated scoring systems (see, e.g., Attali & Burstein, 2006; Ben-Simon & Bennett, 2007; Lee, Gentile, & Kantor, 2008, for this line of research). The current study focuses on the first two of these approaches.…”
Section: Validity of Automated Scoring Systems (mentioning)
confidence: 99%
“…However, the results for both conditions show that the correlations within a specific task and across aspects are typically high, while correlations within aspects and across tasks are relatively low. These results are in line with previous research reporting covariance of text quality traits (De Glopper, 1985; Van den Bergh, 1988; Godshalk et al., 1966; Lee, Gentile, & Kantor, 2008; McNamara, 1990) and indicate that, based on these scores, both differentiation between different traits within writing ability and generalisation across different tasks are problematic. However, as Deane and Quinlan (2010) point out, the separate traits of writing do reflect the specific targets of writing instruction, justifying a differentiated report on writing ability.…”
Section: Construct Validity (supporting)
confidence: 92%
“…Holistic scoring methods assess the overall quality of an essay by considering multiple criteria simultaneously in order to assign a single score. In contrast, trait-based scoring methods [10, 8] can provide multiple scores, as they separately consider component parts or writing purposes when scoring an essay. While holistic methods are typically more efficient and provide more reliable scores, trait-based methods are better at providing diagnostic insight into student performance [16, 2].…”
Section: Related Work (mentioning)
confidence: 99%
“…In terms of writing tasks, most systems (whether holistic or trait-based) focus on assessing writing in response to open-ended prompts [1, 13, 7, 10, 5] rather than in response to text. In contrast to the RTA, available assessments tend not to directly measure complex writing skills in which critical thinking and reading are deeply embedded [6, 5].…”
Section: Related Work (mentioning)
confidence: 99%