2008
DOI: 10.1002/j.2333-8504.2008.tb02087.x
ANALYTIC SCORING OF TOEFL® CBT ESSAYS: SCORES FROM HUMANS AND E‐RATER®

Abstract: The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and e‐rater® essay feature variables in the context of the TOEFL® computer‐based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic essay scores provided by human raters and essay feature variable scores computed by e‐rater (version 2.0) for two TOEFL CBT writing prompts. It was found that (a) all of the six …


Cited by 28 publications (33 citation statements); references 35 publications.
“…This study adds to the growing literature related to the validation and use of e-rater for TOEFL essays (e.g., Attali, 2007, 2008; Attali & Burstein, 2006; Chodorow & Burstein, 2004; Enright & Quinlan, 2008; Lee et al., 2008). In terms of the validity argument for the TOEFL outlined by Chapelle et al. (2008), the study provides evidence that supports the inferences of generalization (across tasks and raters) and extrapolation to other criteria of writing ability in academic contexts.…”
Section: Implications and Future Directions (supporting)
confidence: 58%
“…Another approach is to examine relationships between automated scores and external measures of the same ability (i.e., criterion-related validity evidence). The third approach is to investigate the scoring process and mental models represented by automated scoring systems (see, e.g., Attali & Burstein, 2006; Ben-Simon & Bennett, 2007; Lee, Gentile, & Kantor, 2008, for this line of research). The current study focuses on the first two of these approaches.…”
Section: Validity of Automated Scoring Systems (mentioning)
confidence: 99%
“…However, the results for both conditions show that the correlations within a specific task and across aspects are typically high, while correlations within aspects and across tasks are relatively low. These results are in line with previous research reporting covariance of text quality traits (De Glopper, 1985; Van den Bergh, 1988; Godshalk et al., 1966; Lee, Gentile, & Kantor, 2008; McNamara, 1990) and indicate that, based on these scores, both differentiation between different traits within writing ability and generalisation across different tasks are problematic. However, as Deane and Quinlan (2010) point out, the separate traits of writing do reflect the specific targets of writing instruction, justifying a differentiated report on writing ability.…”
Section: Construct Validity (supporting)
confidence: 92%
“…Holistic scoring methods assess the overall quality of an essay by considering multiple criteria simultaneously in order to assign a single score. In contrast, trait-based scoring methods [10, 8] can provide multiple scores, as they separately consider component parts or writing purposes when scoring an essay. While holistic methods are typically more efficient and provide more reliable scores, trait-based methods are better at providing diagnostic insight into student performance [16, 2].…”
Section: Related Work (mentioning)
confidence: 99%
“…In terms of writing tasks, most systems (whether holistic or trait-based) focus on assessing writing in response to open-ended prompts [1, 13, 7, 10, 5] rather than in response to text. In contrast to the RTA, available assessments tend not to directly measure complex writing skills in which critical thinking and reading are deeply embedded [6, 5].…”
Section: Related Work (mentioning)
confidence: 99%