Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™) examination. This study examines the impact of various rating designs, as well as the number of tasks and raters, on the reliability of writing scores based on integrated and independent tasks, from the perspective of generalizability theory (G‐theory). Both univariate and multivariate G‐theory analyses were conducted. It was found that (a) to maximize score reliability, it is more efficient to increase the number of tasks than the number of ratings per essay; (b) two single‐rating designs in which different tasks for the same examinee are rated by different raters [p × (R:T), R:(p × T)] achieved higher score reliabilities than other single‐rating designs; and (c) a somewhat larger gain in composite score reliability was achieved when the number of listening‐writing tasks exceeded the number of reading‐writing tasks.
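Finding (a) follows from how the generalizability coefficient weights variance components: adding tasks shrinks both the person-by-task and residual error terms, while adding ratings shrinks only the rater-related terms. The sketch below illustrates this for a fully crossed p × T × R design. The variance components are hypothetical illustrative values, not estimates from the TOEFL pilot data, and the function name is ours.

```python
# Minimal sketch of a univariate G-theory decision (D) study for a
# crossed p x T x R design (persons x tasks x raters).
# All variance components below are hypothetical illustrative values.

def g_coefficient(var_p, var_pt, var_pr, var_ptr, n_tasks, n_raters):
    """Generalizability coefficient (Ep^2) for relative decisions."""
    rel_error = (var_pt / n_tasks            # person-by-task interaction
                 + var_pr / n_raters         # person-by-rater interaction
                 + var_ptr / (n_tasks * n_raters))  # residual
    return var_p / (var_p + rel_error)

# Person-by-task interaction dominates the error, as is common in
# writing assessment (hypothetical values).
vp, vpt, vpr, vptr = 0.40, 0.30, 0.02, 0.20

base        = g_coefficient(vp, vpt, vpr, vptr, n_tasks=2, n_raters=1)
more_tasks  = g_coefficient(vp, vpt, vpr, vptr, n_tasks=4, n_raters=1)
more_raters = g_coefficient(vp, vpt, vpr, vptr, n_tasks=2, n_raters=2)

# Doubling tasks raises Ep^2 more than doubling ratings per essay.
print(round(base, 3), round(more_tasks, 3), round(more_raters, 3))
# -> 0.597 0.734 0.656
```

Under these assumed components, doubling the number of tasks yields a larger reliability gain than doubling the number of ratings, mirroring the paper's conclusion.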
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and e‐rater® essay feature variables in the context of the TOEFL® computer‐based test (CBT) writing assessment. The data analyzed were analytic and holistic essay scores assigned by human raters and essay feature variable scores computed by e‐rater (version 2.0) for two TOEFL CBT writing prompts. It was found that (a) all six analytic scores correlated not only among themselves but also with the holistic scores, (b) the high correlations among holistic and analytic scores were largely attributable to the impact of essay length on both types of scoring, (c) there may be some potential for profile scoring based on analytic scores, and (d) strong associations were confirmed between several e‐rater variables and analytic ratings. Implications are discussed for improving the analytic scoring of essays, validating automated scores, and refining e‐rater essay feature variables.
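Finding (b) can be made concrete with a partial correlation: once essay length is partialled out of both scores, the holistic–analytic correlation drops. The sketch below uses the standard first-order partial-correlation formula; the three input correlations are hypothetical illustrative values, not figures reported in the study.

```python
# Minimal sketch: partial correlation removing the shared influence of
# essay length (z) from a holistic-analytic score correlation (x, y).
# Input correlations are hypothetical, not values from the study.
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(
        (1 - r_xz ** 2) * (1 - r_yz ** 2))

r_holistic_analytic = 0.85  # zero-order correlation (hypothetical)
r_holistic_length   = 0.75  # holistic score vs. essay length
r_analytic_length   = 0.70  # analytic score vs. essay length

adjusted = partial_corr(r_holistic_analytic,
                        r_holistic_length,
                        r_analytic_length)
print(round(adjusted, 3))  # noticeably lower than 0.85
```

With these assumed inputs, the correlation falls from 0.85 to roughly 0.69, illustrating how a shared length effect can inflate zero-order correlations between the two scoring approaches.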