Task difficulty is an important but complex phenomenon in Applied Linguistics, for which there is relatively little empirical research. This article discusses approaches to defining task difficulty and focuses on objective task difficulty derived from ratings of performances and on difficulty derived from an error count in the performances, thus taking errors as indicators of writing task difficulty. Errors are described in terms of the Scope-Substance error taxonomy in writing performances from the Slovene General Matura examination in English. The most frequent errors are located at word and phrase level. Generally, error frequency decreases as writing proficiency increases, but some error types do not conform to this trend. This is the case for punctuation errors, which gain prominence at higher levels of mastery. The results of this study are relevant for assessment, particularly for rating scale development or revision, and for rater training. They are equally relevant for teaching, since knowing sources of difficulty in tasks is a prerequisite for effective pedagogical action. More generally, if applied to performances based on a wider range of tasks, viewing errors as indicators of difficulty can lead to a better understanding of difficulty-generating task features.
This study investigates errors in a sample of 50 written performances of Austrian learners of English collected in the 2009 baseline study for the Austrian Educational Standards-Based Writing Test for English at grade 8 (E8 Standards Writing Test). The research aims to contribute to the validation of this large-scale assessment by studying the relationship between errors (described using the Scope-Substance error taxonomy) and human ratings awarded to writing performances. The results add to the validity evidence of the E8 Standards Writing Test. There is a negative relationship between human ratings and the presence of errors: a low error density is associated with higher ratings and a high error density with lower ratings. Substance wo, cls, and x error densities play an important role in the ratings of most dimensions; errors with a larger scope also have a strong effect. By highlighting aspects of errors to which raters seem to be sensitive, these findings constitute evidence of context validity. At the same time, the findings are relevant to theory-based validity by concretising areas of competence that learners need to develop in order to receive higher ratings. While errors are important determinants of the ratings, additional factors, presumably positive features, must be at play, as the accuracy of the regression models is low to moderate. This should in fact be the case, since the E8 rating scale refers to negative as well as positive features.
The present study describes a first step towards validating the rating scale for assessing L1 German writing in the context of the Austrian Matura exam. After describing the process of scale development in the context of the exam reform, it reports on an empirical study into the stability of the scale descriptors. The 70 scale descriptors were assessed in terms of their difficulty by a panel of 100 experienced teachers who had not undergone training in the use of the scale. These data served as the basis for studying overall rater agreement, the correspondence of the sequence of empirically scaled descriptors to the intended sequence, and rater agreement on individual descriptors. It was found that using the scale without prior rater training is not recommendable; rater training is indispensable. The highest level on the scale was found to be the most consensual among the assessors: there is relatively high agreement on what constitutes excellence in L1 German writing. The descriptors at the critical pass level were found to function relatively well, although at least two descriptors turned out to be unstable and should be focused on in rater training. Overall, a high number of stable descriptors was identified.