The integrated assessment of language skills, particularly reading-into-writing, is experiencing a renaissance. The use of rating rubrics with verbal descriptors of L2 writing performance quality is well established in large-scale assessment. However, less attention has been directed towards the development of reading-into-writing rubrics. Identifying and evaluating the contribution of reading ability to the writing process and product, so that it can be reflected in a set of rating criteria, is not straightforward. This paper reports on a recent project to define the construct of reading-into-writing ability for designing a suite of integrated tasks at four proficiency levels, ranging from CEFR A2 to C1. The authors discuss how theoretical construct definition, together with empirical analyses of test taker performance, was used to underpin the development of rating rubrics for the reading-into-writing tests. Methodologies utilised in the project included questionnaires, expert panel judgement, group interviews, automated textual analysis and analysis of rater reliability. Based on the results of three pilot studies, the effectiveness of the rating scales is discussed. The findings can inform decisions about how best to account for both the reading and writing dimensions of test taker performance in rubric descriptors.
The constructs of complexity, accuracy and fluency (CAF) have been used extensively to investigate learner performance on second language tasks. However, a serious concern is that the variables used to measure these constructs are sometimes chosen by convention, without empirical justification. It is crucial for researchers to understand how results might differ depending on which measures are used and, accordingly, to choose the most appropriate variables for their research aims. The first strand of this article examines the variables conventionally used to measure syntactic complexity in order to identify which may be the best indicators of different proficiency levels, following suggestions by Norris and Ortega. The second strand compares three variables used to measure accuracy in order to identify which is most valid. The data analysed were spoken performances by 64 Japanese EFL students on two picture-based narrative tasks, rated at Common European Framework of Reference for Languages (CEFR) levels A2 to B2 according to Rasch-adjusted ratings by seven human judges. The tasks performed were very similar, but differed in the degree of what Loschky and Bley-Vroman term 'task-essentialness' for subordinate clauses. The variables used to measure syntactic complexity yielded results that were not consistent with the suggestions of Norris and Ortega. The variable found to be most valid for measuring accuracy was errors per 100 words. Analysis of transcripts revealed that the results were strongly influenced by the differing degrees of task-essentialness for subordination between the two tasks, as well as by the spread of errors across different units of analysis. This implies that the characteristics of test tasks need to be carefully scrutinised, followed by careful piloting, in order to ensure greater validity and reliability in task-based research.
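To make the accuracy comparison concrete, the minimal Python sketch below (not the study's code; the abstract names only errors per 100 words, so the error-free clause ratio used as the contrasting measure and the annotated-count data format are assumptions for illustration) shows how a global, word-normalised measure and a unit-based measure can disagree when the same number of errors is spread differently across clauses.

```python
# Minimal sketch (hypothetical, not from the study): two common accuracy
# measures computed from transcripts assumed to be pre-coded with word,
# clause, and error counts.

def errors_per_100_words(num_errors: int, num_words: int) -> float:
    """Global accuracy measure: errors normalised per 100 words."""
    return 100.0 * num_errors / num_words if num_words else 0.0

def error_free_clause_ratio(error_free_clauses: int, total_clauses: int) -> float:
    """Unit-based accuracy measure: proportion of clauses containing no errors."""
    return error_free_clauses / total_clauses if total_clauses else 0.0

# Two hypothetical performances with identical word and error totals but a
# different spread of errors across clauses: the global measure agrees,
# while the clause-based ratio diverges.
performances = [
    {"words": 120, "errors": 6, "clauses": 15, "error_free_clauses": 12},  # errors clustered
    {"words": 120, "errors": 6, "clauses": 15, "error_free_clauses": 9},   # errors scattered
]

for i, p in enumerate(performances, 1):
    print(f"Performance {i}: "
          f"{errors_per_100_words(p['errors'], p['words']):.1f} errors/100 words, "
          f"{error_free_clause_ratio(p['error_free_clauses'], p['clauses']):.0%} error-free clauses")
```

Both hypothetical performances score 5.0 errors per 100 words, yet their error-free clause ratios differ (80% vs. 60%), which illustrates why the spread of errors across different units of analysis can drive apart the results of accuracy measures, as the abstract reports.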
This research explores how internet-based video-conferencing technology can be used to deliver and conduct a speaking test, and what similarities and differences can be discerned between the standard and computer-mediated face-to-face modes. The context of the study is a high-stakes speaking test, and the motivation for the research is the need for test providers to keep under constant review the extent to which their tests are accessible and fair to a wide constituency of test takers. The study examines test-takers' scores and linguistic output, and examiners' test administration and rating behaviors across the two modes. A convergent parallel mixed-methods research design was used, analyzing test-takers' scores and language functions elicited, examiners' written comments, feedback questionnaires and verbal reports, as well as observation notes taken by researchers. While the two delivery modes generated similar test score outcomes, some differences were observed in test-takers' functional output and the behavior of examiners who served as both raters and interlocutors.
This study investigated the effects of two different planning time conditions (i.e., operational [20 s] and extended [90 s]) for the lecture listening-into-speaking tasks of the TOEFL iBT® test for candidates at different proficiency levels. Seventy international students based in universities and language schools in the United Kingdom (35 at a lower level; 35 at a higher level) participated in the study. The effects of different lengths of planning time were examined in terms of (a) the scores given by ETS-certified raters; (b) the quality of the speaking performances, characterized by accurately reproduced idea units and measures of complexity, accuracy, and fluency; and (c) self-reported use of cognitive and metacognitive processes and strategies during listening, planning, and speaking. The analyses revealed neither a statistically significant main effect of the length of planning time nor an interaction between planning time and proficiency on the scores or on the quality of the speaking performance. Significantly more engagement in several cognitive and metacognitive processes and strategies was reported under the extended planning time, which suggests enhanced cognitive validity of the task. However, the increased engagement in planning did not lead to any measurable improvement in scores. Therefore, in the interest of practicality, the results of this study provide justification for the operational length of planning time for the lecture listening-into-speaking tasks in the speaking section of the TOEFL iBT test.