Providers of tests of languages for academic purposes generally claim to provide evidence on the extent to which students are likely to be able to cope with the future demands of reading in specified real-life contexts. Such claims need to be supported by evidence that the texts employed in the test reflect salient features of the texts the test takers will encounter in the target situation, as well as by evidence demonstrating that the cognitive processing demands of the accompanying test tasks are comparable with those of target reading activities. This paper focuses on the issue of text comparability. For reasons of practicality, evidence relating to text characteristics is generally based on the expert judgement of individual test writers, arrived at through a holistic interpretation of test specifications. However, advances in automated textual analysis and a better understanding of the value of pooled qualitative judgement have now made it feasible to adopt more quantitative approaches that focus analytically on a wide range of individual characteristics. This paper employs these techniques to explore the comparability of texts used in a test of academic reading comprehension with key texts used by first-year undergraduates at a British university. It offers a principled means for test providers and test users to evaluate this aspect of test validity.
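As an illustration of the kind of automated textual analysis referred to above, the sketch below computes three simple complexity indices for two sets of texts and contrasts their means. It is a minimal Python sketch assuming plain-text versions of the test passages and target-domain texts are available; the indices and the comparison routine are illustrative stand-ins, not the feature set used in the study.

```python
import re
import statistics

def text_features(text: str) -> dict:
    """Compute a few simple indices of lexical and syntactic complexity.

    Assumes a non-empty English text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        "mean_word_length": statistics.mean(len(w) for w in words),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

def compare_corpora(test_texts: list[str], target_texts: list[str]) -> None:
    """Contrast mean feature values for test passages and target-domain texts."""
    for feature in ("mean_sentence_length", "mean_word_length", "type_token_ratio"):
        test_vals = [text_features(t)[feature] for t in test_texts]
        target_vals = [text_features(t)[feature] for t in target_texts]
        print(f"{feature}: test = {statistics.mean(test_vals):.2f}, "
              f"target = {statistics.mean(target_vals):.2f}")
```

A published analysis would draw on a far richer feature set (frequency-band coverage, cohesion indices, and so on) alongside pooled expert ratings, but the underlying logic is the same: compute each index for both sets of texts and examine how closely the distributions match.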
The CEPA® Written Communication Assessment™ is a computer-based, criterion-referenced, proficiency-oriented test designed to measure non-native speakers' workplace English writing proficiency. In its creation, developers used the evidence-centered design (ECD) approach (Mislevy & Haertel, 2006) to define and operationalize the construct of writing in the context of modern workplace communication. The first phase of ECD, Domain Analysis, yielded a comprehensive definition of 21st-century workplace written communication, revisions to CEFR language-proficiency descriptors, and an adaptation of the GRASPS framework (Wiggins & McTighe, 2008). These outcomes informed the Domain Modeling phase, in which proficiency-oriented performance task design patterns were defined, after which a Conceptual Assessment Framework could be constructed. Because each phase requires extensive detailing, this paper focuses only on these first three 'assessment conceptualization' phases of ECD; discussion of the last two 'assessment operationalization' phases, Assessment Implementation and Assessment Delivery, is beyond the scope of the current paper. In keeping with the notion that assessment as argument is a cornerstone of test validation (Kane, 2006), this study reports on the use of ECD's first three phases, conceiving of assessment practices as evidentiary arguments, to inform the design and development of the CEPA Written Communication Assessment.
Content and Language Integrated Learning (CLIL) revolves around the dual goal of language acquisition and content knowledge; cross-curricular collaboration between language and content teachers is therefore one of the key factors in the success of CLIL education. This study investigates multiple aspects of cross-curricular collaboration in a Vietnamese CLIL program, including teachers' beliefs about pedagogic roles, the professional support provided, and the cross-curricular collaboration actually implemented. Data collected from eight teachers through semi-structured interviews were coded for emerging themes using thematic analysis, and relevant documents were analysed as complementary data. The findings indicate that the teachers viewed their pedagogical responsibilities and foci rigidly within their own discipline, rather than as a dual-focused role encompassing both language and content teaching. Additionally, a mismatch between the professional support provided by the school and that provided by the program designers was identified, indicating insufficient training and supervision in the implementation of the program. Although there was evidence of teacher collaboration, the practice still lacked consistency and systematicity owing to issues such as workload, scheduling and motivation. The findings from this study have important implications for professional development and curriculum design in CLIL bilingual programs to facilitate successful cross-curricular collaboration.
Comprehending a text involves constructing a coherent mental representation of it, and deep comprehension of a text in its entirety is a critical skill in academic contexts. Interpretations of test takers' ability to comprehend texts are made on the basis of performance on test tasks, but the extent to which test tasks are effective in directing test takers towards reading a text to understand the whole of it is questionable. The current study investigates tests based on multiple-choice items in terms of their potential to facilitate or preclude the higher-level reading processes necessary for text-level macrostructure formation. Participants' performance in macrostructure formation after completing a multiple-choice test and a summarization task was quantitatively and qualitatively analyzed. Task performances were compared, and retrospective verbal protocol data were analyzed to categorize the reading processes the participants went through while dealing with both tasks. The analyses showed that participants' performance in macrostructure formation of the texts they read for the multiple-choice test and the summarization task differed significantly, and that they were less successful in comprehending the text in its entirety when asked to read in order to answer multiple-choice questions that followed the text. The findings provide substantial evidence of the inefficacy of the multiple-choice technique in facilitating test takers' macrostructure formation and thus point to yet another threat to the validity of this test technique.
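The quantitative comparison reported above can be pictured with a short sketch: a paired t statistic computed over the same participants' macrostructure scores under the two task conditions. This is a minimal Python illustration; the scores are hypothetical, and the choice of a paired t-test is an assumption, not necessarily the procedure used in the study.

```python
import math
import statistics

def paired_t(scores_a: list[float], scores_b: list[float]) -> tuple[float, int]:
    """Paired t statistic for the same participants under two conditions."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

# Hypothetical macrostructure scores (0-10) for eight participants.
mc_scores = [4, 3, 5, 2, 4, 3, 5, 4]        # after the multiple-choice test
summary_scores = [7, 6, 8, 5, 6, 7, 8, 6]   # after the summarization task
t, df = paired_t(mc_scores, summary_scores)
print(f"t({df}) = {t:.2f}")
```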
In practices of direct assessment of writing ability, the variability of human decision-making during scoring poses great challenges to the validity of assessment (Kane, 2006). The variables causing differences in individual raters' scoring interpretations have been widely investigated (e.g. Eckes, 2012; Wolfe et al., 2016). However, the question of how raters negotiate to resolve discrepancies has received little attention, although rater negotiation is a widely used score resolution method. As has been emphasized by scholars interested in the argumentative behavior of raters (e.g. Trace et al., 2017), a systematic analysis of score negotiations will enable us to evaluate the dependability of the method. The purpose of this study is twofold: to present a thorough analysis of the argumentative structure of rater discrepancy resolution discussions with a view to understanding their underlying dynamics, and to investigate whether the elements of the argumentative structure of negotiations differ between research settings and authentic score resolution practices. In line with this aim, rater negotiations following a written test at the language school of an English-medium university were analyzed within the framework of Argumentation Theory as developed by Toulmin (1958) and Walton (2005, 2016). The negotiation data were obtained from 99 recorded rater discussions among 30 EFL teachers, and were transcribed, coded and categorized into argumentative discussion moves. A Rater Negotiation Scheme (RNS) was developed through a recursive data analysis and categorization process, and it was validated through field-testing in authentic settings. The findings have implications both for research on rater negotiations and for arguments about the reliability of the method.
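One way to assess the dependability of a coding scheme such as the RNS is chance-corrected agreement between independent coders. Below is a minimal Python sketch of Cohen's kappa; the move labels and codings are hypothetical, and the abstract does not specify which agreement statistic, if any, was used.

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Chance-corrected agreement between two coders over the same units."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical move labels for ten discussion turns, coded independently.
coder_a = ["claim", "rebuttal", "claim", "warrant", "claim",
           "concession", "rebuttal", "claim", "warrant", "claim"]
coder_b = ["claim", "rebuttal", "warrant", "warrant", "claim",
           "concession", "rebuttal", "claim", "claim", "claim"]
print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")  # 0.70 for this data
```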
This study investigated differences in the cognitive processes that test-takers undergo while answering reading comprehension questions in multiple-choice (MC) and open-ended short-answer formats. For this purpose, data were collected from a group of undergraduate students at an English-medium university through eye-tracking technology, immediate retrospective verbal protocols, and short semi-structured interviews. The results showed that in the open-ended format the participants made greater use of careful reading skills and comprehended the text more thoroughly, whereas in the MC format they read less carefully and relied more on test-taking strategies. These findings contribute to the ongoing discussion of how item format can alter the cognitive processes at work in a reading comprehension test, and they confirm the effectiveness of eye-tracking, in combination with qualitative methods, in unveiling those processes. The study has implications for reading test development.
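Eye-tracking analyses of this kind typically compare dwell time within areas of interest (AOIs), for example the passage region versus the question region, across item formats. The sketch below shows the basic computation in Python; the coordinates, AOIs, and fixation records are hypothetical, not the study's actual setup.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float        # horizontal screen position in pixels
    y: float        # vertical screen position in pixels
    duration: int   # fixation duration in milliseconds

def dwell_time(fixations: list[Fixation],
               aoi: tuple[float, float, float, float]) -> int:
    """Total fixation duration inside a rectangular AOI (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = aoi
    return sum(f.duration for f in fixations
               if x0 <= f.x <= x1 and y0 <= f.y <= y1)

# Hypothetical AOIs: the passage on the left, the item on the right.
TEXT_AOI = (0, 0, 640, 800)
ITEM_AOI = (660, 0, 1280, 800)

fixations = [Fixation(120, 300, 240), Fixation(700, 150, 180),
             Fixation(400, 520, 310)]
print("passage dwell (ms):", dwell_time(fixations, TEXT_AOI))
print("item dwell (ms):", dwell_time(fixations, ITEM_AOI))
```

Longer relative dwell time in the passage AOI under the open-ended format would be consistent with the more careful reading the study reports.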