Over the past decade, listening comprehension tests have increasingly moved to computer-based delivery that includes visual input. However, little research is available to suggest how test takers engage with different types of visuals on such tests. The present study compared a series of still images to video in academic computer-based tests to determine how test takers engage with these two test modes. The study, which employed observations, retrospective reports, and interviews, used data from university-level non-native speakers of English. The findings suggest that test takers engage differently with these two modes of delivery. Specifically, while test takers engaged minimally and similarly with the still images, there was wide variation in the ways and degree to which they engaged with the video stimulus. The implications are that computer-based tests of listening comprehension could include still images while only minimally altering the construct that is measured by audio-only listening tests, but the use of video in such computer-based tests may require a rethinking of the listening construct.
FACETS many-facet Rasch analysis software (Linacre, 1998a) was used to examine two consecutive administrations of a large-scale (more than 1000 examinees) second language oral assessment in the form of a peer group discussion task with Japanese English-major university students. Facets modeled in the analysis were examinee, prompt, rater, and five rating category 'items.' Unidimensionality was shown to be strong in both datasets, and approaches to interpreting t values for the facets modeled in the analysis were discussed. Examinee ability was the most substantial facet, followed by rater severity, and item. The prompt facet was negligible in magnitude. Rater differences in terms of severity were generally large, but this characteristic was not stable over time for individuals; returning raters tended to move toward greater severity and consistency, while new raters showed much more inconsistency. Analysis of the scales showed general validity in gradations of scale steps, though raters had some difficulty discerning between categories at the ends of the scales for pronunciation and communicative skills.
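The many-facet Rasch model underlying a FACETS analysis like the one above can be sketched as follows. This is a minimal illustration of the rating-scale formulation, in which the log-odds of an examinee receiving category k rather than k−1 is ability minus item difficulty, rater severity, and the category threshold; all parameter values below are hypothetical, not estimates from the study.

```python
import math

def facets_category_probs(theta, delta, alpha, taus):
    """Category probabilities under a rating-scale many-facet Rasch model.

    theta : examinee ability (logits)
    delta : item difficulty (logits)
    alpha : rater severity (logits)
    taus  : category thresholds tau_1..tau_K (tau_0 is fixed at 0)

    Returns a list of K+1 probabilities, one per rating category.
    """
    # Cumulative log-numerators: each category adds one more step term.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - alpha - tau))
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A more able examinee, facing the same item and rater, shifts
# probability mass toward the higher rating categories.
low = facets_category_probs(-1.0, 0.0, 0.5, [-1.0, 0.0, 1.0])
high = facets_category_probs(2.0, 0.0, 0.5, [-1.0, 0.0, 1.0])
```

In this formulation, a severe rater (larger alpha) has the same effect on expected scores as a harder item, which is why the analysis above can compare the magnitudes of the examinee, rater, and item facets on a common logit scale.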
Concerns about the need for assessing multidialectal listening skills for global contexts are becoming increasingly prevalent. However, the inclusion of multiple accents on listening assessments may threaten test fairness because it is not practical to include every accent that may be encountered in the language use domain on these tests. Given this dilemma, this study aimed to determine the extent to which accent strength and familiarity affect comprehension and to provide a defensible direction for assessing multidialectal listening comprehension. A strength of accent scale was developed, and one US, four Australian, and four British English speakers of English were selected based on a judgment of their strength of accent. Next, TOEFL test takers (N = 21,726) were randomly assigned to listen to a common lecture given by one of the nine selected speakers, and respond to six comprehension items and a survey designed to assess their familiarity with various accents. The results suggest that strength of accent and familiarity do affect listening comprehension, and these factors affect comprehension even with quite light accents.
Computer‐based testing (CBT) to assess second language ability has undergone remarkable development since Garrett (1991) described its purpose as “the computerized administration of conventional tests” in The Modern Language Journal. For instance, CBT has made possible the delivery of more authentic tests than traditional paper‐and‐pencil tests. CBT has also made it possible to score essays, oral speech samples, and other types of test responses more reliably, practically, and almost instantaneously. Unfortunately, however, due to a number of unresolved problems, CBT has failed to realize its anticipated potential. CBT has limited usability because systems that ensure test and score security have yet to be developed. Computer‐adaptive testing, one of the most promising areas of CBT, has not met expectations because of failure to solve problems with the statistical techniques on which it is based and the lack of resources necessary to implement it in most assessment contexts. In spite of these and other limitations, given the growing capability of CBT to deliver more authentic tests than paper‐and‐pencil testing, its use for assessing second language ability will undoubtedly continue to expand.
The second language group oral is a test of second language speaking proficiency in which a group of three or more English language learners discuss an assigned topic without interaction with interlocutors. Concerns expressed about the extent to which test takers' personal characteristics affect the scores of others in the group have limited its attractiveness. This study investigates the degree to which assertive and non-assertive test takers' scores are affected by the levels of assertiveness of their group members. The sample of test takers was Japanese first-year university students who were studying English in Japan. The students took the revised NEO-PI-R (Costa & McCrae, 1992; Shimanoka et al., 2002), a group oral test, and PhonePass SET-10 (Ordinate, 2004). Two separate MANCOVA analyses were conducted, one designed to determine the extent to which assertive test takers' scores are affected by the levels of assertiveness of group members (N = 112), and one designed to determine the extent to which non-assertive test takers' scores are affected by the levels of assertiveness of group members (N = 113). The analyses indicated that assertive test takers were assigned higher scores than expected when grouped with only non-assertive test takers and lower scores than expected when grouped with only assertive test takers, while the study failed to find an effect of assertiveness-based grouping on non-assertive test takers' scores. The findings suggest that when the group oral is used, rater-training sessions should include guidance on how to evaluate a test taker in the context of the group in which the test taker is assessed and assign scores that are not based on a comparison of proficiencies of group members.
The assessment of oral communication has continued to evolve over the past few decades. The construct being assessed has broadened to include interactional competence, and technology has played a role in the types of tasks that are currently popular. In this paper, we discuss the factors that affect the process of oral communication assessment, current conceptualizations of the construct to be assessed, and five tasks that are used to assess this construct. These tasks include oral proficiency interviews, paired/group oral discussion tasks, simulated tasks, integrated oral communication tasks, and elicited imitation tasks. We evaluate these tasks based on current conceptualizations of the construct of oral communication, and conclude that they do not assess a broad construct of oral communication equally. Based on our evaluation, we advise test developers to consider the aspects of oral communication that they aim to include or exclude in their assessment when they select one of these task types.
Studies that use structural equation modeling (SEM) techniques are increasingly encountered in the language assessment literature. This popularity has created the need for a set of guidelines that can indicate what should be included in a research report and make it possible for research consumers to judge the appropriateness of the interpretations made from a reported study. This article attempts to fill this void by providing a set of reporting guidelines appropriate for language assessment researchers.
The purpose of this study was to determine the extent to which performance on the TOEFL iBT speaking section is associated with other indicators of Japanese university students’ abilities to communicate orally in an academic English environment and to determine which components of oral ability for these tasks are best assessed by TOEFL iBT. To achieve this aim, TOEFL iBT speaking scores were compared to performances on group oral discussion, picture and graph description, and prepared oral presentation tasks, and their component scores of pronunciation, fluency, grammar/vocabulary, interactional competence, descriptive skill, delivery skill, and question answering. Participants (N = 222) were English majors at a Japanese university. Pearson product–moment correlations, corrected for attenuation, indicated strong relationships between the TOEFL iBT speaking scores and the three university tasks and high or moderate correlations between the TOEFL iBT speaking scores and the components of oral ability. For the components of oral ability, pronunciation, fluency, and vocabulary/grammar were highly associated with TOEFL iBT speaking scores, while interactional competence, descriptive skill, and delivery skill were moderately associated with TOEFL iBT speaking scores. The findings suggest that TOEFL iBT speaking scores are good overall indicators of academic oral ability and that they are better measures of pronunciation, fluency, and vocabulary/grammar than they are of interactional competence, descriptive skill, and presentation delivery skill.
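The correction for attenuation mentioned above has a simple closed form (Spearman's classic formula): the observed correlation is divided by the square root of the product of the two measures' reliabilities. A minimal sketch, with hypothetical reliability values rather than figures from the study:

```python
import math

def disattenuated_correlation(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation.

    r_xy : observed correlation between measures X and Y
    r_xx : reliability estimate for measure X
    r_yy : reliability estimate for measure Y

    Returns the estimated correlation between the true scores,
    i.e., the correlation freed of measurement error.
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical example: an observed correlation of .60 between two
# measures with reliabilities of .80 and .90 rises to about .71
# once measurement error is accounted for.
r_true = disattenuated_correlation(0.60, 0.80, 0.90)
```

Because the denominator is at most 1, the corrected value is never smaller than the observed correlation, which is why disattenuated correlations such as those reported above can appear notably stronger than raw ones.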