Rie Koizumi scite author profile

Abstract: To remedy the paucity of studies on the relationship between vocabulary knowledge and speaking proficiency, we examine the degree to which second language (L2) speaking proficiency can be predicted by the size, depth, and speed of L2 vocabulary among novice to intermediate Japanese learners of English. Studies 1 and 2 administered vocabulary tests and a speaking test to 224 and 87 L2 learners, respectively. Analyses using structural equation modeling demonstrated that a substantial proportion of variance in speaking proficiency can be explained by vocabulary knowledge, namely its size, depth, and speed. These results suggest the centrality of vocabulary knowledge to speaking proficiency.

Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens

In’nami

2012

System

A meta-analysis of test format effects on reading and listening test performance: Focus on multiple-choice and open-ended formats

In’nami

2009

Language Testing

A meta-analysis was conducted on the effects of multiple-choice and open-ended formats on L1 reading, L2 reading, and L2 listening test performance. Fifty-six data sources located in an extensive search of the literature were the basis for the estimates of the mean effect sizes of test format effects. The results using the mixed effects model of meta-analysis indicate that multiple-choice formats are easier than open-ended formats in L1 reading and L2 listening, with the degree of format effect ranging from small to large in L1 reading and medium to large in L2 listening. Overall, format effects in L2 reading are not found, although multiple-choice formats are found to be easier than open-ended formats when any one of the following four conditions is met: the studies involve between-subjects designs, random assignment, stem-equivalent items, or learners with a high L2 proficiency level. Format effects favoring multiple-choice formats across the three domains are consistently observed when studies employ between-subjects designs, random assignment, or stem-equivalent items.

Relationships between text length and lexical diversity measures: Can we use short texts of less than 100 tokens?

2012

vli

Lexical diversity (LD) measures have been known to be sensitive to the length of the text, and numerous revised LD measures have been proposed. This study aims to identify LD measures that are least affected by text length and can be used for the analysis of short L2 texts (50 to 200 tokens). This study compares the type-token ratio, Guiraud index, D, and measure of textual lexical diversity (MTLD) to assess their degree of susceptibility to text length. Spoken texts of 200 tokens from 20 L2 English learners at the lower-intermediate level were divided into segments of 50 to 200 tokens and the text length impact was examined. It was found that MTLD was least affected by text length, and that it should be used with texts of at least 100 tokens.
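As a rough illustration of why text length matters for the two simplest measures named in this abstract, the sketch below computes the type-token ratio (types divided by tokens) and the Guiraud index (types divided by the square root of tokens) on growing segments of one text. The sample sentence, the whitespace tokenization, and the function names are assumptions made for illustration, not the study's data or procedure; D and MTLD are not implemented here.

```python
import math


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


def guiraud_index(tokens):
    """Number of distinct word types divided by the square root of the token count."""
    return len(set(tokens)) / math.sqrt(len(tokens))


# Hypothetical toy transcript (not from the study); tokenized on whitespace only.
text = "the boy went to the park and the boy saw a dog in the park"
tokens = text.lower().split()

# Compare the measures on progressively longer segments of the same text.
for n in (5, 10, len(tokens)):
    segment = tokens[:n]
    print(n, round(type_token_ratio(segment), 3), round(guiraud_index(segment), 3))
```

In this toy text the type-token ratio falls as the segment grows, because repeated words add tokens without adding types. This is the kind of length sensitivity the abstract describes.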

Validation of Empirically Derived Rating Scales for a Story Retelling Speaking Test

Hirai

Language Assessment Quarterly

2013

In recognition of the rating scale as a crucial tool of performance assessment, this study aims to establish a rating scale suitable for a Story Retelling Speaking Test (SRST), which is a semi-direct test of speaking ability in English as a foreign language (EFL) for classroom use. To identify an appropriate scale, three rating scales, all of which have been designed to have diagnostic functions, were developed for the SRST and compared in terms of their reliability, validity, and practicality. The three scales were: (a) an empirically derived, binary-choice, boundary-definition (called EBB1) scale, which has four criteria (Communicative Efficiency, Content, Grammar & Vocabulary, and Pronunciation); (b) an EBB2 scale that was modified from the EBB1 scale and has three criteria (Communicative Efficiency, Grammar & Vocabulary, and Pronunciation); and (c) a multiple-trait (MT) scale that was modified from the EBB2 but has a conventional analytic scale format. The results of the comparison revealed that the EBB2 was the most reliable and valid measure for assessing speech performance in the context of story retelling. However, the MT was shown to be the most practical, while the EBB2 permits more careful scoring, which suggests the influence of the rating scale format on test qualities.

INTRODUCTION

There is a growing awareness of teachers' responsibility to assess their students' learning and also of the impact that assessment has on learning (e.g., Hill & McNamara, 2012). Thus, this study focuses on the development of a rating scale for classroom assessment. Among a variety of factors affecting the assessment of speaking performance, such as raters, rating scales, interlocutors, elicitation tasks, and test-taker proficiency (Fulcher, 2003; Luoma, 2004), rating scales have been especially scrutinized because they "provide an operational definition of a linguistic construct" (Fulcher, 2003, p. 89) and should properly reflect a construct, or what we intend to assess (McNamara, 1996). In this regard, developing valid and reliable rating scales is of great importance in successfully assessing speaking performance.

In addition, one of the greatest challenges in performance assessment is practicality. Rating procedures often take a large amount of time by requiring teachers to listen to student performances individually. Moreover, the use of commercially available speaking tests imposes a financial burden on the students. For that reason, classroom teachers are reluctant to use such tests to assess classes of about 40 students (e.g., Honda, 2007). In this regard, time- and cost-effectiveness are particularly important for tools used in practical classroom assessment.

The speaking test for which the scale is being created is the Story Retelling Speaking Test (SRST), a user-friendly, semi-direct speaking test that uses an integrated reading-to-retell task developed for classroom use by the authors (see the "Procedure of the SRST" section and Appendix A; Hirai & Koizumi, 2009). On the basis of the results of the questionnaire us...