Lexical sophistication has been an important indicator of productive lexical proficiency for almost 30 years. Although lexical sophistication has most often been operationalized as the proportion of low frequency words in a text, a growing body of research has indicated that a number of indices such as concreteness, hypernymy, and n‐gram association strengths meaningfully contribute to the construct. While the increase in available indices has expanded our understanding of the multidimensional construct, the sheer number of indices presents a practical barrier for researchers. Although some studies have begun to address this issue, most have been confined to the analysis of argumentative tasks, which are not necessarily representative of the range of tasks learners may encounter. This study therefore investigates the structure of lexical sophistication indices in a large learner corpus of English second language (L2) oral proficiency interviews (OPIs). An exploratory factor analysis identified 10 factors, 7 of which explained approximately 58% of the variance in OPI scores in a follow‐up regression analysis. The results suggest that while some features of lexical sophistication (e.g., concreteness) may be task independent, others (e.g., frequency) may be task specific.
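The classic operationalization mentioned above, the proportion of low-frequency words in a text, can be sketched in a few lines. This is a minimal illustration, not the study's actual index battery; the high-frequency word list here is a hypothetical stand-in for a real reference list (e.g., the most frequent lemmas in a reference corpus).

```python
def low_frequency_proportion(tokens, high_freq_words):
    """Proportion of tokens NOT in a high-frequency word list --
    one classic operationalization of lexical sophistication."""
    if not tokens:
        return 0.0
    low = sum(1 for t in tokens if t.lower() not in high_freq_words)
    return low / len(tokens)

# Hypothetical high-frequency list; in practice this would be, e.g.,
# the top 2,000 lemmas from a large reference corpus.
high_freq = {"the", "a", "is", "was", "very", "big", "dog", "ran"}

tokens = "the zealous lexicographer annotated the corpus".split()
print(round(low_frequency_proportion(tokens, high_freq), 2))  # 4 of 6 tokens are low frequency -> 0.67
```

A higher value indicates a more lexically sophisticated text under this (frequency-based) definition; the other indices named above (concreteness, hypernymy, n-gram association strength) would each require their own lookup resources.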
A key piece of a validity argument for a language assessment tool is clear overlap between assessment tasks and the target language use (TLU) domain (i.e., the domain description inference). The TOEFL 2000 Spoken and Written Academic Language (T2K‐SWAL) corpus, which represents a variety of academic registers and disciplines in traditional learning environments (e.g., lectures, office hours, textbooks, course packs), has served as an important foundation for the TOEFL iBT® test's domain description inference for more than 15 years. There are, however, signs that the characteristics of the registers that students encounter may be changing. Increasingly, typical university courses include technology‐mediated learning environments (TMLEs), such as those represented by course management software and other online educational tools. To ensure that the characteristics of TOEFL iBT test tasks continue to align with the TLU domain, it is important to analyze the registers that are typically encountered in TMLEs. In this study, we address this issue by collecting a relatively large (4.5 million words) corpus of spoken and written TMLE registers across the six primary disciplines represented in T2K‐SWAL. This corpus was subsequently tagged for a wide variety of linguistic features, and a multidimensional analysis was conducted to compare and contrast written and spoken language in the TMLE and T2K‐SWAL corpora. The results indicate that although some similarities exist across spoken and written texts in traditional learning environments and TMLEs, language use also differs across learning environments (and modes) with regard to key linguistic dimensions.
In this task‐repetition intervention study, L2 learners’ reuse of linguistic constructions was analyzed to investigate the extent to which recurring reliance on specific constructions during task repetition predicts fluency development. English‐as‐a‐foreign‐language (EFL) learners performed oral narrative tasks three times per day under two task‐repetition schedules: blocked (Day 1: Prompt A‐A‐A, Day 2: B‐B‐B, Day 3: C‐C‐C) versus interleaved (Day 1: Prompt A‐B‐C, Day 2: A‐B‐C, Day 3: A‐B‐C). From a usage‐based perspective, their reuse of constructions across the same prompt was examined at both concrete (lexical unigram [e.g., “bicycle”] and trigram [e.g., “behind the bicycle”]) and abstract (part‐of‐speech trigram [e.g., “preposition determiner noun”]) levels. Subsequent analyses revealed that blocked practice led to higher reuse of both concrete and abstract constructions than interleaved practice. Reuse frequency was correlated with during‐training and pretest–posttest fluency changes. In particular, greater reuse of lexical and abstract trigrams during interleaved practice led to improvements in speed and breakdown fluency (i.e., shorter mean syllable duration and fewer mid‐clause pauses) after the intervention, albeit with higher effort (indicated by longer mid‐clause and clause‐final pauses). Taken together, these findings indicate that manipulating the task‐repetition schedule may systematically induce reuse of linguistic constructions, which may promote proceduralization (entrenchment) of constructional knowledge at both concrete and abstract levels.
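The core measurement in a reuse analysis of this kind, extracting trigrams from each performance and counting how many recur from an earlier performance, can be sketched simply. This is an illustrative reuse index under assumed definitions, not necessarily the exact metric used in the study; the same logic applies to lexical trigrams (shown) and to part-of-speech trigrams (by first replacing tokens with POS tags).

```python
from collections import Counter

def trigrams(tokens):
    """Concrete lexical trigrams, e.g., ('behind', 'the', 'bicycle')."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

def reuse_ratio(first, second):
    """Share of trigram tokens in the second performance that already
    occurred in the first performance (an illustrative measure)."""
    prev = trigrams(first)
    curr = trigrams(second)
    reused = sum(n for tri, n in curr.items() if tri in prev)
    total = sum(curr.values())
    return reused / total if total else 0.0

t1 = "the boy rode behind the bicycle on the road".split()
t2 = "the boy rode behind the bicycle again".split()
print(round(reuse_ratio(t1, t2), 2))  # 4 of 5 trigrams recur -> 0.8
```

For the abstract level, the tokens would be POS tags (e.g., from a tagger), so that "behind the bicycle" and "under the table" count as the same "preposition determiner noun" construction.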
This study examined the relationship between second language (L2) learners’ collocation knowledge and oral proficiency. A new approach to measuring collocation was adopted by eliciting responses through a word association task and using corpus-based measures (absolute frequency count, t-score, MI score) to analyze the degree to which stimulus words and responses were collocated. Oral proficiency was measured using human judgements and objective measures of fluency (articulation rate, silent pause ratio, filled pause ratio) and lexical richness (diversity, frequency, range). Forty Japanese university students completed a word association task and a spontaneous speaking task (picture narrative). Results indicated that speakers who used more low-frequency collocations in the word association task (i.e., lower collocation frequency scores) spoke faster with fewer silent pauses and were perceived to be more fluent. Speakers who provided more strongly associated collocations (as measured by MI) used more sophisticated lexical items and were perceived to be lexically proficient. Collocation knowledge remained a unique predictor after the influence of learners’ vocabulary size (i.e., knowledge of single-word items) was considered. These findings support the key role that collocation plays in oral proficiency and provide important insights into understanding L2 speech development from the perspective of phraseological competence.
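The two association measures named above have standard corpus-linguistic definitions: MI compares the observed co-occurrence frequency of a word pair against the frequency expected if the two words were independent, on a log2 scale, while the t-score normalizes the observed-minus-expected difference by the square root of the observed frequency. A minimal sketch, with made-up counts rather than figures from any real corpus:

```python
import math

def mi_score(o11, f_node, f_coll, n):
    """Mutual information: log2(observed / expected), where
    expected = f_node * f_coll / n for corpus size n."""
    expected = f_node * f_coll / n
    return math.log2(o11 / expected)

def t_score(o11, f_node, f_coll, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = f_node * f_coll / n
    return (o11 - expected) / math.sqrt(o11)

# Illustrative counts (not from a real corpus): the pair co-occurs 30 times,
# the node word occurs 1,000 times and the collocate 2,000 times,
# in a corpus of 1,000,000 words.
print(round(mi_score(30, 1000, 2000, 1_000_000), 2))  # 3.91
print(round(t_score(30, 1000, 2000, 1_000_000), 2))   # 5.11
```

MI tends to highlight exclusive but rare pairings, whereas the t-score favors high-frequency combinations, which is why studies of this kind often report both alongside raw frequency.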
In the realm of language proficiency assessments, the domain description inference and the extrapolation inference are key components of a validity argument. Biber et al.’s description of the lexicogrammatical features of the spoken and written registers in the T2K-SWAL corpus has served as support for the TOEFL iBT test’s domain description and extrapolation inferences. In the time since the T2K-SWAL corpus was collected, however, university learning environments have increasingly become technology-mediated. Accordingly, any description of the linguistic features of university language should account for the language produced in technology-mediated learning environments (TMLEs) in addition to non-technology-mediated learning environments (non-TMLEs). Kyle et al. recently began to address this issue by collecting a corpus of TMLE language use, which they then compared to language use in non-TMLEs using multidimensional analysis (MDA). The results indicated both similarities and substantive differences across the learning environments, but the study did not investigate the effects of particular registers on these results. In this study, we build on previous research by investigating lexicogrammatical features of specific spoken and written registers across technology-mediated and non-technology-mediated learning environments.