Lexical sophistication has been an important indicator of productive lexical proficiency for almost 30 years. Although lexical sophistication has most often been operationalized as the proportion of low‐frequency words in a text, a growing body of research has indicated that a number of indices such as concreteness, hypernymy, and n‐gram association strengths meaningfully contribute to the construct. While the increase in available indices has expanded our understanding of the multidimensional construct, the sheer number of indices presents a practical barrier for researchers. Although some studies have begun to address this issue, most have been confined to the analysis of argumentative tasks, which are not necessarily representative of the range of tasks learners may encounter. This study therefore investigates the structure of lexical sophistication indices in a large learner corpus of second language (L2) English oral proficiency interviews (OPIs). An exploratory factor analysis identified 10 factors, 7 of which explained approximately 58% of the variance in OPI scores in a follow‐up regression analysis. The results suggest that while some features of lexical sophistication (e.g., concreteness) may be task independent, others (e.g., frequency) may be task specific.
A key piece of a validity argument for a language assessment tool is clear overlap between assessment tasks and the target language use (TLU) domain (i.e., the domain description inference). The TOEFL 2000 Spoken and Written Academic Language (T2K‐SWAL) corpus, which represents a variety of academic registers and disciplines in traditional learning environments (e.g., lectures, office hours, textbooks, course packs), has served as an important foundation for the TOEFL iBT® test's domain description inference for more than 15 years. There are, however, signs that the characteristics of the registers that students encounter may be changing. Increasingly, typical university courses include technology‐mediated learning environments (TMLEs), such as those represented by course management software and other online educational tools. To ensure that the characteristics of TOEFL iBT test tasks continue to align with the TLU domain, it is important to analyze the registers that are typically encountered in TMLEs. In this study, we address this issue by collecting a relatively large (4.5 million words) corpus of spoken and written TMLE registers across the six primary disciplines represented in T2K‐SWAL. This corpus was subsequently tagged for a wide variety of linguistic features, and a multidimensional analysis was conducted to compare and contrast written and spoken language in TMLE and T2K‐SWAL. The results indicate that although some similarities exist across spoken and written texts in traditional learning environments and TMLEs, language use also differs across learning environments (and modes) with regard to key linguistic dimensions.