The current study presents a New General Service List (new-GSL), the result of a robust comparison of four language corpora (LOB, BNC, BE06, and EnTenTen12) with a combined size of over 12 billion running words. The four corpora were selected to represent a variety of corpus sizes and approaches to representativeness and sampling. In particular, the study investigates the lexical overlap among the corpora in the top 3,000 words based on the average reduced frequency (ARF), a measure that takes into account both the frequency and the dispersion of lexical items. The results show that there is a stable vocabulary core of 2,122 items (70.7%) shared by the four corpora. Moreover, these vocabulary items occur with comparable ranks in the individual wordlists. In producing the new-GSL, the core vocabulary items were combined with new items that occur frequently in the corpora representing current language use (BE06 and EnTenTen12). The final product of the study, the new-GSL, consists of 2,494 lemmas and covers between 80.1 and 81.7 per cent of the text in the source corpora.
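As a rough illustration of the two ideas named in the abstract above (ARF and lexical overlap among top-ranked items), the following minimal Python sketch may help. It assumes the commonly cited definition of ARF, in which the corpus is treated as cyclic and each gap between consecutive occurrences of a word is capped at the average gap v = N / f; the abstract itself does not spell out the formula. The function names and the toy data are hypothetical and are not taken from the study.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF for one word, given the token positions where it occurs.

    Assumed definition: ARF = (1 / v) * sum(min(d_i, v)), with v = N / f,
    where d_i are the distances between consecutive occurrences and the
    corpus is treated as a cycle (the last gap wraps around to the first hit).
    """
    f = len(positions)
    if f == 0:
        return 0.0
    ordered = sorted(positions)
    v = corpus_size / f  # average gap if the word were spread perfectly evenly
    # Gaps between neighbouring occurrences.
    distances = [ordered[i] - ordered[i - 1] for i in range(1, f)]
    # Wrap-around gap from the last occurrence back to the first one.
    distances.append(ordered[0] + corpus_size - ordered[-1])
    return sum(min(d, v) for d in distances) / v


def shared_core(ranked_wordlists, top_n):
    """Items that appear in the top_n of every ranked wordlist."""
    top_sets = [set(words[:top_n]) for words in ranked_wordlists]
    return set.intersection(*top_sets)


if __name__ == "__main__":
    # Hypothetical 100-token corpus: same raw frequency, different dispersion.
    even = average_reduced_frequency([0, 25, 50, 75], 100)
    clumped = average_reduced_frequency([10, 11, 12, 13], 100)
    print(even, clumped)

    # Hypothetical ranked lists standing in for two corpus wordlists.
    lists = [["the", "of", "and", "cat"], ["the", "and", "of", "dog"]]
    print(sorted(shared_core(lists, 3)))
```

Under this definition, an evenly dispersed word keeps an ARF close to its raw frequency, while a word concentrated in one stretch of the corpus is pushed towards 1, which is why ranking by ARF favours items that are both frequent and widely distributed.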
This article focuses on the use of collocations in language learning research (LLR). Collocations, as units of formulaic language, are becoming prominent in our understanding of language learning and use. However, while the number of corpus-based LLR studies of collocations is growing, there is still a need for a deeper understanding of the factors that play a role in establishing that two words in a corpus can be considered collocates. In this article we critically review both the application of measures used to identify collocability between words and the nature of the relationship between two collocates. Particular attention is paid to the comparison of collocability across different corpora representing different genres, registers, or modalities. Several issues involved in the interpretation of collocational patterns in the production of first language and second language users are also considered. Reflecting on current practices in the field, further directions for collocation research are proposed.
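The abstract above does not name the specific measures it reviews. As a hedged illustration of what a collocation measure computes, the sketch below implements three association measures that are widely used in corpus linguistics (the MI score, the t-score, and logDice) from observed frequencies. The choice of these three measures, the function names, and the toy frequencies are assumptions made for illustration; the expected frequency here also ignores any correction for the size of the collocation window, which is a simplification.

```python
import math


def expected_frequency(f1, f2, n):
    """Co-occurrence frequency expected by chance (no window-size correction)."""
    return f1 * f2 / n


def mi_score(observed, f1, f2, n):
    """Mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(observed / expected_frequency(f1, f2, n))


def t_score(observed, f1, f2, n):
    """t-score: (observed - expected) scaled by the square root of observed."""
    return (observed - expected_frequency(f1, f2, n)) / math.sqrt(observed)


def log_dice(observed, f1, f2):
    """logDice: 14 + log2 of the Dice coefficient; independent of corpus size."""
    return 14 + math.log2(2 * observed / (f1 + f2))


if __name__ == "__main__":
    # Hypothetical counts: node occurs 16 times, collocate 16 times,
    # they co-occur 8 times, in a corpus of 1,024 tokens.
    o, f1, f2, n = 8, 16, 16, 1024
    print(mi_score(o, f1, f2, n))
    print(t_score(o, f1, f2, n))
    print(log_dice(o, f1, f2))
```

Because the MI score and the t-score depend on corpus size through the expected frequency while logDice does not, the same word pair can be ranked differently depending on the measure and the corpus, which is one reason comparisons of collocability across corpora need care.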
This article contributes to the debate about the appropriate use of corpus data in language learning research. It focuses on frequencies of linguistic features in language use and their comparison across corpora. The majority of corpus-based second language acquisition (SLA) studies employ a comparative design in which either one or more second language (L2) corpora are compared to a first language (L1) production corpus, or two or more L2 corpora are compared to each other. This article critically examines some of the central tenets of the comparative method related to interspeaker variation in L1 and L2 use, the representativeness and comparability of corpus data, the interpretation of differences found between corpora, and the appropriate use of statistics. Using and discussing a set of five L1 spoken English corpora and three L2 English corpora (two spoken and one written), we approach these areas empirically, exploring different sources of variation and the methodological options that corpus-based SLA studies offer.
This paper investigates the quality of knowledge of technical words that high-school students learned from subject reading. In particular, it focuses on similarities and differences between students who learned new words through their L1 and their L2. In the study, 72 students were divided into two groups and asked to read and listen to two expository texts. One group received the texts in their L1 (Slovak) and the other group in their L2 (English). Afterwards the participants were tested on their knowledge of twelve technical words that appeared in the texts. The responses were examined in terms of the completeness of word meaning and the presence of errors. The results showed that compared to the L1-instructed students, the L2-instructed participants provided word meanings that were less complete and less precise. Word meanings from both groups contained errors involving omission of correct meaning components and inclusion of incorrect meaning components. L2-instructed participants made more errors of both kinds. The differences between the two groups are discussed with respect to vocabulary acquisition and subject learning.