The idea behind this chapter is that it is useful to use frequency and range of occurrence to distinguish several levels of vocabulary. Making the high-frequency/mid-frequency/low-frequency distinction helps the teacher deal with vocabulary in the most efficient way, ensures that learners learn vocabulary in the most useful sequence, and thus lets learners gain the most benefit from the vocabulary they learn.

What are the different ways of counting words?

There are several ways of counting words, that is, of deciding what will be counted. The most important distinction is between counting running words (tokens) and counting different words (types or families).

Tokens

Counting tokens involves counting every word form in a spoken or written text; if the same word form occurs more than once, each occurrence is counted. So the sentence 'It is not easy to say it correctly' contains eight words, even though two of them are the same word form, it. Words counted in this way are called tokens, or sometimes running words. If we try to answer questions like 'How many words are there on a page or in a line?', 'How long is this book?', 'How fast can you read?' or 'How many words does the average person speak per minute?', then our unit of counting will be the token.

Types

We can count the words in the sentence 'It is not easy to say it correctly' another way. When we see the same word occur again, we do not count it again.
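The token/type distinction above can be sketched in a few lines of Python. This is an illustrative sketch, not tooling from the chapter itself; it simply splits the example sentence on whitespace and lowercases word forms so that the two occurrences of 'it' count as one type.

```python
# Count tokens (running words) and types (different words) in the
# chapter's example sentence.
sentence = "It is not easy to say it correctly"

# Tokens: every word form, counted each time it occurs.
tokens = sentence.lower().split()
print(len(tokens))        # 8 tokens

# Types: each distinct word form counted once ('it' occurs twice).
types = set(tokens)
print(len(types))         # 7 types
```

Real corpus tools would also handle punctuation and decide how to group inflected forms into families, decisions this toy sketch ignores.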
Corpus linguistics is leading to the development of theories about language which challenge existing orthodoxies in applied linguistics. However, there are also many questions which should be examined and debated: how big should a corpus be? Is the data from a corpus reliable? What are its applications for language teaching? Corpora in Applied Linguistics examines these and other questions related to this emerging field. It discusses these important issues and explores the techniques of investigating a corpus, as well as demonstrating the application of corpora in a wide variety of fields. It also outlines the impact corpus linguistics is having on how languages are taught in the classroom and how it is informing language teaching materials and dictionaries. It is a superb and accessible introduction to corpus linguistics and a must-read for anyone interested in corpus linguistics and its impact on applied linguistics.
This paper considers the contentious term ‘semantic prosody’ and discusses a number of aspects of the concept described by the term. It is pointed out that although many writers use it to refer to the implied attitudinal meaning of a word, Sinclair uses the term to refer to the discourse function of a unit of meaning. Problems of apparent counter-examples, when a word or unit does not have the semantic prosody that is typical of it, are discussed. A second point is that the phenomena described as ‘semantic prosody’ can be regarded as observational data, but that they are often used to explain subjective reactions to a given text or to predict such reactions. The issues raised by these different uses are discussed. Finally, the pitfalls of using concordance lines to observe attitudinal language in highly opinionated texts are discussed.
This paper introduces topic modelling, a machine learning technique that automatically identifies ‘topics’ in a given corpus, and illustrates its use in the exploration of a corpus of academic English. It first offers an intuitive explanation of the underlying mechanism of topic modelling and describes the procedure for building a model, including the decisions involved in the model-building process. The paper then explores the model. A topic in topic models is characterised by a set of co-occurring words, and we demonstrate that such topics yield rich insights into the nature of a corpus. As exemplary tasks, the paper identifies the prominent topics in different parts of papers, investigates chronological change in a journal, and reveals different types of papers in the journal. The paper further compares topic modelling to two more traditional techniques in corpus linguistics, semantic annotation and keyword analysis, and highlights the strengths of topic modelling. We believe that topic modelling is particularly useful in the initial exploration of a corpus.
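The abstract's core idea, that a topic is characterised by a set of co-occurring words, can be illustrated with a small stdlib-only sketch. This is not the paper's method and not LDA itself; it merely shows the co-occurrence signal that topic models exploit, using a four-document toy corpus invented for the example.

```python
# Words that repeatedly share a document cluster into topic-like
# groups. Count, for every unordered word pair, how many documents
# contain both words; recurring pairs hint at a shared topic.
from collections import Counter
from itertools import combinations

docs = [
    "corpus frequency vocabulary learner",
    "vocabulary learner frequency list",
    "topic model machine learning",
    "machine learning model inference",
]

pair_counts = Counter()
for doc in docs:
    words = sorted(set(doc.split()))
    for a, b in combinations(words, 2):
        pair_counts[(a, b)] += 1

# Pairs attested in more than one document surface two rough 'topics':
# a vocabulary cluster and a machine-learning cluster.
for pair, n in pair_counts.most_common():
    if n > 1:
        print(pair, n)
```

An actual topic model such as LDA goes further: it infers probabilistic word distributions per topic and topic distributions per document, with the number of topics being an analyst's decision, as the paper notes.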