In this paper, we present an approach to the automatic identification and correction of preposition and determiner errors in nonnative (L2) English writing. We show that models of use for these parts of speech can be learned with an accuracy of 70.06% and 92.15% respectively on L1 text, and present first results in an error detection task for L2 writing.
This paper proposes a machine-learning based approach to predict accurately, given a syntactic and semantic context, which preposition is most likely to occur in that context. Each occurrence of a preposition in an English corpus has its context represented by a vector containing 307 features. The vectors are processed by a voted perceptron algorithm to learn associations between contexts and prepositions. In preliminary tests, we can associate contexts and prepositions with a success rate of up to 84.5%.
In this article, we present an approach to the automatic correction of preposition errors in L2 English. Our system, based on a maximum entropy classifier, achieves average precision of 42% and recall of 35% on this task. The discussion of results obtained on correct and incorrect data aims to establish what characteristics of L2 writing prove particularly problematic in this task.
This study proposes an approach to automatically score the TOEIC® Writing e-mail task. We focus on one component of the scoring rubric, which notes whether the test-takers have used particular speech acts such as requests, orders, or commitments. We developed a computational model for automated speech act identification and tested it on a corpus of TOEIC responses, achieving up to 79.28% accuracy. This model represents a positive first step toward the development of a more comprehensive scoring model. We also created a corpus of speech act-annotated native English workplace e-mails. Comparisons between these and the TOEIC data allow us to assess whether English learners are approximating native models and whether differences between native and non-native data can have negative consequences in the global workplace.
This paper looks at the use and non-use of please in American and British English requests. The analysis is based on request data from two comparable workplace email corpora, which have been pragmatically annotated to enable retrieval of all request speech acts regardless of formulation. 675 requests are extracted from each of the two corpora; the behaviour of please is analyzed with regard to factors such as imposition level, sentence mood, and modal verb type. Differences in use of please between the two varieties of English can be accounted for by viewing this as a marker of conventional politeness rather than face-threat mitigation in British English and as a marker of relationship asymmetry in American English.
This article introduces the Clinton Email Corpus, comprising 33,000 recently released email messages sent to and from Hillary Clinton during her tenure as United States Secretary of State, and presents the results of a first investigation into the effect of status and gender on politeness-related linguistic choices within the corpus, based on a sample of 500 emails. We describe the composition of the corpus and mention the technical challenges inherent in its creation, and then present the 500-email subset, in which all messages are categorized according to sender and recipient gender, position in the workplace hierarchy, and personal closeness to Clinton. The analysis looks at the most frequent bigrams in each of these subsets as a starting point for the identification of linguistic differences. We find that the main differences relate to the content and function of the messages rather than their tone. Individuals lower in the hierarchy but not in Clinton's inner circle are more often engaged in practical tasks, while members of the inner circle primarily discuss issues and use email to arrange in-person conversations. Clinton herself is generally found to engage neither in extensive politeness nor in overt displays of power. These findings present further evidence of how corpus linguistics can be used to advance our understanding of workplace pragmatics.
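The bigram-frequency step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, and the function name `top_bigrams`, the subset labels, and the toy messages are all hypothetical; none of them come from the corpus or from the authors' actual pipeline.

```python
from collections import Counter


def top_bigrams(messages, n=3):
    """Return the n most frequent word bigrams across a list of message strings."""
    counts = Counter()
    for text in messages:
        # Assumption: simple lowercasing and whitespace tokenization.
        tokens = text.lower().split()
        # Pair each token with its successor; bigrams never span two messages.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Hypothetical subsets keyed by sender group (toy data, not from the corpus).
subsets = {
    "inner_circle": ["can we talk today", "can we talk tomorrow"],
    "staff": ["please print this", "please print the schedule"],
}

for group, messages in subsets.items():
    print(group, top_bigrams(messages, n=2))
```

Comparing the ranked lists across subsets gives a first, rough indication of where the groups differ; a real analysis would likely need proper tokenization and some normalization for subset size.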
Book review: Flowerdew, L. 2012. Corpora and Language Education. Basingstoke: Palgrave Macmillan. (xv + 347 pp.) Despite the focus on language education in the title, Flowerdew's volume provides an excellent overview of the many faces of corpus linguistics (CL) for any interested researcher or student. The volume is part of the textbook series Research and Practice in Applied Linguistics, which is aimed at "students and researchers in Applied Linguistics, TESOL, Language Education and related areas, and language professionals keen to extend their research experience" (p. xiv); it assumes some knowledge of linguistics on the part of its readers. By presenting chapters which interweave theoretical issues stemming from years of research and incisive accounts of particular case studies and research projects, this volume certainly achieves the goal of showing the reader how CL research and practice interact and how each contributes to the growth and development of the other. Given the pedagogical nature of the book, its evaluation will be focused on its merits as a textbook, though I have not yet had the opportunity to use it as such with my students.
The book is in four parts: Key concepts and approaches; The nexus of corpus linguistics, textlinguistics and sociolinguistics; Applications of corpora in research and teaching arenas; Resources. Each chapter is characterized by a number of conventions shared with other texts in the series, including: a clear statement of the aims of the chapter in bullet point form; concepts, quotes and examples "boxed off" from the text for emphasis; and brief annotated suggestions for further reading. These boxed off sections are to my mind one of the most salient and interesting features of the textbook, so it is worth considering their function briefly before moving on to the discussion of the wider contents.
The most prominent among these boxed off sections are the 'Concepts'. These are extracted and adapted from the literature or authored by Flowerdew herself. As the name suggests, they refer to basic aspects of CL and related domains, which the reader ought to be acquainted with in order to better understand the topics discussed in a given section. Examples include "Criteria for defining a corpus" (Concept 1.1), phraseology, the distinction between competence and performance, the probabilistic vs. the neo-Firthian approach, frame semantics, Hyland's interactional level of metadiscourse, dialect, world Englishes, corpus stylistics, vague language in medical interaction, and "Sketch Engine search facilities". As we can see, the topics range widely, from central, general concepts to rather specific (sometimes author-specific) items. I agree with the need to ensure that the key concepts