Steven Coats scite author profile

2024

Double modals are a well-known non-standard feature of some regional varieties of English in North America, but due to their rareness in spoken language, questions remain as to the inventory of possible combinatorial types and the geographic extent of their use in contemporary naturalistic speech. This study investigates double modals in the Corpus of North American Spoken English (CoNASE), a 1.2-billion-word corpus of time-stamped and geolocated automatic speech recognition (ASR) YouTube transcripts from the United States and Canada. Double modal sequences were identified in the corpus using regular expressions, then verified via manual examination of videos. The study represents the first large-scale, continent-wide analysis of double modals based entirely on recent naturalistic production data, rather than data such as elicited responses or sentence acceptability judgments, and it demonstrates a larger double modal inventory and a broader geographic range of use for the feature than has previously been documented, including in Canada.
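The regular-expression step mentioned in the abstract above could look roughly like the following minimal sketch. The modal inventory, the pattern, and the function name `find_double_modal_candidates` are illustrative assumptions, not the expressions used in the study. Any hits would still need the manual verification the abstract describes.

```python
import re

# Illustrative modal inventory (an assumption, not the study's actual list).
MODALS = ["might", "may", "must", "can", "could", "should", "would", "will", "shall"]
_ALT = "|".join(MODALS)

# Two modals separated only by whitespace, matched as whole words.
DOUBLE_MODAL = re.compile(rf"\b({_ALT})\s+({_ALT})\b", re.IGNORECASE)


def find_double_modal_candidates(text):
    """Return (first, second) modal pairs found in a transcript string.

    Immediate repetitions such as "would would" are dropped, since they are
    more likely disfluencies than double modals.
    """
    pairs = []
    for match in DOUBLE_MODAL.finditer(text):
        first, second = match.group(1).lower(), match.group(2).lower()
        if first != second:
            pairs.append((first, second))
    return pairs


candidates = find_double_modal_candidates("I might could help you, and we may can go.")
```

Matching on whole words keeps forms such as "maybe" from being read as the modal "may".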

Grammatical feature frequencies of English on Twitter in Finland

Coats, Squires

2016

Language choice and gender in a Nordic social media corpus

2019

Nord J Linguist

This study analyzes language choice, bi- and multilingualism, and gender in a corpus of over 22 million Twitter messages by almost 36,000 authors from the Nordic countries and territories. Author location, gender, and tweet language are identified using a novel method. Three principal findings are discussed: First, gendered preference for particular languages in the Nordics can be explained in part by patterns of gendered migration. Second, a distinct geographical pattern of female/male preference for the national languages of the region and for English is evident for users who are likely native users of a Nordic language: Females are more likely to use English, while males are more likely to use a Nordic language. Third, while high rates of bi- and multilingualism are found across the whole sample, males are more likely to use more than one language in all the Nordic countries/territories. The latter two findings are interpreted in light of sociolinguistic considerations as evidence for incipient language shift towards English for Nordic users on the Twitter platform.

Articulation Rate in American English in a Corpus of YouTube Videos

2019

Lang Speech

Previous studies of the temporal organization of speech in American English have found differences in speaking or articulation rate according to speaker dialect or location, but small sample sizes and incomplete geographic coverage have limited the generalizability of the findings. In this study, articulation rates in American English are calculated from the automatic speech-to-text transcripts of more than 29,000 hours of video from local government and civic organization channels on YouTube from the 48 contiguous US states, containing more than 230 million individual word timings. Two questions are considered: are there regional differences in articulation rate? And do urban speakers articulate faster than rural speakers? The study presents several methodological innovations: first, it identifies a genre of regional speech suitable for interregional comparisons (meetings of local governments or civic organizations). Second, it introduces a new method for the calculation of articulation rate using cue and word timestamps from caption files. Third, it leverages US Census data to correlate the articulation rate with population for a large number of localities. The study shows that, in line with previous studies, Southerners articulate more slowly, and Americans from the Upper Midwest more quickly. In addition, there is a small but positive correlation between population size and articulation rate. Articulation rates are mapped using a measure of local autocorrelation.
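As a rough illustration of computing an articulation rate from word timings, the sketch below divides a syllable count by the time spent articulating, so that pauses between words are excluded. The abstract does not specify the study's method, so everything here is an assumption: the `(word, start, end)` input format, the vowel-group syllable heuristic, and the function names are all hypothetical.

```python
import re


def count_syllables(word):
    """Crude syllable estimate: number of vowel-letter groups, at least 1.

    This heuristic is an assumption for illustration only.
    """
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def articulation_rate(timings):
    """Syllables per second of articulated speech.

    `timings` is a list of (word, start_seconds, end_seconds) tuples.
    Only the time inside word intervals is counted, so silent gaps
    between words do not lower the rate.
    """
    syllables = sum(count_syllables(word) for word, _, _ in timings)
    speaking_time = sum(end - start for _, start, end in timings)
    if speaking_time <= 0:
        raise ValueError("total word duration must be positive")
    return syllables / speaking_time


# Hypothetical timings with a one-second pause before the last word.
example = [("hello", 0.0, 0.5), ("world", 0.5, 1.0), ("again", 2.0, 2.5)]
rate = articulation_rate(example)
```

Excluding pauses is what separates an articulation rate from a speaking rate, which would divide by the full elapsed time instead.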

Lexicon geupdated: New German anglicisms in a social media corpus

2019

The German verbal lexicon has been enriched by numerous English borrowings, particularly within the past 100 years, but while many verbal anglicisms are frequently used and sanctioned by language authorities, the status of new, non-standard, and rare verbal anglicisms in German has not been subject to extensive research attention. In this study, a new method is used to analyze non-standard German verbal anglicisms in a large and novel corpus compiled from the social media platform Twitter. After a review of previous work, the methods used to create a corpus of German-language tweets and to automatically extract new verbal anglicisms are described, and the semantics of some of their most frequent types are analyzed, including forms with separable and inseparable prefixes. Then, present and past participles are considered according to assimilation to standard German orthography, use as participle or attributive adjective, and stem vowel quality. In the final set of results, the focus is on the productivity of the verbalizing morpheme -ier-, a historically important element for the integration of foreign word material into German. The study demonstrates that non-standard verbal anglicisms are widely used, and that their morphological behavior is mediated by frequency effects as well as phonological, pragmatic, and semantic considerations.