Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has improved dramatically, owing both to advances in deep learning and to the collection of large-scale datasets used to train them. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems, developed by Amazon, Apple, Google, IBM, and Microsoft, to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speakers. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems, as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies, such as using more diverse training datasets that include African American Vernacular English, to reduce these performance differences and ensure that speech recognition technology is inclusive.
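The disparities above are reported as word error rate (WER). As the metric is conventionally defined, WER is the minimum number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference; an average WER of 0.35 therefore corresponds to roughly 35 errors per 100 reference words. The Python sketch below illustrates that textbook definition with a word-level edit distance. It is not the study's evaluation code: the function name `word_error_rate` is hypothetical, and it assumes transcripts have already been normalized (case, punctuation, number formatting), a step the abstract does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / reference words.

    Assumes both transcripts are already normalized; this sketch only splits
    on whitespace and compares words exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (Levenshtein distance over words).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 when a system outputs many spurious words; a per-group average such as 0.35 versus 0.19 is simply the mean of these per-utterance (or per-speaker) ratios under whatever aggregation the study used.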
We explore two unresolved methodological issues in the study of copula variation in African-American Vernacular English, assessing their quantitative and theoretical consequences via multiple variable rule analyses of data from East Palo Alto, California. The first is whether contraction and deletion of is should be considered separately from contraction and deletion of are. We conclude that they should not, because the quantitative conditioning is almost identical for the two forms and a combined analysis offers analytical advantages. The second issue is whether the alternative methods that previous researchers have used to compute the incidence of "contraction" or "deletion" ("Labov Contraction and Deletion," "Straight Contraction and Deletion," "Romaine Contraction") fundamentally affect the results. We conclude that they do, especially for contraction. We also discuss implications of our analysis for two related issues: the ordering of contraction and deletion in the grammar, and the presence of age-grading or change in progress in East Palo Alto.
In this article, we reopen the analysis of one of the oldest and most frequently examined variables in the paradigm of quantitative sociolinguistics: variation between full, contracted, and
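To make the second methodological issue concrete, the sketch below computes the named measures from counts of full (F), contracted (C), and deleted (D) copula tokens. The formulas follow the way these measures are commonly stated in the copula literature (Straight Contraction = C/(F+C+D), Straight Deletion = D/(F+C+D), Labov Contraction = (C+D)/(F+C+D), Labov Deletion = D/(C+D), Romaine Contraction = C/(F+C)); treat them as an assumption to be checked against the article itself. The function name `copula_measures` and the token counts are hypothetical, not data from East Palo Alto.

```python
from fractions import Fraction


def copula_measures(full: int, contracted: int, deleted: int) -> dict[str, Fraction]:
    """Hypothetical helper: alternative contraction/deletion rates from F, C, D counts.

    Formulas are as commonly stated in the copula literature (an assumption).
    Raises ZeroDivisionError if a denominator is zero.
    """
    f, c, d = full, contracted, deleted
    total = f + c + d
    return {
        # Each outcome taken as a share of all copula tokens.
        "straight_contraction": Fraction(c, total),
        "straight_deletion": Fraction(d, total),
        # Deleted forms are treated as having passed through contraction.
        "labov_contraction": Fraction(c + d, total),
        "labov_deletion": Fraction(d, c + d),
        # Deleted forms are excluded from the contraction calculation.
        "romaine_contraction": Fraction(c, f + c),
    }


if __name__ == "__main__":
    # Hypothetical counts: 20 full, 30 contracted, 50 deleted tokens.
    for name, value in copula_measures(20, 30, 50).items():
        print(f"{name}: {float(value):.3f}")
```

With these hypothetical counts, the same 100 tokens yield a contraction rate of 0.30 (Straight), 0.80 (Labov), or 0.60 (Romaine), which shows why the choice of method can change the apparent results, particularly for contraction.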
This article presents a synchronic and diachronic investigation of the lexeme all in its intensifier and quotative functions. We delimit the new from the old functions of the lexeme and present a variationist account of all's external and internal constraints in various syntactic environments. Our analysis is based on a variety of data sets, which include traditional sociolinguistic interviews as well as data culled from internet searches and a new Google-based search tool. On the basis of these data sets, we show that intensifier all is not new but has expanded its range of syntactic environments. We further pinpoint the syntactic and semantic niches that all has appropriated for itself among California adolescents and compare its patterning with that of other intensifiers in our data and in the data of other researchers. The extension of all to quotative function, however, is new, apparently originating in California in the 1980s. Our investigation of its development spans data sets covering a 15-year period. Using variable rule analysis and other quantitative techniques, we examine the distribution of quotative all vis-à-vis its competitor variants (including be like, say, and go) and show that the constraints on quotative all have undergone a marked shift in recent years and that quotative all, after peaking in the 1990s, is now in decline.
present. This case highlights the importance of paying more attention to stylistic variation and of including more than two points in time in sociolinguistic studies of change in real and apparent time.