In this paper, we present a novel computational approach to the analysis of accent variation. The case study is dialect leveling in the North of England, manifested as reduction of accent variation across the North and emergence of General Northern English (GNE), a pan-regional standard accent associated with middle-class speakers. We investigated this instance of dialect leveling using random forest classification, with audio data from a crowd-sourced corpus of 105 urban, mostly highly educated speakers from five northern UK cities: Leeds, Liverpool, Manchester, Newcastle upon Tyne, and Sheffield. We trained random forest models to identify individual northern cities from a sample of other northern accents, based on measurements of the first two formants across full vowel systems. We tested the models using unseen data. We relied on undersampling, bagging (bootstrap aggregation) and leave-one-out cross-validation to address some challenges associated with the data set, such as unbalanced data and relatively small sample size. The accuracy of classification provides us with a measure of relative similarity between different pairs of cities, while calculating conditional feature importance allows us to identify which input features (which vowels and which formants) have the largest influence on the prediction. We find a considerable degree of leveling, especially between Manchester, Leeds and Sheffield, although some differences persist. The features that contribute to these differences most systematically are typically not the ones discussed in previous dialect descriptions. We propose that the most systematic regional features are also not salient, and as such, they serve as sociolinguistic regional indicators. We supplement the random forest results with a more traditional variationist description of by-city vowel systems, and we use both sources of evidence to inform a description of the vowels of General Northern English.
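The classification setup described in this abstract (one city against the rest, undersampling to balance classes, leave-one-out testing) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' pipeline: the data are synthetic, the helper names `make_toy_data`, `undersample` and `loo_accuracy` are hypothetical, and scikit-learn's `RandomForestClassifier` stands in for whatever implementation the study used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut


def make_toy_data(seed=0):
    """Synthetic stand-in: one row per speaker, six formant-like features."""
    rng = np.random.default_rng(seed)
    target = rng.normal(0.0, 1.0, size=(10, 6))
    target[:, :2] += 4.0  # the target city differs on two features only
    others = rng.normal(0.0, 1.0, size=(20, 6))
    X = np.vstack([target, others])
    y = np.array([1] * 10 + [0] * 20)  # 1 = target city, 0 = other cities
    return X, y


def undersample(X, y, rng):
    """Randomly drop rows so every class is as small as the rarest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]


def loo_accuracy(X, y, seed=0):
    """Leave-one-out accuracy of a forest fitted on a balanced training set."""
    rng = np.random.default_rng(seed)
    hits = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Balance only the training rows; the held-out speaker stays unseen.
        X_bal, y_bal = undersample(X[train_idx], y[train_idx], rng)
        forest = RandomForestClassifier(n_estimators=25, random_state=seed)
        forest.fit(X_bal, y_bal)
        hits += int(forest.predict(X[test_idx])[0] == y[test_idx][0])
    return hits / len(y)


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"leave-one-out accuracy: {loo_accuracy(X, y):.2f}")
```

A random forest already performs bagging internally, since each tree is fitted on a bootstrap sample of the training rows. Conditional feature importance, as named in the abstract, is not part of scikit-learn; `sklearn.inspection.permutation_importance` would be a non-conditional stand-in and may rank correlated formant features differently.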
This paper demonstrates how the Y-ACCDIST system, the York ACCDIST-based automatic accent recognition system [Brown (2015). Proceedings of the International Congress of Phonetic Sciences, Glasgow, UK], can be used to inspect sociophonetic corpora as a preliminary "screening" tool. Although Y-ACCDIST's intended application is to assist with forensic casework, the system can also be exploited in sociophonetic research to begin unpacking variation. Using a subset of the PEBL (Panjabi-English in Bradford and Leicester) corpus, the paper explores the outputs of Y-ACCDIST, which, it is argued, efficiently and objectively assess speaker similarities across different linguistic varieties. It also examines how these outputs corroborate a phonetic analysis of the data. First, Y-ACCDIST is used to classify speakers from the corpus based on language background and region. A Y-ACCDIST cluster analysis is then implemented, which groups speakers in ways consistent with more localised networks, providing a means of identifying potential communities of practice. Additionally, the paper presents the results of a Y-ACCDIST feature selection task that indicates which specific phonemes are most valuable in distinguishing between speaker groups. Finally, it demonstrates how Y-ACCDIST outputs can be used to reinforce more traditional sociophonetic analyses and support qualitative interpretations of the data.
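The general idea behind ACCDIST-style comparison, as commonly described, is to represent each speaker by a table of distances between that speaker's own segments and then compare tables across speakers. The sketch below illustrates only that idea. It is not the Y-ACCDIST implementation: the two-dimensional toy features, the Euclidean distance, the Pearson comparison and the names `distance_table` and `pearson` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def distance_table(phone_means):
    """Flatten one speaker's phone-to-phone distances into a vector.

    phone_means maps a phone label to that speaker's mean feature vector.
    """
    phones = sorted(phone_means)
    return [math.dist(phone_means[a], phone_means[b])
            for a, b in combinations(phones, 2)]


def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical speakers: B has A's vowel layout shifted by a constant offset,
# C arranges the same four vowels differently.
speaker_a = {"i": (1.0, 5.0), "a": (4.0, 1.0), "u": (1.0, 1.0), "e": (2.0, 4.0)}
speaker_b = {"i": (1.5, 5.5), "a": (4.5, 1.5), "u": (1.5, 1.5), "e": (2.5, 4.5)}
speaker_c = {"i": (1.0, 5.0), "a": (4.0, 1.0), "u": (3.0, 4.0), "e": (1.0, 2.0)}

sim_ab = pearson(distance_table(speaker_a), distance_table(speaker_b))
sim_ac = pearson(distance_table(speaker_a), distance_table(speaker_c))

if __name__ == "__main__":
    print(f"A vs B: {sim_ab:.3f}   A vs C: {sim_ac:.3f}")
```

Because the distances are computed within each speaker, a constant offset between two speakers leaves the table unchanged, so the comparison reflects how the segments are arranged relative to one another rather than their absolute values.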
In this paper, we introduce a newly created corpus of whispered speech simultaneously recorded via a close-talking microphone and a non-audible murmur (NAM) microphone in both clean and noisy conditions. To benchmark the corpus, which has recently been made freely available, we conducted experiments on automatic recognition of continuous whispered speech. When training and test conditions are matched, we found the NAM microphone to be more robust against background noise than the close-talking microphone. In mismatched conditions (noisy data, models trained on clean speech), we found that Vector Taylor Series (VTS) compensation is particularly effective for the NAM signal.