Neuroimaging has revealed a core network of cortical regions that contribute to speech production, but the functional organization of this network remains poorly understood. Purpose: We describe efforts to identify reliable boundaries around functionally homogenous regions within the cortical speech motor control network in order to improve the sensitivity of functional magnetic resonance imaging (fMRI) analyses of speech production and thus improve our understanding of the functional organization of speech production in the brain. Method: We used a bottom-up, data-driven approach by pooling data from 12 previously conducted fMRI studies of speech production involving the production of monosyllabic and bisyllabic words and pseudowords that ranged from single vowels and consonant–vowel pairs to short sentences (163 scanning sessions, 136 unique participants, 39 different speech conditions). After preprocessing all data through the same pipeline and registering individual contrast maps to a common surface space, hierarchical clustering was applied to contrast maps randomly sampled from the pooled data set in order to identify consistent functional boundaries across subjects and tasks. Boundary completion was achieved by applying adaptive smoothing and watershed segmentation to the thresholded population-level boundary map. Hierarchical clustering was applied to the mean within–functional region of interest (fROI) response to identify networks of fROIs that respond similarly during speech. Results: We identified highly reliable functional boundaries across the cortical areas involved in speech production. Boundary completion resulted in 117 fROIs in the left hemisphere and 109 in the right hemisphere. Clustering of the mean within-fROI response revealed a core sensorimotor network flanked by a speech motor planning network. The majority of the left inferior frontal gyrus clustered with the visual word form area and brain regions (e.g., anterior insula, dorsal anterior cingulate) associated with detecting salient sensory inputs and choosing the appropriate action. Conclusion: The fROIs provide insight into the organization of the speech production network and a valuable tool for studying speech production in the brain by improving within-group and between-groups comparisons of speech-related brain activity. Supplemental Material: https://doi.org/10.23641/asha.9402674
Tongue surface measurements from midsagittal ultrasound scans are effectively arcs with deviations representing tongue shape, but smoothing-spline analyses of variance (SSANOVAs) assume variance around a horizontal line. Therefore, calculating SSANOVA average curves of tongue traces in Cartesian coordinates [Davidson, J. Acoust. Soc. Am. 120(1), 407-415 (2006)] creates errors that are compounded at the tongue tip and root, where average tongue shape deviates most from a horizontal line. This paper introduces a method for transforming data into polar coordinates similar to the technique by Mielke [J. Acoust. Soc. Am. 137(5), 2858-2869 (2015)], but using the virtual origin of a radial ultrasound transducer as the polar origin, allowing data conversion in a manner that is robust against between-subject and between-session variability.
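The coordinate change described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example origin, and the synthetic arc-shaped trace are all hypothetical, and the virtual origin is assumed to be already known rather than estimated from the scan. The sketch shows why the transform helps: a trace that is an arc around the origin becomes a constant radius, that is, a horizontal line in (angle, radius) space.

```python
import math

def to_polar(points, origin):
    """Convert (x, y) points to (theta, r) pairs measured from `origin`."""
    ox, oy = origin
    polar = []
    for x, y in points:
        dx, dy = x - ox, y - oy
        polar.append((math.atan2(dy, dx), math.hypot(dx, dy)))
    return polar

def to_cartesian(polar, origin):
    """Inverse transform: (theta, r) pairs back to (x, y) points."""
    ox, oy = origin
    return [(ox + r * math.cos(t), oy + r * math.sin(t)) for t, r in polar]

# Hypothetical virtual origin (mm) and a synthetic trace lying on an arc
# of radius 60 mm around it; real traces would deviate from this arc.
origin = (0.0, -20.0)
trace = [
    (origin[0] + 60.0 * math.cos(math.radians(a)),
     origin[1] + 60.0 * math.sin(math.radians(a)))
    for a in range(30, 151, 30)
]

polar = to_polar(trace, origin)
radii = [r for _, r in polar]        # constant for a perfect arc
restored = to_cartesian(polar, origin)
```

In this sketch the y axis points upward; ultrasound image coordinates often point downward, so a sign flip may be needed before applying it to real traces.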
This paper investigates the articulation of approximant /ɹ/ in New Zealand English (NZE), and tests whether the patterns documented for rhotic varieties of English hold in a non-rhotic dialect. Midsagittal ultrasound data for 62 speakers producing 13 tokens of /ɹ/ in various phonetic environments were categorized according to the taxonomy by Delattre & Freeman (1968), and semi-automatically traced and quantified using the AAA software (Articulate Instruments Ltd. 2012) and a Modified Curvature Index (MCI; Dawson, Tiede & Whalen 2016). Twenty-five NZE speakers produced tip-down /ɹ/ exclusively, 12 tip-up /ɹ/ exclusively, and 25 produced both, partially depending on context. Those speakers who produced both variants used the most tip-down /ɹ/ in front vowel contexts, the most tip-up /ɹ/ in back vowel contexts, and varying rates in low central vowel contexts. The NZE speakers produced tip-up /ɹ/ most often in word-initial position, followed by intervocalic, then coronal, and least often in velar contexts. The results indicate that the allophonic variation patterns of /ɹ/ in NZE are similar to those of American English (Mielke, Baker & Archangeli 2010, 2016). We show that MCI values can be used to facilitate /ɹ/ gesture classification; linear mixed-effects models fit on the MCI values of manually categorized tongue contours show significant differences between all but two of Delattre & Freeman's (1968) tongue types. Overall, the results support theories of modular speech motor control with articulation strategies evolving from local rather than global optimization processes, and a mechanical model of rhotic variation (see Stavness et al. 2012).
This mini review is aimed at a clinician-scientist seeking to understand the role of oscillations in neural processing and their functional relevance in speech and music perception. We present an overview of neural oscillations, methods used to study them, and their functional relevance with respect to music processing, aging, hearing loss, and disorders affecting speech and language. We first review the oscillatory frequency bands and their associations with speech and music processing. Next we describe commonly used metrics for quantifying neural oscillations, briefly touching upon the still-debated mechanisms underpinning oscillatory alignment. Following this, we highlight key findings from research on neural oscillations in speech and music perception, as well as contributions of this work to our understanding of disordered perception in clinical populations. Finally, we conclude with a look toward the future of oscillatory research in speech and music perception, including promising methods and potential avenues for future work. We note that the intention of this mini review is not to systematically review all literature on cortical tracking of speech and music. Rather, we seek to provide the clinician-scientist with foundational information that can be used to evaluate and design research studies targeting the functional role of oscillations in speech and music processing in typical and clinical populations.
Stuttering is a neurodevelopmental disorder characterized by impaired production of coordinated articulatory movements needed for fluent speech. It is currently unknown whether these abnormal production characteristics reflect disruptions to brain mechanisms underlying the acquisition and/or execution of speech motor sequences. To dissociate learning and control processes, we used a motor sequence learning paradigm to examine the behavioral and neural correlates of learning to produce novel phoneme sequences in adults who stutter (AWS) and neurotypical controls. Participants intensively practiced producing pseudowords containing non-native consonant clusters (e.g., “gvasf”) over two days. The behavioral results indicated that although the two experimental groups showed comparable learning trajectories, AWS performed significantly worse on the task prior to and after speech motor practice. Using functional magnetic resonance imaging (fMRI), we compared brain activity during articulation of the practiced words and a set of novel pseudowords (matched in phonetic complexity). fMRI analyses revealed no differences between AWS and controls in cortical or subcortical regions; both groups showed comparable increases in activation in left-lateralized brain areas implicated in phonological working memory and speech motor planning during production of the novel sequences compared to the practiced sequences. Moreover, activation in left-lateralized basal ganglia sites was negatively correlated with in-scanner mean disfluency in AWS. Collectively, these findings demonstrate that AWS exhibit no deficit in constructing new speech motor sequences but do show impaired execution of these sequences before and after they have been acquired and consolidated.
This paper presents the findings of an ultrasound study of 10 New Zealand English and 10 Tongan-speaking trombone players, to determine whether there is an influence of native language speech production on trombone performance. Trombone players’ midsagittal tongue shapes were recorded while reading wordlists and during sustained note productions, and tongue surface contours traced. After normalizing to account for differences in vocal tract shape and ultrasound transducer orientation, we used generalized additive mixed models (GAMMs) to estimate average tongue surface shapes used by the players from the two language groups when producing notes at different pitches and intensities, and during the production of the monophthongs in their native languages. The average midsagittal tongue contours predicted by our models show a statistically robust difference at the back of the tongue distinguishing the two groups, where the New Zealand English players display an overall more retracted tongue position; however, tongue shape during playing does not directly map onto vowel tongue shapes as prescribed by the pedagogical literature. While the New Zealand English-speaking participants employed a playing tongue shape approximating schwa and the vowel used in the word ‘lot,’ the Tongan participants used a tongue shape loosely patterning with the back vowels /o/ and /u/. We argue that these findings represent evidence for native language influence on brass instrument performance; however, this influence seems to be secondary to more basic constraints of brass playing related to airflow requirements and acoustical considerations, with the vocal tract configurations observed across both groups satisfying these conditions in different ways. 
Our findings furthermore provide evidence for the functional independence of various sections of the tongue and indicate that speech production, itself an acquired motor skill, can influence another skilled behavior via motor memory of vocal tract gestures forming the basis of local optimization processes to arrive at a suitable tongue shape for sustained note production.
A large number of studies have investigated the articulation of approximant /ɹ/ in American English (AE) (e.g., Delattre & Freeman, 1968). This research has found that a low third formant (F3), the main acoustic cue signaling rhoticity, can be achieved using many different tongue configurations; the two main tongue shapes used for /ɹ/ are “tip-down” (“bunched”) and “tip-up” (“retroflex”) (cf. Hagiwara, 1994). While speakers likely employ various “trading relationships” to maintain a consistently low F3 across production strategies (Guenther et al., 1999), they have access to a pool of variation, which some use to form complex and idiosyncratic patterns of allophony (Mielke et al., 2016). Such patterns may arise during speech acquisition (Magloughlin, 2016). This study focuses on a non-rhotic dialect, New Zealand English (NZE), to test whether dialect rhoticity constrains idiosyncratic allophony. Ultrasound video was collected for 63 speakers articulating 13 words containing tokens of /ɹ/ in different phonetic environments. The analysis aims to determine whether NZE speakers utilize the same tongue gestures as seen in AE, and whether they display similar patterns of allophonic variation. The data include productions from 12 children (under 10) and 13 youth (11-18), allowing examination of /ɹ/ during childhood development.
Background: This paper describes the use of real-time magnetic resonance imaging to simultaneously obtain magnetic resonance imaging (MRI) videos in both a sagittal and coronal plane during the performance of a musical exercise in five advanced trombone players. Methods: Dual-slice recordings were implemented in a frame-interleaved manner with 20 ms acquisitions per frame to achieve two interleaved videos at a rate of 25 frames per second. A customized MATLAB toolkit was used for the extraction of line profiles from MRI videos to quantify tongue movements associated with exercise performance from both perspectives. Results: Across all subjects, the analyses revealed precise coupling of vertical movements of the dorsal tongue surface (DTS), viewed from a sagittal perspective, with reduction in the vertical and horizontal dimensions of the air channel formed between the DTS and the hard palate, viewed from a coronal perspective. The cross-correlation between these movements was very strong (mean R=0.967). Conclusions: These results demonstrate the unique utility of this dual-slice technology in describing the coordination of complex tongue movements occurring in two planes (i.e., three directions) simultaneously, lending a deeper understanding of lingual motor control during trombone performance.
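The timing arithmetic in this abstract (20 ms per acquisition, two interleaved slices, 25 frames per second per video) and the idea of correlating two movement traces can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB toolkit: the function names and the synthetic signals are hypothetical, and a zero-lag Pearson coefficient stands in for whatever cross-correlation measure the study actually used.

```python
import math

def per_slice_frame_rate(acquisition_ms, n_slices):
    """Frames per second for each video when slices are acquired in turn."""
    return 1000.0 / (acquisition_ms * n_slices)

def pearson_r(a, b):
    """Zero-lag normalized correlation between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Two slices at 20 ms each: one frame of each video every 40 ms.
fps = per_slice_frame_rate(20, 2)  # 25.0

# Synthetic stand-ins for the two line profiles over 2 s of video:
# sagittal tongue height and coronal air-channel reduction (arbitrary units).
times = [i / fps for i in range(50)]
tongue_height = [math.sin(2 * math.pi * t) for t in times]
channel_reduction = [3.0 * h + 1.0 for h in tongue_height]

r = pearson_r(tongue_height, channel_reduction)  # close to 1 for this linear pair
```

Real line profiles would be noisy and possibly lagged, in which case the coefficient falls below 1 and a lag search may be appropriate.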