Ultrasound imaging of the tongue provides detailed articulatory data for phonetic research, but current approaches require time-consuming manual labeling of tongue contours in images. Here, we present MTracker, a method for automatic identification and extraction of precise tongue contours using a convolutional neural network (CNN) in combination with the Active Contour Algorithm, asking whether a neural network can automatically label tongue contours with human-like levels of accuracy and consistency. Midsagittal ultrasound data were collected as MPEG video using a Zonare Z.One ultrasound unit recording at 60 fps; human annotation used Mark Tiede's GetContours package for MATLAB, generating 100-point splines.
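Conceptually, this kind of pipeline pairs a CNN that predicts a per-pixel tongue-surface probability map with an active contour ("snake") that refines an initial spline. The sketch below is a minimal illustration of that idea, not MTracker's actual implementation: the `model` object and its `predict` interface are assumed, and the snake refinement uses scikit-image's `active_contour`.

```python
# Minimal sketch of a CNN-plus-snake contour pipeline (illustrative, not MTracker's code).
# Assumes a trained segmentation model `model` that maps a grayscale ultrasound
# frame to a per-pixel tongue-surface probability map.
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def extract_contour(frame, model, n_points=100):
    """Predict a probability map, then refine an initial spline with a snake."""
    prob = model.predict(frame[np.newaxis, ..., np.newaxis])[0, ..., 0]

    # Rough initialization: for each sampled column, take the row with maximal probability.
    cols = np.linspace(0, frame.shape[1] - 1, n_points)
    rows = np.array([np.argmax(prob[:, int(c)]) for c in cols])
    init = np.stack([rows, cols], axis=1)  # (row, col) coordinates

    # Active contour ("snake") refinement on a smoothed copy of the frame.
    snake = active_contour(
        gaussian(frame, sigma=2, preserve_range=True),
        init,
        alpha=0.01, beta=1.0, gamma=0.01,
        boundary_condition="fixed",  # keep the spline endpoints anchored
    )
    return snake  # (n_points, 2) array of contour coordinates
```

The snake step is what turns a fuzzy probability map into a smooth, evenly sampled spline comparable to the 100-point human annotations.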
For speech research, ultrasound tongue imaging provides a noninvasive means for visualizing tongue position and movement during articulation. Extracting tongue contours from ultrasound images is a basic step in analyzing ultrasound data, but this task often requires non-trivial manual annotation. This study presents an open-source tool for fully automatic tracking of tongue contours in ultrasound frames using neural-network-based methods. We have implemented and systematically compared two convolutional neural networks, U-Net and Dense U-Net, under different conditions. Though both models can perform automatic contour tracking with comparable accuracy, the Dense U-Net architecture seems more generalizable across test datasets, while U-Net has faster extraction speed. Our comparison also shows that the choice of loss function and data augmentation has a greater effect on tracking performance in this task than the choice of network architecture. This publicly available segmentation tool shows considerable promise for the automated tongue contour annotation of ultrasound images in speech research.
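As a rough picture of what such a segmentation network looks like, the sketch below builds a small two-level U-Net in Keras; the layer sizes, input shape, and loss are illustrative assumptions rather than the configurations compared in the study. A Dense U-Net differs mainly in replacing the plain convolution blocks with densely connected blocks.

```python
# Minimal two-level U-Net sketch in Keras (an illustration, not the paper's exact model).
# Input: single-channel ultrasound frame; output: per-pixel tongue-contour probability.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(128, 128, 1)):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolution blocks followed by downsampling, keeping skip connections.
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck.
    b = conv_block(p2, 128)

    # Decoder: upsampling with concatenated skip connections from the encoder.
    u2 = layers.Concatenate()([layers.UpSampling2D(2)(b), c2])
    c3 = conv_block(u2, 64)
    u1 = layers.Concatenate()([layers.UpSampling2D(2)(c3), c1])
    c4 = conv_block(u1, 32)

    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

# The loss matters for thin structures like contours; binary cross-entropy is only
# one common choice, not necessarily the loss used in the study.
model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```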
Previous perceptual research demonstrates that providing listeners with a social prime, such as information about a speaker's gender, can affect how listeners categorize an ambiguous speech sound produced by that speaker. We report the results of an experiment testing whether, in turn, providing listeners with a linguistic prime, such as which word they are about to hear, affects categorization of that speaker's gender. In an eye-tracking study testing for these bidirectional effects, participants (i) saw a visual prime (gender or lexical), (ii) heard an auditory stimulus drawn from a matrix of gender (female-to-male) and sibilant frequency (shack-to-sack) continua, and (iii) looked to images of the non-primed category. Social prime results replicate earlier findings that listeners' /s-ʃ/ boundary can shift via visual gender information. Additionally, lexical prime results indicate that listeners' judgments of speaker gender can shift with visual linguistic information. These effects are strongest for listeners at category boundaries, where linguistic and social information are least prototypical. In regions of high linguistic and social prototypicality, priming effects are weakened or reversed. The results provide evidence of a bidirectional link between social and linguistic categorization in speech perception and its modulation by stimulus prototypicality.
Prior studies suggest that listeners are more likely to categorize a sibilant ranging acoustically from [ʃ] to [s] as /s/ if provided auditory or visual information about the speaker that suggests male gender. Social cognition can also be affected by experimentally induced differences in power. A powerful individual's impression of another tends to show greater consistency with the other person's broad social category, while a powerless individual's impression is more consistent with the specific pieces of information provided about the other person. This study investigated whether sibilant categorization would be influenced by power when the listener is presented with inconsistent sources of information about speaker gender. Participants were experimentally primed for behavior consistent with powerful or powerless individuals. They then completed a forced choice identification task: They saw a visual stimulus (a male or female face) and categorized an auditory stimulus (ranging from 'shy' to 'sigh') as /ʃ/ or /s/. As expected, participants primed for high power were sensitive to a single cue to gender, while those who received the low power prime were sensitive to both, even if the cues did not match. This result suggests that variability in listener power may cause systematic differences in phonetic perception.
Certain unidirectional sound changes show a similarity to the laboratory phenomenon of asymmetrical misperception. The unidirectionality of these processes mirrors the dissimilar confusion rates of the two segments. Despite the similarity of these processes, it is not clear what role perceptual asymmetry plays in conditioning these changes. This study employs modeling to simulate change in the characteristics of consonant pairs whose confusion rates pattern with a unidirectional sound change: /k/-to-/t/ (before /i/) and /k/-to-/p/ (before /u/). Ten native American English speakers were recorded producing CVC words, where the initial consonant was /p/, /t/, or /k/ and the vowel varied in height and backness. Reduced and unreduced variants were elicited. Acoustically relevant features distinguishing /k/ from /t/ and /k/ from /p/ were identified using random forests. Reduced productions show higher acoustic similarity between /k/ and /t/, and between /k/ and /p/, in the vocalic contexts favoring increased confusability. This tendency toward acoustic similarity is predicted to condition category convergence. However, a language learner's category may be better informed by tokens that show less acoustic similarity to tokens of another category, which is predicted to condition divergence. Modeled results suggest that the stability of phonetic categories is sensitive to the relative weighting of these two forces.
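As an illustration of the feature-ranking step, the sketch below fits a random forest to hypothetical acoustic measurements and reads off feature importances; the file name, feature set, and labels are placeholders, not the study's actual data or variables.

```python
# Sketch of ranking acoustic features that separate /k/ from /t/ with a random forest.
# The feature names and data layout are hypothetical and stand in for the study's
# actual measurements.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Each row is one token; columns hold acoustic measurements and the consonant label.
tokens = pd.read_csv("kt_tokens.csv")  # hypothetical file of measured tokens
features = ["burst_cog", "burst_peak_freq", "f2_onset", "vot"]  # hypothetical features

forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(tokens[features], tokens["consonant"])  # labels: "k" or "t"

# Higher importance = the feature contributes more to separating the two categories.
importance = pd.Series(forest.feature_importances_, index=features)
print(importance.sort_values(ascending=False))
```

The same procedure, run separately on reduced and unreduced tokens in different vocalic contexts, is one way to quantify how acoustic similarity between categories varies across the conditions described above.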