We present a method that determines articulatory movements from speech acoustics using a Hidden Markov Model (HMM)-based speech production model. The model statistically generates speech spectrum and articulatory parameters from a given phonemic string. It consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping for each HMM state. For a given speech spectrum, maximum a posteriori estimation of the articulatory parameters of the statistical model is presented. The performance on sentences was evaluated by comparing the estimated articulatory parameters with the observed parameters. The average RMS errors of the estimated articulatory parameters were 1.50 mm from the speech acoustics and the phonemic information in an utterance and 1.73 mm from the speech acoustics only.Index Terms-Articulatory HMM, articulatory-to-acoustic mapping, HMM-based speech production model, speech inversion.
Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener’s own speech action and the effects of viewing another’s speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another’s mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.