2005
DOI: 10.1121/1.1921448

Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion

Abstract: Acoustic-to-articulatory inversion is a difficult problem mainly because of the nonlinearity between the articulatory and acoustic spaces and the nonuniqueness of this relationship. To resolve this problem, we have developed an inversion method that provides a complete description of the possible solutions without excessive constraints and retrieves realistic temporal dynamics of the vocal tract shapes. We present an adaptive sampling algorithm to ensure that the acoustical resolution is almost independent of …
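As a rough illustration of the codebook idea described in the abstract, the sketch below builds a lookup table by sampling a toy articulatory space on a regular grid, running each point through a stand-in forward model, and then returning every entry whose acoustics fall within a tolerance of the target, which also makes the one-to-many character of the problem visible. The forward model, parameter ranges, and tolerance are invented for illustration and are not the article's method; in particular, the article's codebook is built from a real vocal-tract model and sampled adaptively over hypercubes rather than on a fixed grid.

import numpy as np

# Minimal sketch of codebook-based acoustic-to-articulatory inversion.
# Everything numeric here is an illustrative placeholder, not the article's model.

def forward_model(artic):
    """Hypothetical articulatory-to-acoustic map (stands in for a vocal-tract model)."""
    a, b, c = artic
    # The first two parameters only enter through their sum, so distinct
    # articulatory configurations can produce (nearly) identical acoustics.
    f1 = 500.0 + 700.0 * np.tanh(a + b) + 150.0 * c
    f2 = 1500.0 + 900.0 * np.tanh(c) - 300.0 * (a + b)
    return np.array([f1, f2])

# Build the codebook by sampling the articulatory space on a regular grid.
grid = np.linspace(-1.0, 1.0, 15)
codebook = [(np.array([a, b, c]), forward_model(np.array([a, b, c])))
            for a in grid for b in grid for c in grid]

def invert(acoustic_target, tol=100.0):
    """Return every codebook entry acoustically close to the target.

    Keeping all matching entries, rather than only the nearest one, preserves
    the one-to-many character of the inverse mapping."""
    return [artic for artic, acoustic in codebook
            if np.linalg.norm(acoustic - acoustic_target) < tol]

candidates = invert(forward_model(np.array([0.3, -0.2, 0.5])))
print(len(candidates), "articulatory configurations match the target acoustics")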

Cited by 75 publications (62 citation statements)
References 27 publications
“…An inherent shortcoming of audio-only inversion approaches is that the mapping from the acoustic to articulatory domains is one-to-many [9], in the sense that there is a large number of vocal tract configurations which can produce the same speech acoustics, and thus the inversion problem is significantly underdetermined. Incorporation of the visual modality in the speech inversion process can significantly improve inversion accuracy.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Recent audio-only inversion approaches are typically based on sophisticated machine learning techniques. For example, in [9], codebooks are optimized to recover vocal tract shapes from formants, while the inversion scheme of [10] builds on neural networks. In [11], a Gaussian mixture model (GMM)-based mapping is proposed for inversion from Mel frequency cepstral coefficients (MFCCs), while a hidden Markov model (HMM)-based audio-articulatory mapping is presented in [12].…”
Section: Introduction (mentioning)
Confidence: 99%
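The GMM-based mapping mentioned in the statement above can be pictured as joint-density regression: fit a Gaussian mixture over stacked acoustic and articulatory vectors, then estimate the articulatory parameters for a new acoustic frame as the responsibility-weighted conditional mean. The sketch below uses scikit-learn and synthetic stand-in data rather than real MFCC/articulatory recordings, and is only meant to show the shape of such a mapping, not any cited system's exact implementation.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data: X ~ acoustic features (MFCC-like), Y ~ articulatory
# parameters; a real system would use parallel acoustic-articulatory recordings.
rng = np.random.default_rng(0)
n, dx, dy = 2000, 12, 6
Y = rng.normal(size=(n, dy))
X = np.tanh(Y @ rng.normal(size=(dy, dx))) + 0.05 * rng.normal(size=(n, dx))

# Fit a joint density p(x, y) with a Gaussian mixture.
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(np.hstack([X, Y]))

def gmm_invert(x):
    """Minimum mean-square-error estimate E[y | x] under the joint GMM."""
    mu_x, mu_y = gmm.means_[:, :dx], gmm.means_[:, dx:]
    S_xx, S_yx = gmm.covariances_[:, :dx, :dx], gmm.covariances_[:, dx:, :dx]
    # Responsibility of each mixture component given the acoustic frame x.
    px = np.array([multivariate_normal(mu_x[k], S_xx[k]).pdf(x)
                   for k in range(gmm.n_components)])
    r = gmm.weights_ * px
    r = r / r.sum()
    # Blend the per-component conditional means with the responsibilities.
    y_k = np.array([mu_y[k] + S_yx[k] @ np.linalg.solve(S_xx[k], x - mu_x[k])
                    for k in range(gmm.n_components)])
    return r @ y_k

print(gmm_invert(X[0]))  # estimated articulatory parameters for one acoustic frame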
“…Different approaches have been proposed to achieve acoustic-to-articulatory inversion [6][7][8][9][10][11][12][13][14][15][16]. These methods rely on either explicit mapping between acoustic and articulatory data [6][7][8][9][10][11][12][13][14] or optimization of articulatory synthesis model parameters [15,16].…”
Section: Introduction (mentioning)
Confidence: 99%
“…These methods rely on either explicit mapping between acoustic and articulatory data [6][7][8][9][10][11][12][13][14] or optimization of articulatory synthesis model parameters [15,16]. Different methods of mapping between articulatory and acoustic data have been tested using probabilistic models such as hidden Markov models (HMM) [6,7], neural networks [8,9], codebooks [10][11][12][13], or filters [14]. Except the methods that are based on the task dynamic (TD) model, most of them, however, share the common drawback of the mapping paradigm, i.e., the lack of the inclusion of speech production mechanism in the modeling process, in particular, the dynamic movement of speech gestures [1,2] that results in smooth spectral transitions observed in the natural acoustic data.…”
Section: Introduction (mentioning)
Confidence: 99%
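Where several articulatory candidates are acoustically plausible at every frame, one common way to recover realistic temporal dynamics, which the article's abstract also emphasizes, is to pick a single candidate per frame under a smoothness criterion. The dynamic-programming sketch below minimizes total frame-to-frame articulatory displacement; the candidate sets and the squared-distance cost are illustrative choices, not the article's exact dynamic regularization.

import numpy as np

def smooth_path(candidates_per_frame):
    """Pick one articulatory candidate per frame minimizing total movement.

    candidates_per_frame: list of (n_t, d) arrays, one array per acoustic frame
    (e.g. the output of a codebook lookup). Returns a list of d-vectors."""
    cost = [np.zeros(len(candidates_per_frame[0]))]   # cumulative cost per candidate
    back = []                                         # backpointers for the trace-back
    for prev, cur in zip(candidates_per_frame, candidates_per_frame[1:]):
        # Squared displacement between every previous and every current candidate.
        d2 = ((cur[:, None, :] - prev[None, :, :]) ** 2).sum(axis=-1)
        total = d2 + cost[-1][None, :]
        back.append(total.argmin(axis=1))
        cost.append(total.min(axis=1))
    idx = [int(cost[-1].argmin())]                    # best final candidate
    for b in reversed(back):                          # walk the backpointers
        idx.append(int(b[idx[-1]]))
    idx.reverse()
    return [c[i] for c, i in zip(candidates_per_frame, idx)]

# Toy usage: 20 frames, each with a random number of 3-D articulatory candidates.
rng = np.random.default_rng(1)
frames = [rng.normal(size=(rng.integers(3, 8), 3)) for _ in range(20)]
trajectory = smooth_path(frames)
print(len(trajectory), "frames selected")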
“…Proposed models to solve this nonlinear and non-unique mapping include neural networks [2], statistical methods [3] and codebook approaches [4] and generally rely on large data bases of human recorded articulatory-acoustic data, e.g. the MOCHA data base [5].…”
Section: Introduction (mentioning)
Confidence: 99%