2019
DOI: 10.3390/app9214535

Glottal Source Contribution to Higher Order Modes in the Finite Element Synthesis of Vowels

Abstract: Articulatory speech synthesis has long been based on one-dimensional (1D) approaches. They assume plane wave propagation within the vocal tract and disregard higher order modes that typically appear above 5 kHz. However, such modes may be relevant in obtaining a more natural voice, especially for phonation types with significant high frequency energy (HFE) content. This work studies the contribution of the glottal source at high frequencies in the 3D numerical synthesis of vowels. The spoken vocal range is exp…

Cited by 11 publications (7 citation statements)
References 23 publications

“…Moreover, the similarities of each version of the [a], [i] and [u] vowels obtained from one of the three possible configurations with respect to the expressive target configuration were measured as the symmetrical Kullback-Leibler spectral distance [43] between their long term average spectra (LTAS) and the corresponding one of their expressive target pairs, i.e., d_KL(GSS_X VT_Y, GSS_E VT_E). To do so, LTAS were computed as the Welch's power spectral estimate, considering a 15-ms Hamming window with 50% overlap and a 2048-point FFT [30].…”
Section: Methods (mentioning)
confidence: 99%
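
The procedure quoted above (Welch LTAS followed by a symmetric Kullback-Leibler distance) can be illustrated with a minimal NumPy sketch. It is not the cited authors' code. The function names `ltas_welch` and `symmetric_kl` are hypothetical, and the distance used here, the sum of the two directed KL divergences between spectra normalised to unit sum, is an assumed reading of the definition in [43], which the citation statement does not spell out.

```python
import numpy as np


def ltas_welch(x, fs, win_ms=15.0, overlap=0.5, nfft=2048):
    """Long-term average spectrum as a Welch power spectral estimate.

    Defaults follow the quoted settings: 15-ms Hamming window,
    50% overlap, 2048-point FFT.
    """
    x = np.asarray(x, dtype=float)
    nwin = int(round(win_ms * 1e-3 * fs))            # window length in samples
    hop = max(1, int(round(nwin * (1.0 - overlap))))  # step between segments
    if nwin > nfft:
        raise ValueError("window is longer than the FFT size")
    if len(x) < nwin:
        raise ValueError("signal is shorter than one analysis window")

    window = np.hamming(nwin)
    starts = range(0, len(x) - nwin + 1, hop)

    # Average the periodograms of the windowed, zero-padded segments.
    psd = np.zeros(nfft // 2 + 1)
    for s in starts:
        segment = x[s:s + nwin] * window
        psd += np.abs(np.fft.rfft(segment, n=nfft)) ** 2
    psd /= len(starts) * np.sum(window ** 2)

    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, psd


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL distance between two spectra (assumed definition).

    Both spectra are normalised to unit sum, then KL(p||q) + KL(q||p)
    is returned, which simplifies to sum((p - q) * log(p / q)).
    """
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum((p - q) * np.log(p / q)))


# Usage with two synthetic signals standing in for synthesised vowels.
fs = 16000
t = np.arange(fs) / fs
_, ltas_a = ltas_welch(np.sin(2 * np.pi * 700.0 * t), fs)
_, ltas_b = ltas_welch(np.sin(2 * np.pi * 900.0 * t), fs)
distance = symmetric_kl(ltas_a, ltas_b)
```

Because both spectra are normalised before comparison, the absolute scaling of the Welch estimate does not affect the distance, so the sketch omits the usual density scaling.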
“…During the last decade, and thanks to the significant increase in computing power, several attempts have been developed to overcome this limitation by means of three-dimensional (3D) vocal tract models [27][28][29], which allow the propagation of higher order modes [26]. This characteristic is especially relevant for the production of vowels with a tense phonation [30]. So far, such models have been already used for the synthesis of vowels [29], diphthongs [31] and vowel-consonant-vowel sequences [32], including sibilants [33] that entail considering aeroacoustic sources [34], as well as works focused on tuning the vocal tract resonances [35] to be able to simulate effects such as the so-called singing formant [36].…”
Section: Introduction (mentioning)
confidence: 99%
“…Speech synthesis research for the several Iberian languages has a long tradition, with systems being developed based both in state-of-the-art technologies as well on methods closer to human speech production. For this special issue, very good representatives of both lines of research were selected: one using recent neural architectures for Linguistic-Acoustic Mapping [8] and the other exploring finite element synthesis of vowels [9].…”
Section: Speech Production and Synthesis (mentioning)
confidence: 99%
“…The second paper of this section is also related to acoustic modeling for speech synthesis, but this time the model is directly inspired by human speech production structures and physical phenomena and only for vowels. The paper, by Freixes and coworkers [9], adopts realistic 3D acoustic models to include the higher order models of propagation that typically appear above 5 kHz to study the changes in high frequency. Their work is motivated by the limitation of 1D approaches to planar propagation and the little attention of previous research to the high frequency range.…”
Section: Speech Production and Synthesis (mentioning)
confidence: 99%
“…In this work we want to explore the effects that the pacemaker might have on the numerical generation of vowel /a/. For this purpose, we compute the 3D vocal tract (VT) impulse response of /a/ by solving the mixed-form wave equation using a stabilized finite element method (FEM), and convolve it with the volume flow velocity computed from the two-mass model (see e.g., [7][8][9] for vowels and [10][11][12] for diphthongs).…”
Section: Introduction (mentioning)
confidence: 99%
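
The final step described in the statement above, convolving a vocal tract impulse response with a glottal source signal, can be sketched as follows. This is only an illustration under loud assumptions: the impulse response here is a single damped resonance and the source is a plain impulse train, both placeholders standing in for the FEM-computed response and the two-mass model output, neither of which is reproduced. The function name `synthesize` is hypothetical.

```python
import numpy as np


def synthesize(glottal_source, vt_impulse_response):
    """Linear convolution of a source signal with an impulse response."""
    return np.convolve(glottal_source, vt_impulse_response, mode="full")


fs = 16000  # sampling rate in Hz (assumed for the example)

# Placeholder impulse response: one damped resonance at 700 Hz, 20 ms long.
# A real response would come from the 3D FEM simulation.
t = np.arange(fs // 50) / fs
h = np.exp(-300.0 * t) * np.sin(2 * np.pi * 700.0 * t)

# Placeholder source: 100 ms impulse train at 100 Hz.
# A real source would be the flow signal from a glottal model.
source = np.zeros(fs // 10)
source[:: fs // 100] = 1.0

y = synthesize(source, h)
```

With `mode="full"` the output has `len(source) + len(h) - 1` samples, so the decay of the last pulse is kept rather than truncated.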