Glottal Flow Synthesis for Whisper-to-Speech Conversion

Perrotin, Olivier; McLoughlin, Ian

doi:10.1109/taslp.2020.2971417

Cited by 13 publications

(15 citation statements)

References 56 publications

“…The spectral tilt of the glottal source is computed over several glottal periods (including both glottal open and closed phases), and therefore the IAIF method does not call for the extraction of glottal closure instants (GCIs). During the past two decades, IAIF has been used in many areas, such as in parametric speech synthesis ([52] [53] [54] [55]), speaking style conversion [56], the detection of stress [57], and depression [58], as well as in emotion recognition [59] [60]. For a detailed description of the IAIF method, the reader is referred to Section II-B in the work of Raitio et al [52].…”

Section: B Baseline Features (mentioning)

confidence: 99%

The Detection of Parkinson's Disease From Speech Using Voice Source Information

Narendra

Schuller

Alku

2021

IEEE/ACM Trans. Audio Speech Lang. Process.

“…In the presence of noise in the glottis signal, we showed that F_GF captures the position of the dominant frequency region of this noise [21]. Therefore, in the control group we observe an F_GF distribution around a few hundred Hertz, the order of magnitude of vocal fold vibration. The increase of F_GF with the degree of impairment follows the increasing amount of noise in the glottal signal, linked to the progressive loss of phonation.…”

Section: Effect Of Subject Group (mentioning)

confidence: 79%

“…Note that the GFM-IAIF glottis filter is fully causal compared to Equation 1, yet it does not affect the magnitude spectrum, from which are extracted the spectral parameters (e.g., F_GF, B_GF, and F_ST) that we now use for analysis of dysphonic speech. GFM-IAIF has also recently been demonstrated in the conversion of postlaryngectomy speech to phonated speech [21], lending empirical support to its effectiveness at modeling dysfunctional speech.…”

Section: Source-filter Decomposition Methods (mentioning)

confidence: 94%

“…[29] Close correlation between vocal effort and spectral tilt [30, 31] was a major motivation for using a three-pole GFM. This has been validated for voice quality analysis and modification [32], expressive singing and speech synthesis [33, 34], whisper-to-speech conversion [21], and in this paper is now considered for analyzing the effects of dysfunction on glottis parameters. For convenience, we will use the term "glottal formant" even for speakers who lack a glottis.…”

Section: Speech Model and Analysis General Speech Production (mentioning)

confidence: 99%

“…[22] It has recently been used to reconstruct a speech glottal source from analyses of whisper source, using a single decomposition model [21]. This successful extrapolation to non-speech voice sources has led, in this paper, to its application to a set of voice recordings obtained from patients who have undergone glottal or larynx treatment and surgery (including excision), alongside baseline normal speech. The analysis will demonstrate the ability of GFM-IAIF to derive parameters from voice source signals that differentiate the category of impairment present.…”

Section: Introduction (mentioning)

confidence: 99%

Automated Assessment of Glottal Dysfunction Through Unified Acoustic Voice Analysis

et al. 2022

Self Cite

This paper uses the recent glottal flow model for iterative adaptive inverse filtering to analyze recordings from dysfunctional speakers, namely those with larynx-related impairment such as laryngectomy. The analytical model allows extraction of the voice source spectrum, described by a compact set of parameters. This single model is used to visualize and better understand speech production characteristics across impaired and nonimpaired voices. The analysis reveals some discriminative aspects of the source model which map to a physiological class description of those impairments. Furthermore, being based on analysis of source parameters only, it is complementary to any existing techniques of vocal-tract or phonetic analysis. The results indicate the potential for future automated speech reconstruction systems that adapt to the method of reconstruction required, as well as being useful for mainstream speech systems, such as ASR, in which front-end analysis can direct back-end models to suit characteristics of impaired speech.