Vocalic correlates of pitch in whispered versus normal speech

Heeren, Willemijn

doi:10.1121/1.4937762

Cited by 14 publications

(11 citation statements)

References 39 publications


“…This might seem like a minor point, but it is worth emphasizing that human language, through speech, makes use of both voiced and voiceless sounds in all known languages. It is also the case that whispered speech, for the most part supralaryngeal, is intelligible, and there is evidence for the use of different acoustic cues in the absence of fundamental frequency [50]. Direct control of phonatory muscles, which produce voiced sounds, alone will leave a great deal unexplained.…”

Section: Box 2 Two Major Pathways (mentioning)

confidence: 99%

Vocal learning: Beyond the continuum

Martins, Boeckx

2020

PLoS Biol


Vocal learning is the ability to modify vocal output on the basis of experience. Traditionally, species have been classified as either displaying or lacking this ability. A recent proposal, the vocal learning continuum, recognizes the need to have a more nuanced view of this phenotype and abandon the yes-no dichotomy. However, it also limits vocal learning to production of novel calls through imitation, moreover subserved by a forebrain-to-phonatory-muscles circuit. We discuss its limitations regarding the characterization of vocal learning across species and argue for a more permissive view.


“…2) Whisper production model: To our knowledge, previous research did not study the whisper glottis and VT spectral shapes separately. Nevertheless, studies of the overall spectral envelope show that whispers present higher spectral balance and centre of gravity than speech [45]. This translates into an absence of low-frequency resonance, and reduced high-frequency tilt [46]-[48].…”

Section: A Speech and Whisper Modelling (mentioning)

confidence: 99%

Glottal Flow Synthesis for Whisper-to-Speech Conversion

Perrotin

McLoughlin

2020

IEEE/ACM Trans. Audio Speech Lang. Process.


Whisper-to-speech conversion is motivated by laryngeal disorders, in which malfunction of the vocal folds leads to loss of voicing. Many patients with laryngeal disorders can still produce functional whispers, since these are characterised by the absence of vocal fold vibration. Whispers therefore constitute a common ground for speech rehabilitation across many kinds of laryngeal disorder. Whisper-to-speech conversion involves recreating natural-sounding speech from recorded whispers, and is a non-invasive and non-surgical rehabilitation that can maintain a natural method of speaking, unlike the existing methods of rehabilitation. This paper proposes a new rule-based method for whisper-to-speech conversion that replaces the noisy whisper sound source with a synthesised speech-like harmonic source, while maintaining the vocal tract component unaltered. In particular, a novel glottal source generator is developed in which whisper information is used to parameterise the excitation through a high-quality glottis model. Evaluation of the system against the standard pulse train excitation method reveals significantly improved performance. Since our method is glottis-based, it is potentially compatible with the many existing vocal tract component adaptation systems.


“…In addition to the lack of voiced excitation, a number of other acoustic differences between normal and whispered speech have been reported. For instance, frequencies of the lowest three formants (F1-F3) tend to be higher in whispered speech [15,56,57] with the largest increase in F1. In [15], two other observations were made.…”

Section: Properties Of Whispered Speech (mentioning)

confidence: 99%

Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction

Vestman

Gowda

Sahidullah

et al. 2018

Speech Communication


From the available biometric technologies, automatic speaker recognition is one of the most convenient and accessible ones due to the abundance of mobile devices equipped with a microphone, allowing users to be authenticated across multiple environments and devices. Speaker recognition also finds use in forensics and surveillance. Due to the acoustic mismatch induced by varied environments and devices of the same speaker, leading to an increased number of identification errors, much of the research focuses on compensating for such technology-induced variations, especially using machine learning at the statistical back-end. Another much less studied but at least as detrimental source of acoustic variation, however, arises from mismatched speaking styles induced by the speaker, leading to a substantial performance drop in recognition accuracy. This is a major problem especially in forensics where perpetrators may purposefully disguise their identity by varying their speaking style. We focus on one of the most commonly used ways of disguising one's speaker identity, namely, whispering. We approach the problem of normal-whisper acoustic mismatch compensation from the viewpoint of robust feature extraction. Since whispered speech is intelligible, yet a…
