2014
DOI: 10.1016/j.csl.2014.03.002

Data-driven detection and analysis of the patterns of creaky voice

Abstract: This paper investigates the temporal excitation patterns of creaky voice. Creaky voice is a voice quality frequently used as a phrase-boundary marker, but also as a means of portraying attitude, affective states and even social status. Consequently, the automatic detection and modelling of creaky voice may have implications for speech technology applications. The acoustic characteristics of creaky voice are, however, rather distinct from modal phonation. Further, several acoustic patterns can bring about the p…

Cited by 40 publications (28 citation statements)
References: 32 publications
“…It turns out that speakers in the TTS and Control sets have a creaky usage never exceeding 10%. This goes in line with the findings in (Drugman et al, 2013a) drawn over a range of languages (US English, Japanese, Swedish and Finnish) where creaky voice was used between 3.5 and 10.5% of the time (as extracted from manual annotations). The artefacts in TE speech however lead to a perception of creakiness largely exceeding 10% for about 3 patients over 4.…”
Section: Gargling Noise/creakiness
supporting
confidence: 72%
“…To model voiced and unvoiced time-frequency regions, most vocoders rely on an f0 estimate and/or voicing detection that assume voiced segments to have sinusoidal content. However, various segments of the speech signal are voiced with very non-periodical characteristic of the pulse's position, called creakiness in this presentation, as in creaky voice phonatory mode [36] and sometimes in transients. Thus, the corresponding sinusoidal content in these segments is highly disturbed and often wrongly classified as unvoiced segments, leading to hoarseness and noisy transients in the synthesized voice.…”
Section: B Mask Correction For Creakiness
mentioning
confidence: 99%
“…Recent work on non-modal phonation focuses on detection (Drugman et al, 2014), analysis (Malyska, 2008; Malyska et al, 2011) and synthesis (Bangayan et al, 1997) of speech with non-modal phonation. Modern computational paralinguistics tries to 1) get rid of non-modal phonation, or 2) model it, for example, for classification purposes (Schuller and Batliner, 2013).…”
mentioning
confidence: 99%