Interspeech 2021
DOI: 10.21437/interspeech.2021-315

Testing Acoustic Voice Quality Classification Across Languages and Speech Styles

Abstract: Many studies relate acoustic voice quality measures to perceptual classification. We extend this line of research by training a classifier on a balanced set of perceptually annotated voice quality categories with high inter-rater agreement, and testing it on speech samples from a different language and a different speech style. Annotations were done on continuous speech from different laboratory settings. In Experiment 1, we trained a random forest with Standard Chinese and German recordings labelled as modal, …
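The abstract is truncated, so the exact model configuration is not shown. As a rough illustration of the cross-language setup it describes (train a random forest on perceptually labelled Standard Chinese and German data, test on a different language and speech style), a minimal scikit-learn sketch might look as follows; the file names, feature layout, and hyperparameters are assumptions, not details from the paper.

```python
# Minimal sketch of the cross-language train/test setup, assuming
# per-interval acoustic feature matrices and perceptual voice-quality
# labels have already been extracted. All file names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X_train = np.load("features_chinese_german.npy")  # training languages
y_train = np.load("labels_chinese_german.npy")    # e.g. "modal", "breathy", ...
X_test = np.load("features_other_language.npy")   # held-out language/style
y_test = np.load("labels_other_language.npy")

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
print("cross-language accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```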

Cited by 1 publication (2 citation statements)
References 32 publications
“…Previous research comparing VQ in German RQs and ISQs showed that, given the variability of the experimental material, a perceptual analysis was more robust than acoustic measures (Braun et al. 2019). Data from 11 female speakers of the current study were also classified with a random forest model (the model included measures of HNR, H1-A2, H1-A3, H1*-A3*, cepstral-peak prominence, glottal-to-noise excitation ratio, and b1 and b2 measurements); the overall accuracy of the model was 78.6% (Braun et al. 2021).…”
Section: Note (mentioning)
confidence: 85%
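The feature set quoted above is a standard voice-quality battery. As a hedged sketch, one of those measures (HNR) can be computed with the parselmouth interface to Praat; the file name below is hypothetical, and the remaining measures (the spectral-tilt differences, cepstral-peak prominence, GNE, b1 and b2) are usually obtained from Praat scripts or VoiceSauce rather than reimplemented by hand.

```python
# Hedged sketch: mean HNR for one recording via parselmouth
# (pip install praat-parselmouth). The file name is hypothetical.
import parselmouth

snd = parselmouth.Sound("speaker01_utterance.wav")
harmonicity = snd.to_harmonicity_cc()  # Praat's "Harmonicity (cc)" analysis

# Praat marks unvoiced frames with -200 dB; average voiced frames only.
values = harmonicity.values
mean_hnr = values[values != -200].mean()
print(f"mean HNR: {mean_hnr:.1f} dB")
```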
“…/pɔr/ in borðar), (ii) the initial, stressed syllable of the subject einhver (/e͡in/ in /ˈe͡in.kʰvɛr/ 'anybody'), (iii) the stressed syllable of the object noun, and (iv) the offset of the sentence. Three types of VQ were perceptually classified: modal (neutral mode of phonation, Laver 1980), breathy (audible friction of the air), and glottalized (low-frequency irregular vocal fold vibrations, Braun et al. 2021). Speaking rate was operationalized as the number of syllables per second.…”
Section: Methods (mentioning)
confidence: 99%
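As a minimal sketch of the speaking-rate operationalization quoted above (syllables per second), one could count labelled intervals on a syllable tier of a Praat TextGrid; the tier index and file name here are hypothetical, not taken from the study.

```python
# Hedged sketch: speaking rate = labelled syllable intervals / duration.
# Assumes a TextGrid whose tier 1 holds syllables (hypothetical layout).
import parselmouth
from parselmouth.praat import call

tg = parselmouth.read("speaker01_utterance.TextGrid")
n_intervals = int(call(tg, "Get number of intervals", 1))

# Count only labelled intervals as syllables; empty labels are pauses.
n_syllables = sum(
    1 for i in range(1, n_intervals + 1)
    if call(tg, "Get label of interval", 1, i).strip()
)
duration = call(tg, "Get total duration")
print(f"speaking rate: {n_syllables / duration:.2f} syll/s")
```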