2018
DOI: 10.1121/1.5052438

Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields

Abstract: Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence, has been modeled from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by…

Cited by 8 publications (10 citation statements)
References 28 publications (32 reference statements)
“…Some phoneticians have turned their attention to non-speech voice production, trying to identify the most relevant phonetic components that are found in vocal imitations [30]. They identified the broad categories of phonation (i.e., quasi-periodic oscillations due to vocal fold vibrations), turbulence, supraglottal myoelastic vibrations, and clicks, which can be extracted automatically from audio with time-frequency analysis and supervised [31] or unsupervised [32] machine learning. These categories can be made to correspond to categories of sounds as they are perceived [33], and as they are produced in the physical world.…”
Section: Voice As Embodied Sound
mentioning
confidence: 99%
“…The extractors of the fundamental components, i.e., the measurement apparati, are implemented as signal-processing modules that are available both for analysis and, as control knobs, for synthesis. The baseline is found in the results of the SkAT-VG project [9,17,25,31,33,52], which showed that vocal imitations are optimized representations of referent sounds that emphasize those features that are important for identification. A large collection of audiovisual recordings of vocal and gestural imitations 1 offers the opportunity to further enquire how people perceive, represent, and communicate about sounds.…”
Section: Sketch Of a Quantum Vocal Theory Of Sound
mentioning
confidence: 99%
“…Which are the characteristics of human voice? The utterances of humans and many mammals can be decomposed into overlapping chunks that fall within three primitive classes: phonation, turbulence, and supraglottal myoelastic vibrations [20]. In phonation, the source is in the vocal folds.…”
mentioning
confidence: 99%