Interspeech 2019
DOI: 10.21437/interspeech.2019-2984

Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features

Abstract: Prominence perception has been known to correlate with a complex interplay of the acoustic features of energy, fundamental frequency, spectral tilt, and duration. The contribution and importance of each of these features in distinguishing between prominent and non-prominent units in speech is not always easy to determine, and more so, the prosodic representations that humans and automatic classifiers learn have been difficult to interpret. This work focuses on examining the acoustic prosodic representations th…

Cited by 3 publications (2 citation statements)
References 20 publications (28 reference statements)

“…Activation functions, on the other hand, control the output of DNN hidden neurons, as well as the gradient contribution during the error back-propagation process for network parameter optimization. Among others, sigmoid [24,25] and ReLU [26] are most widely used in the state-of-the-art systems such as speaker recognition [27][28][29][30] and language recognition [31,32], speech recognition [33,34], prosodic representation [35], and image processing [19,22,23].…”
Section: Introduction (mentioning)
confidence: 99%
“…Activation functions, on the other hand, control the output of DNN hidden neurons as well as the gradient contribution during the error back-propagation process for network parameter optimization. Among others, sigmoid [19], [20] and ReLU [21] are most widely used in the state-of-the-art systems such as speaker recognition [22], [23], [24], [25] and language recognition [26], [27], speech recognition [28], [29], prosodic representation [30] and image processing [14], [17], [18].…”
Section: Introduction Speaker Verification (SV) Is An Authentication ... (mentioning)
confidence: 99%