2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
DOI: 10.1109/iscslp49672.2021.9362098
Estimating Mutual Information in Prosody Representation for Emotional Prosody Transfer in Speech Synthesis

Cited by 8 publications (6 citation statements)
References 22 publications
“…are pivotal in speech recognition, differentiating speech sounds based on their positions and transitions. Although they are not typically regarded as prosodic features, formants are instrumental in recognizing vowels and consonants, providing essential phonetic information in speech analysis [31].…”
Section: Prosodic and Phonetic Regulariser Features's Description
Citation type: mentioning (confidence: 99%)
“…Each prosodic feature was then verified for model fit considering these factors. The goodness of fit of prosodic features for fixed-effect and random-effect variables is given in equation (1).…”
Section: LMM Analysis
Citation type: mentioning (confidence: 99%)
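Equation (1) of the citing paper is not reproduced in this report. For context only, a linear mixed model of the kind this excerpt describes typically takes the following standard form (a generic sketch with assumed notation, not the citing paper's actual equation):

```latex
% Generic linear mixed model (LMM); notation is assumed, not the citing paper's.
% y: observed prosodic feature values; X, Z: fixed- and random-effect design
% matrices; \beta: fixed effects; u: random effects; \varepsilon: residual error.
y = X\beta + Zu + \varepsilon, \qquad
u \sim \mathcal{N}(0, G), \quad
\varepsilon \sim \mathcal{N}(0, R)
```

Goodness of fit is then compared across candidate fixed- and random-effect structures, e.g. by likelihood-ratio tests or information criteria.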
“…The prosodic features employed for emotion recognition play an essential role in the quality of human-computer interaction that replicates human speech emotions. Supra-segmental features, or prosodic features such as intensity, pitch, and duration, contribute additional information to speech known as paralinguistic information [1][2][3][4] and characterize emotional speech. Developing a prosodic model for emotional utterances in less-studied languages is very challenging.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…The prosodic representation is obtained as one of the learned factors, parallel with non-prosodic factors that correspond to content, speaker, channel, etc. In [12, 15-17], adversarial learning was applied to address the problem that the learned prosodic representation might contain substantial information related to non-prosodic factors. The use of an adversarial classifier requires the availability of the labels for one of the disentangled non-prosodic factors.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
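The adversarial-classifier idea in this excerpt can be illustrated with a gradient-reversal layer: the classifier descends its loss on a labeled non-prosodic factor (here, speaker), while the encoder receives the negated gradient, so the learned code carries little speaker information. The following is a toy numpy sketch of that mechanism only, not any cited paper's implementation; all dimensions, names, and hyperparameters are made up, and the main prosody-reconstruction branch is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 8-dim acoustic features -> 4-dim "prosody" code;
# the adversarial classifier predicts one of 3 speakers from the code.
D_IN, D_CODE, N_SPK = 8, 4, 3
W_enc = rng.normal(0, 0.1, (D_IN, D_CODE))   # linear "encoder"
W_cls = rng.normal(0, 0.1, (D_CODE, N_SPK))  # adversarial classifier

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(x, spk, lr=0.1, lam=1.0):
    """One adversarial step with a gradient-reversal layer (GRL).

    The classifier descends its cross-entropy loss; the encoder receives
    the negated gradient (scaled by lam), so it learns a code from which
    speaker identity is hard to predict.
    """
    global W_enc, W_cls
    code = x @ W_enc                       # prosody code
    probs = softmax(code @ W_cls)          # speaker posterior
    onehot = np.eye(N_SPK)[spk]
    d_logits = (probs - onehot) / len(x)   # dL/dlogits for cross-entropy
    g_cls = code.T @ d_logits              # gradient w.r.t. classifier weights
    g_code = d_logits @ W_cls.T            # gradient w.r.t. the code
    W_cls -= lr * g_cls                    # classifier: minimize its loss
    W_enc -= lr * (-lam) * (x.T @ g_code)  # encoder: reversed gradient (GRL)
    return -np.log(probs[np.arange(len(x)), spk]).mean()

x = rng.normal(size=(32, D_IN))
spk = rng.integers(0, N_SPK, 32)
losses = [train_step(x, spk) for _ in range(50)]
```

As the excerpt notes, this setup requires speaker labels, and each such classifier targets only the one non-prosodic factor it was trained on.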
“…The design of the adversarial classifier is specific to only one non-prosodic factor and cannot be applied to other non-prosodic factors. Furthermore, the non-prosodic factors (e.g., speaker) might be related to prosody [15], so disentangling with an adversarial classifier might also leave little prosody information in the prosodic representation.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)