2009
DOI: 10.1016/j.specom.2008.09.006
Data-driven emotion conversion in spoken English

Abstract: This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods has been developed which utilises a small amount of expressive training data (~15 min) and which has been evaluated for three target emotions: anger, surprise and sadness. The system performs F0 conversion at the syllable level while duration conversion takes place at the phone level using a set of lingui…
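The abstract mentions F0 conversion toward a target emotion. As a minimal sketch of the general idea (not the paper's actual syllable-level model, which the truncated abstract does not fully describe), a common baseline is Gaussian normalisation of the log-F0 contour: shift a neutral contour's log-pitch statistics toward those of the target emotion. The function name and statistics parameters below are illustrative assumptions.

```python
import numpy as np

def convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Mean/variance normalisation in the log-F0 domain (baseline sketch).

    f0        : F0 contour in Hz; unvoiced frames are 0 and left unchanged.
    src_mean, src_std : log-F0 statistics of the neutral (source) speech.
    tgt_mean, tgt_std : log-F0 statistics of the target emotion.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    lf0 = np.log(f0[voiced])
    # Standardise against the source statistics, rescale to the target's.
    out[voiced] = np.exp((lf0 - src_mean) / src_std * tgt_std + tgt_mean)
    return out

# Example: push a neutral ~120 Hz contour toward a higher, more variable
# "surprise"-like distribution (statistics here are invented for illustration).
neutral = np.array([0.0, 118.0, 125.0, 131.0, 0.0])
surprised = convert_f0(neutral, np.log(120.0), 0.10, np.log(180.0), 0.18)
```

A per-syllable variant, closer in spirit to the paper's syllable-level conversion, would estimate and apply such statistics within each syllable rather than globally.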

Cited by 80 publications (58 citation statements)
References 25 publications (12 reference statements)
“…To develop this technique, we need a deep understanding of how to effectively factorize speech acoustics into its individual components such as linguistic, non-linguistic, and para-linguistic information using various technologies, such as speech analysis, speech synthesis, acoustic modeling, and machine learning. Moreover, VC has great potential to develop various applications not only for flexible control of speaker identity of synthetic speech in text-to-speech (TTS) [1] but also as a speaking aid for vocally handicapped people such as dysarthric patients [2] and laryngectomees [3], as a voice changer to flexibly generate various types of emotional [4] and expressive speech [5], for vocal effects to produce more varieties of singing voices [6,7], for enhanced mobile speech communication using wideband speech [8] and silent speech [9], accent conversion for computer assisted language learning [10], and so on. Therefore, it is worthwhile to study this technique for both scientific purposes and industrial applications.…”
Section: Introduction
confidence: 99%
“…This is also based on the singing-to-singing synthesis approach and is an extension of VocaListener, which deals with only pitch and dynamics. Much previous work has been done on manipulating voice timbre such as speaking voice conversion [12,13], emotional speech synthesis [14][15][16], singing voice conversion [17], and singing voice morphing [18]. However, these approaches cannot deal with intentional temporal timbre changes during singing.…”
Section: VocaListener2: Singing Synthesis System Imitating Voice Timbre
confidence: 99%
“…Therefore, this area of study calls for advanced technologies both in signal processing and machine learning. Speaking style conversion is related to other areas of speech technology such as statistical parametric speech synthesis (SPSS) [2], voice conversion (VC) [3], emotional voice conversion [4,5] and speech intelligibility enhancement [6]. The topic can, however, be considered as a research area of its own because it differs from all the above areas: There is, for example, no linguistic-to-acoustic mapping as in speech synthesis and the conversion is not constrained by a strict latency requirement as in speech intelligibility enhancement.…”
Section: Introduction
confidence: 99%