2006
DOI: 10.1109/tasl.2006.876121

An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS

Abstract: Building a text corpus suitable for use in corpus-based speech synthesis is a time-consuming process that usually requires some human intervention to select the desired phonetic content and the necessary variety of prosodic contexts. If an emotional text-to-speech (TTS) system is desired, the complexity of the corpus generation process increases. This paper presents a study aiming to validate or reject the use of a semantically neutral text corpus for the recording of both neutral and emotional (act…


Cited by 70 publications (25 citation statements)
References 13 publications
“…Specifically, they tried to detect a driver's emotional status including stress level, disappointment, and euphoria, using biological signals such as facial electromyograms, electrocardiogram, respiration, and electrodermal activity. Customer service call centers equipped with speech recognition technology can assign rhythms that suit each emotion and can even generate appropriate sound signals suitable for the users' emotions (Navas, Hernaez, & Iker, 2006). As such, emotion detection technologies will heighten the quality of human-machine interaction, and users will be able to interface with their computers in a more enjoyable and useful manner.…”
mentioning
confidence: 99%
“…The possible answers were the 5 styles of the corpus (see Section 2.1.2) plus the additional option of Don't know/Another (Dk/A) to avoid biasing the results in the case of confusion or doubts between two options. The risk of adding this option is that some evaluators may use it excessively to accelerate the test (Navas et al, 2006). However, this effect was negligible in this test (see right column of Table 3).…”
Section: Test Design
mentioning
confidence: 94%
“…Montero et al (1998);Hozjan et al (2002); Navas et al (2006); Morrison et al (2007)). When a large speech corpus is compiled, we must make sure that all the utterances are consistent with the expressive category definition.…”
Section: Subjective Evaluation
mentioning
confidence: 99%
“…The speaker-dependent approach gives much better results than the speaker-independent approach, as shown by the excellent results of Navas et al [29], where about 98% accuracy was achieved by using the Gaussian mixture model (GMM) as a classifier, with prosodic, voice quality as well as Mel frequency cepstral coefficient (MFCC) employed as speech features. However, the speaker-dependent approach is not feasible in many applications that deal with a very large number of possible users (speakers).…”
Section: Methods
mentioning
confidence: 99%