2020
DOI: 10.1007/s11042-020-09430-3
Emotional quantification of soundscapes by learning between samples

Abstract: Predicting the emotional responses of humans to soundscapes is a relatively recent field of research coming with a wide range of promising applications. This work presents the design of two convolutional neural networks, namely ArNet and ValNet, each one responsible for quantifying arousal and valence evoked by soundscapes. We build on the knowledge acquired from the application of traditional machine learning techniques on the specific domain, and design a suitable deep learning framework. Moreover, we propos…


Cited by 6 publications (7 citation statements)
References 21 publications
“…Recent studies with EMO have shown that more sophisticated nonlinear models (such as RF) can reach good scores with 15 features for arousal (MSE ≈ 0.050) and 14 features for valence (MSE ≈ 0.140). Finally, other authors using other complex nonlinear models, such as CNNs and data augmentation techniques, obtain slightly better metrics (MSE ≈ 0.035 for arousal and MSE ≈ 0.078 for valence), but also include substantially more variables in their models: from 23 up to 54 features [11], [31]. All these considerations confirm the quality of our suggested models.…”
Section: B Selection Of the Number Of Variables And Suggested Model F... (supporting)
confidence: 76%
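For context, a model of the kind being compared there (a random forest regressor scored by cross-validated mean squared error) can be sketched as below; the placeholder feature matrix, the arousal targets, and the 15-feature dimensionality are illustrative assumptions, not the cited data or setup.

# Sketch: random forest regression scored by cross-validated MSE.
# X and y are random placeholders, not the EMO features or ratings of the cited work.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))           # e.g. 15 selected features for arousal
y = rng.uniform(-1.0, 1.0, size=200)     # placeholder arousal ratings

rf = RandomForestRegressor(n_estimators=100, random_state=0)
mse = -cross_val_score(rf, X, y, cv=5, scoring="neg_mean_squared_error").mean()
print(f"cross-validated MSE: {mse:.3f}")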
“…In [1], a fine-tuned RF model with 14 features outperforms the previous RF model as well as convolutional neural networks (CNNs). Deep learning techniques have also been applied to SER through a CNN and 23 simplified mel-frequency cepstral coefficients (MFCCs) in [31], and in combination with an SVM (transfer learning) in [11]. Promising results use up to 54 features obtained by heuristic methods, despite the limited number of samples in EMO.…”
Section: Introduction (mentioning)
confidence: 99%
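As a rough illustration of such a pipeline, the sketch below extracts 23 MFCCs per frame with librosa and passes them through a small convolutional network; the file name, layer sizes, and PyTorch architecture are assumptions made for illustration, not the configuration of [31] or [11].

# Sketch: 23 MFCCs per frame fed to a small CNN with two regression outputs
# (arousal, valence). Shapes and layer sizes are illustrative assumptions.
import librosa
import torch
import torch.nn as nn

wav, sr = librosa.load("soundscape.wav", sr=22050)        # hypothetical input clip
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=23)      # shape: (23, n_frames)
x = torch.tensor(mfcc, dtype=torch.float32)[None, None]   # shape: (1, 1, 23, n_frames)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                                      # outputs: [arousal, valence]
)
print(cnn(x).shape)                                        # torch.Size([1, 2]) before any training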
“…The authors used two sets of techniques to extract features. The first method used a pretrained deep neural network created by S. Hershey et al. [50], whereas the second method involved 54 features extracted using MIRToolbox and YAAFE. The best performance for arousal was reported with the CNN, with an R² of 0.832 and an MSE of 0.035, whereas the best performance for valence was reported to have an R² of 0.759 and an MSE of 0.078 via VGGish (a deep CNN model).…”
Section: Sound Emotion Recognition (mentioning)
confidence: 99%
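The pretrained network referenced there is the VGGish model of Hershey et al.; as a hedged sketch, one way to obtain such 128-dimensional embeddings is the third-party torchvggish port on PyTorch Hub shown below (the repository name and call are assumptions about that port, not the cited authors' code), after which any regressor can map clip-level embeddings to arousal and valence.

# Sketch: VGGish-style embedding extraction followed by a plain regressor.
# The torch.hub repository is a third-party port of the Hershey et al. model;
# treat the exact call as an assumption and verify it before relying on it.
import torch

vggish = torch.hub.load("harritaylor/torchvggish", "vggish")
vggish.eval()

emb = vggish.forward("soundscape.wav")                    # (n_frames, 128) embeddings
clip_feature = emb.detach().numpy().mean(axis=0)          # 128-d clip-level feature

# A downstream regressor (hypothetical here) would be trained on such features:
# from sklearn.linear_model import Ridge
# reg = Ridge().fit(X_train, y_train)                     # X_train: clip features, y_train: valence
# valence_pred = reg.predict(clip_feature.reshape(1, -1))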
“…- n_estimators: (50, 100, 150, 200, 250, 300), number of trees in the forests;
- max_depth: (5, 10, 20, 30, 50), maximum number of levels in each decision tree;
- min_samples_split: (2, 3, 4, 5, 6, 7), minimum number of data points placed in a node before the node is split;
- min_samples_leaf: (1, 2, 3, 5), minimum number of data points allowed in a leaf node;
- k: range(1, 68), number of features selected using RFE with the RF estimator.…”
Section: Hyper-parameter Tuning (mentioning)
confidence: 99%
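Read as a scikit-learn search space, that grid corresponds roughly to the sketch below; the pipeline layout, the use of RandomizedSearchCV rather than an exhaustive grid search, and the placeholder data are assumptions made for illustration.

# Sketch: RFE feature selection (k in 1..67) plus a random forest, tuned over
# the grid listed above. RandomizedSearchCV keeps the search cheap here; the
# cited work may have searched the grid exhaustively instead.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 67))            # placeholder feature matrix (67 features)
y = rng.uniform(-1.0, 1.0, size=200)      # placeholder arousal/valence ratings

pipe = Pipeline([
    ("rfe", RFE(RandomForestRegressor(random_state=0))),
    ("rf", RandomForestRegressor(random_state=0)),
])

param_distributions = {
    "rfe__n_features_to_select": list(range(1, 68)),
    "rf__n_estimators": [50, 100, 150, 200, 250, 300],
    "rf__max_depth": [5, 10, 20, 30, 50],
    "rf__min_samples_split": [2, 3, 4, 5, 6, 7],
    "rf__min_samples_leaf": [1, 2, 3, 5],
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=20, cv=5,
                            scoring="neg_mean_squared_error", random_state=0)
search.fit(X, y)
print(search.best_params_)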
“…Ntalampiras [15] provided a comparison between emotion prediction from singleton soundscapes and from mixed soundscapes using a CNN model. The author used the Emo-Soundscape dataset and extracted features from the sound samples using the log-Mel spectrum [16], a spectrogram in which the frequencies are converted to the Mel scale.…”
Section: Related Work (mentioning)
confidence: 99%
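For reference, a log-Mel spectrogram of that kind can be computed with librosa as in the short sketch below; the file name, sample rate, hop length, and number of Mel bands are illustrative defaults rather than the settings of [15] or [16].

# Sketch: log-Mel spectrogram as a CNN input representation.
# Parameter values are illustrative defaults, not the cited configuration.
import librosa
import numpy as np

wav, sr = librosa.load("soundscape.wav", sr=22050)          # hypothetical input clip
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)              # (64, n_frames), in dB
print(log_mel.shape)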