Supervised machine learning for audio emotion recognition

Cunningham, Stuart; Ridley, Harrison; Weinel, Jonathan; Picking, Richard

doi:10.1007/s00779-020-01389-0

Cited by 39 publications

(23 citation statements)

References 50 publications

“…Ntalampiras [52] compared emotion prediction using two CNNs that were designed to individually predict arousal and valence. The authors used the EmoSoundscapes data.…”

Section: Sound Emotion Recognition (mentioning)

confidence: 99%

A Comparative Analysis of Modeling and Predicting Perceived and Induced Emotions in Sonification

et al. 2021

Sonification is the utilization of sounds to convey information about data or events. There are two types of emotions associated with sounds: (1) “perceived” emotions, in which listeners recognize the emotions expressed by the sound, and (2) “induced” emotions, in which listeners feel emotions induced by the sound. Although listeners may widely agree on the perceived emotion for a given sound, they often do not agree about the induced emotion of a given sound, so it is difficult to model induced emotions. This paper describes the development of several machine and deep learning models that predict the perceived and induced emotions associated with certain sounds, and it analyzes and compares the accuracy of those predictions. The results revealed that models built for predicting perceived emotions are more accurate than ones built for predicting induced emotions. However, the gap in predictive power between such models can be narrowed substantially through the optimization of the machine and deep learning models. This research has several applications in automated configurations of hardware devices and their integration with software components in the context of the Internet of Things, for which security is of utmost importance.

“…Part of the work performed by Ntalampiras [52] was on the EmoSoundscape dataset, and they used CNN models. The MSE values reported for arousal and valence prediction were around 0.049 and 0.11, respectively, which are equivalent to 0.22 and 0.33 for RMSE.…”

Section: Performance Of Prediction Models (mentioning)

confidence: 99%
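The MSE-to-RMSE equivalence quoted in this statement is a plain square root. A minimal Python sketch, assuming only the rounded MSE figures given in the statement (the variable names are illustrative and not taken from either paper):

```python
import math

# MSE values as quoted in the citation statement (already rounded in the source).
mse = {"arousal": 0.049, "valence": 0.11}

# RMSE is the square root of MSE, which puts the error back on the
# same scale as the arousal/valence ratings themselves.
rmse = {dim: math.sqrt(value) for dim, value in mse.items()}

for dim, value in rmse.items():
    print(f"{dim}: MSE={mse[dim]} -> RMSE={value:.2f}")
```

To two decimal places this gives 0.22 for arousal and 0.33 for valence, matching the figures in the statement.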

“…[23,24] Using properties of sound as features and experience measures as labels for those features, several groups have attempted to build machine learning algorithms that can predict emotional responses based on the sound properties alone, commonly according to a valence-arousal circumplex model. [25,26,27,28,29] However results in this area remain mixed due to lack of sufficiently high dimensional measurement and modeling tools suitable for capturing the fast changes in human experience that accompany changes in sound. [30,31]…”

Section: A Effects Of Sound On Human Experience (mentioning)

confidence: 99%

Modeling The Effect of Background Sounds on Human Focus Using Brain Decoding Technology

Haruvi, Kopito, Brande-Eilat, et al. 2021

Preprint

The goal of this study was to learn what properties of sound affect human focus the most. Participants (N=62, 18-65y) performed various tasks while listening to either no background sound (silence), popular music playlists for increasing focus (pre-recorded songs), or personalized soundscapes (audio composed in real-time to increase a specific individual's focus). While performing tasks on a tablet, participants wore headphones and brain signals were recorded using a portable electroencephalography headband. Participants completed four one-hour long sessions, each with different audio content, at home. We successfully generated brain-based models to predict individual participant focus levels over time and used these models to analyze the effects of various audio content during different tasks. We found that while participants were working, personalized soundscapes increased their focus significantly above silence (p=0.008), while music playlists did not have a significant effect. For the young adult demographic (18-36y), silence was significantly less effective at producing focus than audio content of any type tested (p=0.001-0.009). Personalized soundscapes enhanced focus the most relative to silence, but professionally crafted playlists of pre-recorded songs also increased focus during specific time intervals, especially for the youngest audience demographic. We also found that focus levels can be predicted from physical properties of sound, enabling human and artificial intelligence composers to test and refine audio to produce increases or decreases in listener focus with high temporal (millisecond) precision. Future research includes real-time adjustment of sound for other functional objectives, such as affecting listener enjoyment, calm, or memory.

“…Upon connecting all three lasers, a green LED lights up, and a 'success' sound is emitted, which gives positive emotional feedback to the user through an ascending harmonic sequence of tones. This emotional feedback can be understood as an affective sound design (see Cunningham et al 2020).…”

Section: Laser Puzzle (mentioning)

confidence: 99%

Worship the Penguin: Adventures with sprites, chiptunes, and lasers

Weinel 2021

Electronic Workshops in Computing

Self Cite

This paper provides a review of recent projects developed through the author's creative practice and activities across multiple computing and games technologies platforms. These include: a 2D game project made in Unity; an Arduino-based laser puzzle; chiptune breakbeat music made on a Commodore 64; the archival of a collection of Amiga demoscene disks; PETSCII graphics; a controller adapter for the Amiga; and a DJ/VJ performance. While playfully exploring new trajectories, these projects broadly reflect on-going themes present in the author's previous work, such as explorations of the aesthetic paradigms presented by vintage computers, 1990s rave culture, and synaesthesia. The paper will address the various challenges and methodologies used to realise these projects; pedagogical considerations; and the pandemic context in which they have been created and presented.
