Reinforcement Learning (RL) is a learning paradigm in which an agent learns by interacting with an environment. Combining deep learning with RL, known as Deep Reinforcement Learning (deep RL), provides an efficient way to learn such interactions. Deep RL has achieved tremendous success in gaming, such as AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER). Applying deep RL to SER could improve the performance of an automated call-centre agent by dynamically learning emotion-aware responses to customer queries. While the policy employed by the RL agent plays a major role in action selection, no current RL policy is tailored for SER. In addition, an extended learning period is a general challenge for deep RL, which can slow learning for SER. Therefore, in this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross-corpus dataset was also studied to assess the feasibility of pre-training the RL agent on a similar dataset when real environmental data are not available. The IEMOCAP and SAVEE datasets were used for evaluation, with the task being to recognise four emotions (happy, sad, angry, and neutral) in the given utterances. The experimental results show that the proposed "Zeta policy" outperforms existing policies. They also confirm that pre-training reduces training time and is robust in a cross-corpus scenario.
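Since the abstract centres on the role of the RL policy in action selection, the following minimal sketch shows a standard epsilon-greedy policy, a common baseline against which a new policy would be compared. The "Zeta policy" itself is not specified in the abstract, so it is not reproduced here; the four actions are taken to correspond to the paper's four emotion classes, and the Q-values are purely illustrative.

```python
import random

# Illustrative epsilon-greedy action selection: a common baseline deep RL
# policy (NOT the paper's Zeta policy, which the abstract does not define).
# With probability epsilon the agent explores (random action); otherwise it
# exploits (action with the highest estimated Q-value).
def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Hypothetical Q-values over the four emotion classes used in the paper:
# index 0 = happy, 1 = sad, 2 = angry, 3 = neutral.
q = [0.1, 0.4, 0.2, 0.3]
action = epsilon_greedy(q, epsilon=0.0)  # epsilon=0 is purely greedy
print(action)  # → 1 (the "sad" action has the highest Q-value)
```

With epsilon = 0 the policy is purely greedy; raising epsilon trades exploitation for exploration, which is exactly the behaviour a tailored policy like the one proposed would modify.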