DNN Based Music Emotion Recognition from Raw Audio Signal

Orjesek, Richard; Jarina, Roman; Chmulík, Michal; Kuba, Michal

doi:10.1109/radioelek.2019.8733572

Cited by 28 publications

(20 citation statements)

References 10 publications

“…The spectrogram was a handcrafted magnitude-only representation without phase information. Orjesek et al [36] addressed this problem by using a raw waveform input for their classifier. Our study used both the real (magnitude) and imaginary (phase angle) information from audio for emotion classification because several studies [37, 38, 39] have demonstrated that phase information improves the performance of both speech and music processing.…”

Section: Related Work

mentioning

confidence: 99%

Deep-Learning-Based Multimodal Emotion Classification for Music Videos

Pandeya, Bhattarai, Lee. Sensors, 2021.

Music videos contain a great deal of visual and acoustic information. Each information source within a music video influences the emotions conveyed through the audio and video, suggesting that only a multimodal approach is capable of achieving efficient affective computing. This paper presents an affective computing system that relies on music, video, and facial expression cues, making it useful for emotional analysis. We applied the audio–video information exchange and boosting methods to regularize the training process and reduced the computational costs by using a separable convolution strategy. In sum, our empirical findings are as follows: (1) Multimodal representations efficiently capture all acoustic and visual emotional clues included in each music video, (2) the computational cost of each neural network is significantly reduced by factorizing the standard 2D/3D convolution into separate channels and spatiotemporal interactions, and (3) information-sharing methods incorporated into multimodal representations are helpful in guiding individual information flow and boosting overall performance. We tested our findings across several unimodal and multimodal networks against various evaluation metrics and visual analyzers. Our best classifier attained 74% accuracy, an f1-score of 0.73, and an area under the curve score of 0.926.
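The cost reduction from factorizing a convolution, as the abstract describes, can be illustrated with a parameter count. This is a minimal sketch that assumes the common depthwise-separable form for a 2D convolution (a per-channel spatial filter followed by a 1x1 channel-mixing filter). The abstract does not spell out the exact factorization, so the function names and layer sizes here are hypothetical, and bias terms are ignored.

```python
def standard_conv2d_params(c_in, c_out, k):
    """Weights in a standard k x k 2D convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv2d_params(c_in, c_out, k):
    """Weights in a depthwise-separable 2D convolution (bias ignored).

    Assumed factorization: one k x k spatial filter per input channel,
    then a 1 x 1 convolution that mixes channels.
    """
    depthwise = k * k * c_in   # spatial interactions, one filter per channel
    pointwise = c_in * c_out   # channel interactions via 1 x 1 convolution
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    c_in, c_out, k = 64, 128, 3
    full = standard_conv2d_params(c_in, c_out, k)
    separable = separable_conv2d_params(c_in, c_out, k)
    print(f"standard: {full}, separable: {separable}, ratio: {full / separable:.1f}x")
```

Under these assumptions the saving grows with the kernel size and the number of output channels, which is consistent with the direction of the claim in the abstract but says nothing about the sizes the authors actually used.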

“…Representation methods can be divided into the preparation of raw sound samples, 2D representations of music (e.g., spectrograms, cepstrograms, chromagrams, etc.) [7,29,30] and musical signal parametric form, i.e., feature vector (e.g., a vector of mel-cepstral coefficients or MPEG-7-based parameters) [14,15,17].…”

Section: Emotion Classification

mentioning

confidence: 99%

“…An approach of training a neural network with data obtained from raw, unprocessed sound was proposed by Orjesek et al [17]. The algorithm used convolutional network layers connected with layers of a recursive neural network.…”

Section: Emotion Classification

mentioning

confidence: 99%

“…To prevent the network overfitting, the dropout technique was used, which excludes some neurons from selected layers in subsequent learning iterations. Two values were returned from the last layer of the network: valence and arousal [17]. A database of musical excerpts, i.e., MediaEval Emotion, consisting of 431 samples (validation and training set) was used to train the network, each of which lasted 45 s, and the test set consisted of 58 songs with an average duration of 243 s. The capability of the NN was measured by the RMSE (Root Mean Square Error) value (see Table 1 for details).…”

Section: Emotion Classification

mentioning

confidence: 99%
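The RMSE measure mentioned in the statement above can be sketched as follows. This is an illustrative helper only, assuming that valence and arousal are scored separately as lists of per-excerpt values. The function name and the sample numbers are hypothetical and are not taken from the cited paper or its Table 1.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up annotations and predictions, for illustration only.
    valence_true = [0.10, -0.30, 0.50]
    valence_pred = [0.20, -0.10, 0.40]
    arousal_true = [0.60, 0.00, -0.20]
    arousal_pred = [0.50, 0.10, -0.40]
    print("valence RMSE:", rmse(valence_pred, valence_true))
    print("arousal RMSE:", rmse(arousal_pred, arousal_true))
```

Reporting one RMSE per dimension keeps the two regression targets comparable; whether the cited work aggregates them differently is not stated here.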

“…Overall, one can see a plethora of emotion models also used in music emotion recognition (MER) [4,17,18]. However, the authors of the presented study decided to use their own description and assignment of emotions.…”

mentioning

confidence: 99%

Classifying Emotions in Film Music—A Deep Learning Approach

et al. 2021

The paper presents an application for automatically classifying emotions in film music. A model of emotions is proposed, which is also associated with colors. The model created has nine emotional states, to which colors are assigned according to the color theory in film. Subjective tests are carried out to check the correctness of the assumptions behind the adopted emotion model. For that purpose, a statistical analysis of the subjective test results is performed. The application employs a deep convolutional neural network (CNN), which classifies emotions based on 30 s excerpts of music works presented to the CNN input using mel-spectrograms. Examples of classification results of the selected neural networks used to create the system are shown.

What a deep song: The role of music features in perceived depth

et al. 2021

This study examines perceptions of music depth by exploring its relationships to different music features. First, a correlation analysis shows that the perceived depth of music is negatively correlated with valence and arousal and is also related to different music features, including tempo, Mel‐frequency cepstrum coefficients, chromagrams, spectral centroids, spectral bandwidth, spectral contrast, spectral flatness, spectral roll‐off, and tonal centroid features. Applying machine learning methods, we find that selected music features can predict perceptions of music depth, and a random forest regression (RFR) is found to perform best in this study. Finally, a feature importance analysis shows that the principal component of spectral contrast dominates the RFR‐based music depth recognition model, showing that deep music usually has clear and narrow‐band audio signals.
