Music emotion recognition, a key field in music information retrieval, is a challenging task. To improve the accuracy of music emotion classification and recognition, this paper draws on the Inception structure, using different receptive fields to extract features at different scales and applying compression, expansion, and recompression operations to mine more effective features; the temporal signals in the residual network are then connected to a GRU module to extract temporal features. A one-dimensional (1D) residual Convolutional Neural Network (CNN) with an improved Inception module and Gated Recurrent Unit (GRU) was presented and tested on the Soundtrack dataset. The Fast Fourier Transform (FFT) was used to process the samples and determine their spectral characteristics. Compared with shallow learning methods such as the support vector machine and random forest, and with the deep learning method based on the Visual Geometry Group (VGG) CNN proposed by Sarkar et al., the proposed 1D CNN with the Inception-GRU residual structure performed better on music emotion recognition and classification tasks, achieving an accuracy of 84%.
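The FFT preprocessing step described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the sampling rate, clip length, and the synthetic test tone are all assumptions made for the example, and a real system would compute such spectra over short frames of actual audio before feeding them to the 1D CNN.

```python
import numpy as np

# Hypothetical 1-second mono clip sampled at 22050 Hz (assumed values;
# the paper does not specify its preprocessing parameters here).
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate

# Synthetic stand-in for an audio sample: a 440 Hz tone plus mild noise.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sample_rate)

# Real-input FFT yields the one-sided spectrum; its magnitude is the
# kind of spectral characteristic a 1D CNN could take as input.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)

# The dominant spectral peak recovers the tone's frequency.
peak_freq = freqs[np.argmax(spectrum)]
print(peak_freq)
```

With a 1-second window the frequency resolution is 1 Hz, so the peak lands on the 440 Hz bin despite the added noise; shorter analysis frames would trade frequency resolution for time resolution.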
Abstract—Vocal music is a comprehensive, performative art that aims to construct a musical image and express feelings mainly through the human voice. The singer is the carrier of emotional expression, so singing with the right expression is very important: blindly pursuing vocal technique while lacking feeling can only be called "technical". From an aesthetic and philosophical point of view, the highest level of vocal singing lies in skillful performance with sentiment: "sentiment is the basis of voice, and voice is the form of sentiment". That is, to achieve a voice with artistic appeal, sound and feeling must be organically combined. The emotional expression of vocal music is directly reflected in the combination of lyrics, rhythm and melody, human voice and gesture, that is, the integration of "word, song, voice and form". The organic integration of these four elements constitutes the highest level of vocal singing: skillful performance with sentiment.