2023
DOI: 10.11591/eei.v12i1.4231

Multimodal music emotion recognition in Indonesian songs based on CNN-LSTM, XLNet transformers

Abstract: Music carries emotional information and allows the listener to feel the emotions contained in the music. This study proposes a multimodal music emotion recognition (MER) system using Indonesian song and lyrics data. In the proposed multimodal system, the audio data uses the mel spectrogram feature, and the lyrics feature is extracted through the XLNet tokenizing process. A convolutional long short-term memory network (CNN-LSTM) performs the audio classification task, while XLNet transforme…

Cited by 10 publications (3 citation statements)
References 31 publications

“…Sams et al used convolutional LSTM networks to perform audio classification tasks. The results suggested that the multimodal method for music emotion recognition performed better than the single-modal method (Sams & Zahra, 2023). This is consistent with the research on music emotion and visualization of the fusion of LSTM networks supported by the IoT in this paper.…”
Section: E Discussion; citation type: supporting
confidence: 90%
“…The models are simple in design and intended to be supplementary performance benchmarks on the dataset. In future work, more state-of-the-art methods such as convolutional neural networks [70, 71, 72, 73, 74, 75, 76], or transformer architectures [60, 77, 78] could be used with the dataset for further experimentation with profile information and its uses for building improved MER models.…”
Section: Emotion Prediction Models; citation type: mentioning
confidence: 99%
“…Long short-term memory (LSTM), gate repeating unit (GRU), autoencoder (AE), and convolutional neural network (CNN) are some examples of deep learning algorithms for classification. Long short-term memory (LSTM) is one of the deep learning algorithms that can be used for classification, prediction, and control [8][9][10][11]. It can learn complex patterns and relationships in the input data, making it a valuable tool for a wide range of tasks in various fields such as finance, healthcare, and natural language processing [12][13][14].…”
Section: Introduction; citation type: mentioning
confidence: 99%