Long short-term memory recurrent neural network based segment features for music genre classification

Dai, Jia; Shan, Liang; Xue, Wei; Ni, Chuanfa; Liu, Wenju

doi:10.1109/iscslp.2016.7918369

Cited by 34 publications

(18 citation statements)

References 14 publications

“…The CNN-based approaches obtain notable results in MGC tasks; however, they neglect spectrogram temporal information, which may be useful. Based on this reasoning, the long short-term memory recurrent neural network (RNN) has been used [9] to extract features from scatter spectrograms [1] of audio segments and fuse them with those obtained using CNNs. In addition, to take advantage of both CNNs and RNNs, a convolutional RNN has been designed for music tagging [7] .…”
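
The quote summarizes the core idea: an LSTM runs over the frames of each audio segment and produces one fixed-length segment feature. Below is a minimal NumPy sketch of that idea. It assumes a generic spectrogram input, random (untrained) weights, and uses the final hidden state as the segment feature. The helper names (`split_into_segments`, `lstm_segment_feature`) and all sizes are hypothetical, and this is not the cited paper's exact architecture or training setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def split_into_segments(frames, seg_len, hop):
    """Split a (num_frames, num_bins) feature matrix into overlapping segments."""
    starts = range(0, frames.shape[0] - seg_len + 1, hop)
    return np.stack([frames[s:s + seg_len] for s in starts])

def init_lstm(input_dim, hidden_dim, rng):
    # One stacked weight matrix for the input, forget, output and candidate gates.
    W = rng.normal(0.0, 0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
    b = np.zeros(4 * hidden_dim)
    b[hidden_dim:2 * hidden_dim] = 1.0  # common forget-gate bias initialisation
    return W, b

def lstm_segment_feature(segment, W, b, hidden_dim):
    """Run an LSTM over the frames of one segment and return the final hidden state."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in segment:
        z = np.concatenate([x_t, h]) @ W + b
        i = sigmoid(z[:hidden_dim])                     # input gate
        f = sigmoid(z[hidden_dim:2 * hidden_dim])       # forget gate
        o = sigmoid(z[2 * hidden_dim:3 * hidden_dim])   # output gate
        g = np.tanh(z[3 * hidden_dim:])                 # candidate cell state
        c = f * c + i * g                               # update cell memory
        h = o * np.tanh(c)                              # new hidden state
    return h

rng = np.random.default_rng(0)
spectrogram = rng.random((200, 40))  # stand-in for 200 frames x 40 frequency bins
segments = split_into_segments(spectrogram, seg_len=50, hop=25)
W, b = init_lstm(input_dim=40, hidden_dim=16, rng=rng)
features = np.stack([lstm_segment_feature(s, W, b, 16) for s in segments])
print(segments.shape, features.shape)
```

In practice, the LSTM would first be trained (for example, with a genre classifier on top). Only then would its per-segment hidden states be used as features and fused with CNN features.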

Section: MGC Methods (mentioning)

confidence: 99%

Client-driven animated GIF generation framework using an acoustic feature

Mujtaba, Lee, Kim, et al. (2021)

Multimed Tools Appl

This paper proposes a novel, lightweight method to generate animated Graphics Interchange Format (GIF) images using the computational resources of a client device. The method analyzes an acoustic feature from the climax section of an audio file to estimate the timestamp corresponding to the maximum pitch. It then processes only a short video segment, rather than the entire video, to generate the GIF. This makes the method computationally efficient, unlike baseline approaches that use entire videos to create GIFs. Because only the audio file and the video segment are retrieved and used, the GIF generation process also gains communication and storage efficiency. Experiments on a set of 16 videos show that the proposed approach is 3.76 times more computationally efficient than a baseline method on an Nvidia Jetson TX2. In a qualitative evaluation, the GIFs generated by the proposed method also received higher overall ratings than those generated by the baseline method. To the best of our knowledge, this is the first technique that uses an acoustic feature in the GIF generation process.
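
The abstract describes two steps: estimating the timestamp of maximum pitch from the audio, then cutting only a short video segment around that timestamp. Below is a minimal sketch of that idea. It assumes a simple autocorrelation pitch estimator and treats the whole test signal as the "climax section". The helper names (`frame_pitch`, `max_pitch_timestamp`, `clip_window`), the frame sizes and the 3-second clip length are hypothetical; the paper's actual acoustic feature and segment length are not specified here.

```python
import numpy as np

def frame_pitch(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame via autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0:
        return 0.0  # silent frame
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])  # strongest periodicity in range
    return sr / lag

def max_pitch_timestamp(signal, sr, frame_len=1024, hop=512):
    """Return (time in seconds, pitch in Hz) of the frame with the highest pitch."""
    best_t, best_f0 = 0.0, 0.0
    for start in range(0, len(signal) - frame_len + 1, hop):
        f0 = frame_pitch(signal[start:start + frame_len], sr)
        if f0 > best_f0:
            best_t, best_f0 = start / sr, f0
    return best_t, best_f0

def clip_window(t, duration, clip_len=3.0):
    """Hypothetical helper: a clip of about clip_len seconds around t, kept inside [0, duration]."""
    start = min(max(0.0, t - clip_len / 2), max(0.0, duration - clip_len))
    return start, min(duration, start + clip_len)

sr = 16000
t = np.arange(2 * sr) / sr
# Two seconds at 220 Hz followed by two seconds at 440 Hz.
signal = np.concatenate([np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 440 * t)])
ts, f0 = max_pitch_timestamp(signal, sr)
print(round(ts, 3), round(f0, 1), clip_window(ts, len(signal) / sr))
```

Plain autocorrelation is among the simplest pitch estimators and can lock onto sub-harmonics on real music, so a practical system would likely use a more robust pitch tracker.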

“…Author has a tendency to conjointly demonstrate the capability of the options to capture relevant data from audio information by applying them to genre classification on the ISMIR 2004 dataset. [4] 5. Long Short-term Memory Recurrent Neural Network based Segment Features for Music Genre Classification In the typical frame feature primarily based expressive style classification strategies, the audio information is depicted by freelance frames and therefore the serial nature of audio is completely unheeded.…”

Section: A Deep Learning Approach for Mapping Music Genres (mentioning)

confidence: 99%

Survey on Music Genre Recognition Using Deep Learning

2017

IJAERD

I INTRODUCTION

CNNs have been used extensively to solve complex machine learning problems such as sentiment analysis, feature extraction, genre classification and prediction. Hybrid models of CNNs and RNNs have recently been applied to temporal data such as audio signals and word sequences. Convolutional recurrent neural networks (CRNNs) are complex networks formed by combining a CNN and an RNN: a CRNN is a CNN with an RNN structure placed on top of it. This gives a robust architecture in which the CNN layers extract local features and the RNN layers summarize them over time. CNNs have been popular in diverse areas of music recognition, such as automatic tagging, hybrid music recommendation and feature learning. The key design choices for a CNN are the type of input signal, the learning rate, the activation function, batching and the architecture. The mel-spectrogram is the preferred input type for music information retrieval. Mel-spectrograms are widely used as features for tagging, boundary and onset detection, and latent feature learning, and the mel scale has been shown to resemble the human auditory system. Obtaining a mel-spectrogram requires an STFT (short-time Fourier transform) and a log-amplitude spectrogram as preprocessing steps. Music feature learning with deep networks improved when ReLU was used as the activation function; ReLU was later replaced with ELU (Exponential Linear Unit) for faster and more accurate learning. Recurrent neural networks likewise improved significantly with the introduction of gated RNNs. Their gating units limit the flow of information through them, which allows them to capture critical information at different time scales.
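
The preprocessing chain mentioned above (STFT, then mel-scale band energies, then log amplitude) can be sketched in NumPy. The following settings are assumptions for illustration, not values taken from the survey:

- frame size 1024 with hop 512
- Hann window
- 40 mel bands
- HTK-style mel formula 2595 · log10(1 + f/700)

Libraries such as librosa provide tested implementations.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(1, n_mels + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m - 1] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=40, eps=1e-10):
    """STFT -> power spectrum -> mel filterbank -> log amplitude; shape (frames, n_mels)."""
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # short-time power spectrum
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # project onto mel bands
    return 10.0 * np.log10(mel + eps)                   # log amplitude in dB

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)  # one second of a 1 kHz test tone
S = log_mel_spectrogram(tone, sr)
print(S.shape)
```

The resulting (frames × mel bands) matrix is the kind of 2-D input a CNN or CRNN would consume. The filters are narrow at low frequencies and wide at high frequencies, mirroring the mel scale.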

“…LSTM) can grasp the prominent long-term dependency based properties, such as recurrent harmonics and music structure contained in the music. These are the possible reasons why deep learning architecture based schemes have achieved tremendous success in various MIR tasks, such as onset detection [6], emotion recognition [7], chord estimation [8], rhythm stimuli recognition [9], source separation [10], music recommendation [11] and auto-tagging [4], [12], [14], [15]. For music classification tasks, CNN and RNN are the two most adopted deep learning architectures.…”

Section: Introduction (mentioning)

confidence: 99%

“…To take full advantage of the complementarity between CNN and RNN in representing different aspects of music sound, some researchers proposed to construct hybrid architectures of CNN and RNN for music classification [2], [4], [12], [13]. In [13], a hybrid architecture consisting of the paralleling CNN and Bi-RNN blocks was proposed.…”
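
The quote mentions a hybrid in which a CNN block and a bidirectional RNN block process the same input in parallel. Below is a minimal forward-pass sketch of that layout. None of the following choices are taken from [13]:

- input is a log-mel patch of shape (frames, bands)
- CNN branch: one 1-D convolution over time, followed by ReLU and global max pooling
- recurrent branch: a plain tanh Bi-RNN
- fusion: concatenation, followed by a softmax over 10 hypothetical classes
- all weights are random and untrained

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1d_branch(x, kernels):
    """1-D convolution over time, ReLU, then global max pooling.
    x: (T, F); kernels: (K, width, F) -> output (K,)"""
    K, width, _ = kernels.shape
    T = x.shape[0]
    out = np.empty((T - width + 1, K))
    for t in range(T - width + 1):
        patch = x[t:t + width]  # (width, F)
        out[t] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0).max(axis=0)

def rnn_pass(x, Wx, Wh, b):
    """Simple tanh RNN over the rows of x; returns the last hidden state."""
    h = np.zeros(Wh.shape[0])
    for x_t in x:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
    return h

def birnn_branch(x, fwd, bwd):
    """Run one RNN forward and another backward in time; concatenate final states."""
    return np.concatenate([rnn_pass(x, *fwd), rnn_pass(x[::-1], *bwd)])

def make_rnn(in_dim, hid):
    return (rng.normal(0, 0.1, (in_dim, hid)),
            rng.normal(0, 0.1, (hid, hid)),
            np.zeros(hid))

T, F, K, H = 100, 40, 8, 12
x = rng.normal(size=(T, F))                # stand-in for a log-mel patch
kernels = rng.normal(0, 0.1, (K, 3, F))
fused = np.concatenate([conv1d_branch(x, kernels),                       # CNN branch
                        birnn_branch(x, make_rnn(F, H), make_rnn(F, H))])  # Bi-RNN branch
W_out = rng.normal(0, 0.1, (fused.size, 10))  # 10 hypothetical genre classes
logits = fused @ W_out
exp = np.exp(logits - logits.max())            # numerically stable softmax
probs = exp / exp.sum()
print(fused.shape, probs.shape)
```

The two branches see the same input but summarize it differently. The convolution captures local time-frequency patterns, while the Bi-RNN captures sequential context in both directions.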

Section: Introduction (mentioning)

confidence: 99%

Combining CNN and Broad Learning for Music Classification

Tang, Chen (2020)

IEICE Trans. Inf. & Syst.

Music classification has been inspired by the remarkable success of deep learning. To improve efficiency while maintaining high performance, a hybrid architecture that combines deep learning and Broad Learning (BL) is proposed for music classification tasks. At the feature extraction stage, a Random CNN (RCNN) is adopted to analyze the mel-spectrogram of the input music. Compared with a conventional CNN, the RCNN has a more flexible structure that adapts to the variance across different types of music. At the prediction stage, BL is introduced to improve prediction accuracy and also reduce training time. Experimental results on three benchmark datasets (GTZAN, Ballroom, and Emotion) demonstrate three findings:
i) The proposed scheme achieves higher classification accuracy than a deep-learning-based scheme combining CNN and LSTM, on all three datasets.
ii) Both RCNN and BL contribute to the performance improvement of the proposed scheme.
iii) BL also improves the prediction efficiency of the proposed scheme.
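
Broad Learning replaces deep, iteratively trained layers with one wide layer of feature nodes and enhancement nodes. Only the output weights are learned, and they are solved in closed form. Below is a simplified NumPy sketch under these assumptions:

- Synthetic vectors stand in for the RCNN features described in the abstract.
- Feature-node and enhancement-node weights are random, with tanh used for both mappings. The original Broad Learning System additionally fine-tunes the feature-node weights with a sparse autoencoder, which is omitted here.
- Output weights come from ridge regression.
- The node counts and the regularization λ (`lam`) are arbitrary.

```python
import numpy as np

def bls_hidden(X, Wf, bf, We, be):
    """Concatenate mapped feature nodes Z and enhancement nodes H into A = [Z | H]."""
    Z = np.tanh(X @ Wf + bf)  # mapped feature nodes (tanh mapping is an assumption)
    H = np.tanh(Z @ We + be)  # enhancement nodes built on top of Z
    return np.hstack([Z, H])

def train_bls(X, Y, n_feature=40, n_enhance=200, lam=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    Wf = rng.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], n_feature))
    bf = rng.normal(size=n_feature)
    We = rng.normal(scale=1.0 / np.sqrt(n_feature), size=(n_feature, n_enhance))
    be = rng.normal(size=n_enhance)
    A = bls_hidden(X, Wf, bf, We, be)
    # Only the output weights are learned, in closed form (ridge regression):
    # W = (A^T A + lam * I)^{-1} A^T Y
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return Wf, bf, We, be, W

def predict_bls(X, params):
    Wf, bf, We, be, W = params
    return bls_hidden(X, Wf, bf, We, be) @ W

# Synthetic stand-in for RCNN embeddings: three well-separated classes in 16-D.
rng = np.random.default_rng(42)
centers = rng.normal(scale=3.0, size=(3, 16))
labels = rng.integers(0, 3, size=300)
X = centers[labels] + rng.normal(size=(300, 16))
Y = np.eye(3)[labels]  # one-hot targets
params = train_bls(X, Y)
accuracy = np.mean(predict_bls(X, params).argmax(axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

Training is a single linear solve rather than many gradient-descent epochs. This is the source of the reduced training time that the abstract attributes to BL.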
