Music Genre Classification using Transfer Learning on log-based MEL Spectrogram

Mehta, Jash; Gandhi, Deep; Thakur, Govind; Kanani, Pratik

doi:10.1109/iccmc51019.2021.9418035

Cited by 18 publications

(8 citation statements)

References 12 publications

“…In terms of classifying music genres, one study trained CNN models with the Mel-scaled spectrogram (MSS) as a dataset, which exhibited superior performance compared to other machine-learning techniques with different data formats in previous studies [18]. The MSS is a type of spectrogram with the Mel scale on the y-axis [17].…”

Section: Music Genre Classification With Mel-scaled Spectrogram (mentioning)

confidence: 99%

“…In our CNN model, we strategically choose to employ the Mel-scaled spectrogram (MSS) [17] as a pivotal feature. This decision is bolstered by the MSS's proven superior performance in music genre classification tasks when used in conjunction with CNNs, highlighting its potential effectiveness for our purposes [18]. The Mel scale is specifically designed to mirror human auditory sensitivity, adeptly capturing variations in frequency and amplitude within audio signals [19].…”

Section: Introduction (mentioning)

confidence: 99%

Accelerated construction of stress relief music datasets using CNN and the Mel-scaled spectrogram

Choi, Park, Hong et al. 2024

PLoS ONE

Listening to music is a crucial tool for relieving stress and promoting relaxation. However, the limited options available for stress-relief music do not cater to individual preferences, compromising its effectiveness. Traditional methods of curating stress-relief music rely heavily on measuring biological responses, which is time-consuming, expensive, and requires specialized measurement devices. In this paper, a deep learning approach to solve this problem is introduced that explicitly uses convolutional neural networks and provides a more efficient and economical method for generating large datasets of stress-relief music. These datasets are composed of Mel-scaled spectrograms that include essential sound elements (such as frequency, amplitude, and waveform) that can be directly extracted from the music. The trained model demonstrated a test accuracy of 98.7%, and a clinical study indicated that the model-selected music was as effective as researcher-verified music in terms of stress-relieving capacity. This paper underlines the transformative potential of deep learning in addressing the challenge of limited music options for stress relief. More importantly, the proposed method has profound implications for music therapy because it enables a more personalized approach to stress-relief music selection, offering the potential for enhanced emotional well-being.
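
The citation statements above describe the Mel-scaled spectrogram as a spectrogram whose frequency axis follows the Mel scale. As a minimal sketch of what that axis means, the Python below converts between Hz and mels using one common convention, m = 2595 log10(1 + f/700). Neither the cited papers nor this page state which Mel formula they use, so the formula and the function names hz_to_mel, mel_to_hz, and mel_band_edges are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (one common convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Return n_bands + 2 edge frequencies in Hz, equally spaced on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    # Hypothetical settings: 10 bands between 0 Hz and 8 kHz.
    for edge in mel_band_edges(0.0, 8000.0, 10):
        print(round(edge, 1))
```

Edges that are equally spaced in mels grow wider apart in Hz as frequency rises, so a Mel-scaled axis gives finer resolution to low frequencies than to high ones.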

“…One of the proven methods for joint time-frequency domain analysis of non-stationary sound signals is STFT [59]. The STFT spectrogram is a two-dimensional convolution of the signal and window function [72]: the X-axis represents time, the Y-axis represents frequency, and the amplitude of a particular frequency at a particular time is represented by its color in the image [73].…”

Section: Short-time Fourier Transformation (STFT)-spectrogram (mentioning)

confidence: 99%

Diversity Monitoring of Coexisting Birds in Urban Forests by Integrating Spectrograms and Object-Based Image Analysis

Zhao, Yan, Jin et al. 2022

Forests

In the context of rapid urbanization, urban foresters are actively seeking management monitoring programs that address the challenges of urban biodiversity loss. Passive acoustic monitoring (PAM) has attracted attention because it allows for the collection of data passively, objectively, and continuously across large areas and for extended periods. However, it continues to be a difficult subject due to the massive amount of information that audio recordings contain. Most existing automated analysis methods have limitations in their application in urban areas, with unclear ecological relevance and efficacy. To better support urban forest biodiversity monitoring, we present a novel methodology for automatically extracting bird vocalizations from spectrograms of field audio recordings, integrating object-based classification. We applied this approach to acoustic data from an urban forest in Beijing and achieved an accuracy of 93.55% (±4.78%) in vocalization recognition while requiring less than ⅛ of the time needed for traditional inspection. The difference in efficiency would become more significant as the data size increases because object-based classification allows for batch processing of spectrograms. Using the extracted vocalizations, a series of acoustic and morphological features of bird-vocalization syllables (syllable feature metrics, SFMs) could be calculated to better quantify acoustic events and describe the soundscape. A significant correlation between the SFMs and biodiversity indices was found, with 57% of the variance in species richness, 41% in Shannon’s diversity index and 38% in Simpson’s diversity index being explained by SFMs. Therefore, our proposed method provides an effective complementary tool to existing automated methods for long-term urban forest biodiversity monitoring and conservation.
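
The STFT spectrogram described in the citation statement above (time on the X-axis, frequency on the Y-axis, magnitude shown as color) can be sketched in plain Python. This is an illustration only, not the method of Zhao et al.: the Hann window, the frame length, the hop size, and the names hann and stft_magnitude are assumptions, and a naive DFT is used for clarity rather than speed.

```python
import cmath
import math


def hann(n: int) -> list:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len: int = 64, hop: int = 32) -> list:
    """Return spec[t][k]: magnitude of frequency bin k in time frame t."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    spec = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window one frame of the signal.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        spec.append(row)
    return spec


if __name__ == "__main__":
    # Hypothetical test tone: 80 Hz sampled at 640 Hz, so bins are 10 Hz apart.
    sample_rate = 640
    tone = [math.sin(2.0 * math.pi * 80 * n / sample_rate) for n in range(256)]
    spec = stft_magnitude(tone)
    peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
    print(len(spec), "frames; peak bin per frame:", peak_bins)
```

Each row of the result is one time step and each column one frequency bin; rendering the values as pixel intensities gives the image form that the citation statement describes.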

“…Zeng and Tan (2021) [31] developed a large-scale pretrained model MusicBERT for four music understanding tasks, including melody completion, accompaniment suggestion, genre classification, and style classification. Mehta and Gandhi (2021) [32] compared four transfer learning architectures, Resnet34, Resnet50, VGG16, and AlexNet, for music genre classification.…”

Section: Literature Review (mentioning)

confidence: 99%

Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark

2022

The trend for listening to music online has greatly increased over the past decade due to the number of online musical tracks. The large music databases of music libraries that are provided by online music content distribution vendors make music streaming and downloading services more accessible to the end-user. It is essential to classify similar types of songs with an appropriate tag or index (genre) to present similar songs in a convenient way to the end-user. As the trend of online music listening continues to increase, developing multiple machine learning models to classify music genres has become a main area of research. In this research paper, a popular music dataset GTZAN which contains ten music genres is analysed to study various types of music features and audio signals. Multiple scalable machine learning algorithms supported by Apache Spark, including naïve Bayes, decision tree, logistic regression, and random forest, are investigated for the classification of music genres. The performance of these classifiers is compared, and the random forest performs as the best classifier for the classification of music genres. Apache Spark is used in this paper to reduce the computation time for machine learning predictions with no computational cost, as it focuses on parallel computation. The present work also demonstrates that the perfect combination of Apache Spark and machine learning algorithms reduces the scalability problem of the computation of machine learning predictions. Moreover, different hyperparameters of the random forest classifier are optimized to increase the performance efficiency of the classifier in the domain of music genre classification. The experimental outcome shows that the developed random forest classifier can establish a high level of performance accuracy, especially for the mislabelled, distorted GTZAN dataset. This classifier has outperformed other machine learning classifiers supported by Apache Spark in the present work. The random forest classifier manages to achieve 90% accuracy for music genre classification compared to other work in the same domain.
