Sound Classification Based on Multihead Attention and Support Vector Machine

Yang, Lei; Zhao, Hongdong

doi:10.1155/2021/9937383

Cited by 8 publications

(7 citation statements)

References 31 publications


“…The accuracy of the model can be defined by the formula given below: $\mathrm{accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (6)…”

Section: Discussion (mentioning)

confidence: 99%
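The quoted accuracy formula is the binary form: correct predictions (true positives plus true negatives) over all predictions. For a multi-genre task, the same quantity is the diagonal of the confusion matrix divided by its total. The sketch below computes both. The function names and the example counts are hypothetical and do not come from either paper.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary accuracy: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


def multiclass_accuracy(confusion: list[list[int]]) -> float:
    """Multiclass accuracy: correct predictions (the diagonal) over all predictions.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("at least one prediction is required")
    return correct / total


# Hypothetical counts, for illustration only.
print(accuracy(tp=45, tn=40, fp=5, fn=10))                     # 0.85
print(multiclass_accuracy([[8, 2, 0], [1, 9, 0], [0, 1, 9]]))  # 26 correct out of 30
```

For a 2 x 2 confusion matrix, `multiclass_accuracy` reduces to the binary formula: the diagonal holds TP and TN, and the off-diagonal cells hold FP and FN.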

“…Previously, researchers used mathematical techniques like standard statistical pattern recognition (SPR), Gaussian classifier (GS), and Gaussian Mixture Model (GMM) to classify music genres. In the age of AI, researchers have been using various techniques of Machine learning like Multi-Class Support Vector Machine (SVM) [6], K-Nearest Neighbors (KNN) [7], Linear Kernel SVM, Polynomial Kernel SVM, Decision Tree, Random Forest, Ada Boost, Naïve Bayes, Linear Discriminant Analysis (LDA) classifier, Logistic Regression, and Sigmoid Kernel SVM [6], [8]-[10]. In modern days, deep learning is not only solving the problem of computer vision but also dealing with sequencing and time series problems [11][12].…”

Section: Introduction (mentioning)

confidence: 99%


Classification of Indian Classical Music (Hindustani Music) Genres through MFCCs Features using RNN-LSTM Model

Bisht, Negi, Singh (2022). Preprint.


Music has been considered an inseparable part of our culture and tradition. In this work, we created a dataset of six Hindustani music genres (Abhang, Bhajan, Thumri, Tappa, Ghazal, and Kajri), each containing 100 songs in WAVE (.wav) format. To classify the Hindustani music genres, we employ mel-frequency cepstral coefficient (MFCC) features, which capture timbral information, together with a Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM) model. Our best three models achieved an average accuracy of 86% when trained on feature sets with 18, 26, and 39 MFCCs. Furthermore, we use uniform manifold approximation and projection (UMAP) to transform the higher-dimensional feature sets and visualise them in two-dimensional space. Based on the results, we infer that Hindustani music has more intricate melodies than Western music, and that feeding 18 MFCC features to the deep neural network is the best strategy for accuracy. Increasing the hop length from 512 to 1024 reduces the input dimension, which makes the input easier for the RNN-LSTM model to process. As a result, the performance of the RNN-LSTM models improved slightly. The test-set accuracy of our RNN-LSTM models decreased by 5% when the audio was split into 5 segments. Additionally, we evaluated our model on six genres of the GTZAN dataset and achieved 90% accuracy.

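The abstract says that raising the hop length from 512 to 1024 shrinks the model input. The reason is that the number of MFCC frames (time steps fed to the LSTM) is roughly the number of samples divided by the hop length. The sketch below counts frames under stated assumptions that the abstract does not give: a 22,050 Hz sample rate, 30-second clips (the GTZAN clip length), and librosa-style centered framing with `n_fft=2048`. The function `num_frames` is a hypothetical helper.

```python
def num_frames(num_samples: int, hop_length: int, n_fft: int = 2048, center: bool = True) -> int:
    """Number of analysis frames (rows of the MFCC matrix) for one clip.

    center=True assumes librosa-style framing, where the signal is padded by
    n_fft // 2 samples on each side, giving 1 + num_samples // hop_length frames
    (for even n_fft). center=False counts only frames that fit inside the signal.
    """
    if center:
        return 1 + num_samples // hop_length
    if num_samples < n_fft:
        return 0
    return 1 + (num_samples - n_fft) // hop_length


SAMPLE_RATE = 22050  # assumption: a common default, not stated in the abstract
CLIP_SECONDS = 30    # assumption: GTZAN-length clips
N_MFCC = 18          # the best-performing MFCC count reported above

samples = SAMPLE_RATE * CLIP_SECONDS
for hop in (512, 1024):
    print(hop, (num_frames(samples, hop), N_MFCC))
# 512 -> (1292, 18); 1024 -> (646, 18)
```

Doubling the hop roughly halves the sequence length the LSTM must unroll over, at the cost of coarser time resolution. Splitting a clip into 5 segments shortens each sequence further (259 frames per segment at hop 512 under these assumptions).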


“…Returning to the classification of sound/noise, which can be classically used as the base for the detection [of] danger from the noise around an individual/population, [15] demonstrates that sound categorization performance can still be improved by swapping out the recurrent architecture for a parallel processing structure during feature extraction. The research processes the huge data and uses it to develop the model using Deep Learning Algorithms, namely CNN (Convolutional Neural Networks) and LSTM (Long Short-Term Memory).…”

Section: Technical Background (mentioning)

confidence: 99%
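The quoted claim is that replacing a recurrent feature extractor with a parallel structure improves sound categorization. Given the title of the paper under discussion, one plausible reading is multihead self-attention. The sketch below shows standard scaled dot-product multihead self-attention in NumPy. It illustrates the general technique, not the authors' architecture. The frame count, feature size, head count, random weights, and the mean pooling into one clip vector (which a classifier such as an SVM could consume) are all assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multihead_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over all frames at once.

    x: (T, d_model) frame-level features; each w_*: (d_model, d_model).
    Returns (T, d_model). Every frame attends to every other frame in
    parallel, with no recurrence over time and no positional encoding.
    """
    t, d_model = x.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split_heads(h):
        # (T, d_model) -> (num_heads, T, d_head)
        return h.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = (split_heads(x @ w) for w in (w_q, w_k, w_v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)                  # each row sums to 1
    context = weights @ v                               # (heads, T, d_head)
    merged = context.transpose(1, 0, 2).reshape(t, d_model)
    return merged @ w_o


# Hypothetical input: 100 frames of 40-dimensional features (e.g. log-mel bands).
rng = np.random.default_rng(0)
T, D, H = 100, 40, 4
frames = rng.standard_normal((T, D))
w_q, w_k, w_v, w_o = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))
out = multihead_self_attention(frames, w_q, w_k, w_v, w_o, num_heads=H)
clip_vector = out.mean(axis=0)  # hypothetical pooling to one fixed-length vector per clip
```

An LSTM must step through the 100 frames one at a time. Here, all frame-to-frame interactions are computed in one matrix product per head, which is the "parallel processing" property the quote refers to.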

“…In this research, the audio analysis technique adopted is the Fourier Transform and Mel-Spectrogram (similar to [31]), and the audio was sampled at 44.1kHz (just like in [32]) for further processing. Post-cleaning, the sound data is subjected to three different deep learning models (1D-CNN, 2D-CNN, and LSTM) for the classification of sound from a person's surroundings (the likes of which have been used in various pieces of research cited above, for example: [15]), and to detect a threat from it. If the threat is detected, then an automatic alert message is sent to the registered help or the emergency services.…”

Section: Technical Background (mentioning)

confidence: 99%
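A mel-spectrogram pools Fourier-transform magnitudes into bands spaced evenly on the mel scale. The highest representable frequency is the Nyquist limit, half the sampling rate: 22.05 kHz for audio sampled at 44.1 kHz. The sketch below uses the HTK mel formula, $m = 2595 \log_{10}(1 + f/700)$, to place band edges up to that limit. The choice of 40 bands is hypothetical. Some libraries default to the slightly different Slaney mel formula, so exact edges vary by toolkit.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, sample_rate: float, f_min: float = 0.0) -> list[float]:
    """n_mels + 2 frequencies (Hz) equally spaced on the mel scale up to Nyquist.

    Consecutive triples (low, centre, high) define the triangular mel filters.
    """
    f_max = sample_rate / 2.0  # Nyquist limit
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(n_mels=40, sample_rate=44100)
print(round(edges[-1]))  # 22050, the Nyquist frequency at 44.1 kHz
```

Because the mel scale is roughly logarithmic above about 1 kHz, the bands are narrow at low frequencies and wide near 22 kHz. This is why mel features emphasize the lower range where most speech and music energy lies.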

Live Event Detection for People’s Safety Using NLP and Deep Learning

Sen, Rajakumaran, Mahdal, et al. (2024). IEEE Access.


Today, humans pose the greatest threat to society by getting involved in robbery, assault, or homicide. Such circumstances threaten people working alone at night in remote areas, especially women. Any such threat in real time is always associated with a sound or noise, which may be used for early detection. Numerous measures already exist, but none of them is sufficiently effective, owing to limited accuracy and delays in predicting the threat. Hence, a novel software-based prototype is developed to detect threats from a person's surrounding sound/noise. It automatically alerts the victim's registered contacts by email, SMS, and WhatsApp messages sent from the victim's smartphone, without any additional hardware. Audio signals from a Kaggle dataset are visualized and analyzed using exploratory data analysis (EDA) techniques. Feeding the EDA outcomes into deep learning models, namely long short-term memory (LSTM) and convolutional neural networks (CNN), yields an accuracy of 96.6% in classifying the audio events.


“…In the aspect of audio recognition, Yang and Zhao [17] proposed an acoustic scene classification method based on the support vector machine (SVM), which enhanced the sound texture to improve the classification accuracy. Greco et al. [18] proposed a voice recognition system based on the heuristic deep learning method.…”

Section: Related Work (mentioning)

confidence: 99%

Research on Audio Recognition Based on the Deep Neural Network in Music Teaching

Cui (2022). Computational Intelligence and Neuroscience.


Solfeggio is an important basic course for music majors, and audio recognition training is one of its key components. With the improvement of computer performance, audio recognition has been widely used in smart wearable devices. In recent years, the development of deep learning has accelerated research on audio recognition. However, music teaching environments contain a lot of sound interference, which prevents audio classifiers from meeting practical requirements. To solve this problem, an improved audio recognition system based on YOLO-v4 is proposed, whose main change is to the network structure. First, mel-frequency cepstral coefficients (MFCCs) are extracted from the original audio as features. Then, the YOLO-v4 deep learning model is applied to audio recognition and improved by combining it with a spatial pyramid pooling module, which strengthens its ability to generalize across different audio formats. Finally, the stacking method from ensemble learning is used to fuse the independent submodels of two different channels. Experimental results show that, compared with other deep learning techniques, the improved YOLO-v4 model improves audio recognition performance and handles data in different audio formats better, indicating better generalization ability.

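The abstract credits a spatial pyramid pooling (SPP) module with better generalization across audio formats. The sketch below shows the classic SPP idea: a feature map of any height and width (here, clips of different length) is max-pooled over 1x1, 2x2, and 4x4 grids, producing one fixed-length vector. This is an assumption about what the module contributes, not the paper's implementation. The SPP block in YOLO-v4 is built differently, from stride-1 max pooling at several kernel sizes whose outputs are concatenated. The function name, pyramid levels, and input shapes are hypothetical.

```python
import numpy as np


def spatial_pyramid_pool(feature_map: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Max-pool a (channels, height, width) map into a fixed-length vector.

    Level n splits the map into an n x n grid of near-equal bins and keeps the
    maximum of each bin per channel, so the output length is
    channels * sum(n * n for n in levels), whatever height and width are.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        if height < n or width < n:
            raise ValueError(f"map {height}x{width} too small for a {n}x{n} grid")
        rows = np.linspace(0, height, n + 1).astype(int)  # bin edges along height
        cols = np.linspace(0, width, n + 1).astype(int)   # bin edges along width
        for i in range(n):
            for j in range(n):
                cell = feature_map[:, rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)


# Two hypothetical feature maps: 8 channels x 64 mel bands, clips of different length.
rng = np.random.default_rng(0)
short_clip = rng.standard_normal((8, 64, 300))
long_clip = rng.standard_normal((8, 64, 517))
v_short = spatial_pyramid_pool(short_clip)
v_long = spatial_pyramid_pool(long_clip)
# Both vectors have length 8 * (1 + 4 + 16) = 168.
```

Because the output length depends only on the channel count and the pyramid levels, a single downstream classifier can accept inputs whose time axis varies with clip length or format.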
