A deep learning approach for robust speaker identification using chroma energy normalized statistics and mel frequency cepstral coefficients

Khan, A. Nayeemulla; Shahina, A.

doi:10.1007/s10772-021-09888-y

Cited by 9 publications

(3 citation statements)

References 35 publications


“…2) Chroma is a feature that focuses on music-oriented audio tones [26]. This feature can provide a distribution of tonal variations in audio. The Chroma feature's result is a chromagram built from 12 tone levels (pitch classes).…”

Section: Designed System (mentioning)

confidence: 99%

Multi Feature fusion for COPD Classification using Deep learning algorithms

Patel, Diwan, Patel et al. 2024

JIST


Machine learning (ML) and deep learning (DL) are becoming pivotal in solving healthcare problems. Because their forecasting models are accurate and fast, ML and DL algorithms are being used by healthcare experts for disease classification. Alongside life-threatening illnesses such as cancer, respiratory problems such as Chronic Obstructive Pulmonary Disease (COPD) are becoming more prevalent and threatening public health. According to the World Health Organization, COPD will be the third-leading cause of death and the seventh-leading cause of illness globally by 2030. Early detection and prompt treatment are therefore essential. The primary methods for diagnosing COPD rely on spirometers and imaging equipment, which are costly and often inadequate. In this paper, an attempt is made to determine the severity of COPD from the patient's cough sound using ML and DL algorithms. We used the Librosa Python library to extract the audio features MFCC, Chroma, Contrast, Mel, and Tonnetz, and the SMOTE algorithm to address class imbalance in the dataset. To find the most effective multi-feature fusion for classifying COPD, numerous experiments were carried out with various fusions of audio features. To evaluate the performance of each fusion, we ran MLP, CNN, RNN, and LSTM models on fusions of two and of three audio features. The experimental results suggest that the LSTM model with the Adam optimizer achieves 100% training accuracy and 87% testing accuracy on the fusion of MFCC and Mel features. For the three-feature fusion of Tonnetz, Chroma, and Mel, the CNN model performs best, with a training accuracy of 90% and a testing accuracy of 82%.
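The abstract names SMOTE but not the implementation used; in Python, the imbalanced-learn library's `SMOTE` class is a common choice. As a minimal NumPy sketch of the basic algorithm (the function `smote`, its parameters, and the toy data are illustrative assumptions, not the authors' code), each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbors:

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE sketch (illustrative, not the paper's implementation).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                  # random minority sample
        nb = neighbors[base, rng.integers(k)]   # one of its k nearest neighbors
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic[i] = X[base] + gap * (X[nb] - X[base])
    return synthetic


# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote(X_min, n_synthetic=6, k=2)
print(X_new.shape)  # (6, 2)
```

Because every synthetic point is an interpolation between two real minority samples, oversampling fills in the region the minority class already occupies rather than duplicating points exactly.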


“…2) Chroma is a feature extraction method focusing on music-oriented audio tones [21]. This feature can provide a distribution of tonal variations in audio in the form of a simple feature.…”

Section: B. Feature Extraction, 1) Mel Frequency Cepstral Coefficients ... (mentioning)

confidence: 99%

Multi-Features Audio Extraction for Speech Emotion Recognition Based on Deep Learning

Gondohanindijo¹, -², Noersasongko³ et al. 2023

IJACSA


The growing need for human interaction with computers is making the interaction process more advanced, for example through voice recognition. A voice command system should also take the user's emotional state into account, because users implicitly treat computers as they would other humans. By recognizing a person's emotion, the computer can adjust its feedback so that human-computer interaction (HCI) becomes more humane. Previous research shows that improving the accuracy of emotion recognition remains a challenge, because emotions are not expressed in the same way by everyone, owing especially to differences in language and cultural accent. This study proposes recognizing emotion types from speech using multi-feature extraction and deep learning. The dataset is taken from the RAVDESS database, and features are extracted from it using MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz. Principal Component Analysis (PCA) and Min-Max Normalization are then applied to determine the impact of these techniques. The pre-processed data is used by a Deep Neural Network (DNN) model to identify eight emotions: calm, happy, sad, angry, neutral, fearful, surprised, and disgusted. The model is evaluated using a confusion matrix. The DNN model achieves an accuracy of 93.61%, a sensitivity of 73.80%, and a specificity of 96.34%. Using multiple features in the proposed method improves the model's accuracy in determining emotion type on the RAVDESS dataset. In addition, PCA increases the pattern correlation between features, so the classifier shows performance improvements, especially in accuracy, specificity, and sensitivity.
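The abstract does not state the order in which Min-Max Normalization and PCA are applied or how many components are kept. The sketch below assumes scaling first, then PCA computed with an SVD in NumPy; the function names, the 10-component choice, and the random stand-in feature matrix are all illustrative assumptions rather than details from the paper:

```python
import numpy as np


def min_max_normalize(X, eps=1e-12):
    """Scale each feature column to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # eps avoids division by zero for constant columns (they map to 0).
    return (X - lo) / np.maximum(hi - lo, eps)


def pca(X, n_components):
    """Project X onto its top principal components using an SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]             # principal axes, one per row
    explained = (S ** 2) / (S ** 2).sum()      # variance ratio per axis
    return Xc @ components.T, explained[:n_components]


# Hypothetical stand-in for concatenated per-clip audio features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))
X_scaled = min_max_normalize(features)
X_reduced, ratio = pca(X_scaled, n_components=10)
print(X_reduced.shape)  # (200, 10)
```

Scaling before PCA keeps features with large numeric ranges (for example, Mel-spectrogram energies) from dominating the principal axes purely because of their units.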


“…People perceive pitch periodically: two pitches that differ by one or more octaves are heard as having the same color, or harmonic role (where, in our scale, an octave spans 12 semitones). The main idea behind chroma features is to combine all spectral information about a given pitch class into a single coefficient [34]. One of the most important properties of chroma features is that they capture the harmony and melody of music.…”

Section: MFCC (Mel-Frequency Cepstral Coefficients) (mentioning)

confidence: 99%

BMNet-5: A Novel Approach of Neural Network to Classify the Genre of Bengali Music Based on Audio Features

et al. 2022


Music genre classification (MGC) is the process of assigning genre labels to music by analyzing its sound or lyrics. With the rapid growth of music data repositories, MGC has many uses in organizing and managing music recommendation systems, advertising, and streaming services. Although many works have classified English music using various statistical and machine learning methods, there has been little progress in classifying Bengali music. Deep Learning (DL) methods have also been applied to music genre classification in only a few notable works. The content and uniqueness of Bengali music make it especially interesting, and much remains to be learned about applying DL to it, so Bengali music genre classification is a fairly new research area in deep learning. In this paper, we developed a technique called BMNet-5 to perform multiclass classification of Bangla music genres such as "Bangla Adhunik," "Bangla Hip-Hop," "Bangla Band Music," "Nazrulgeeti," "Palligeeti," and "Rabindra Sangeet." We demonstrate the effectiveness of the proposed technique by extracting features from a dataset of 1742 Bangla music pieces and evaluating the automated classification decisions. BMNet-5 is a neural network designed to predict music genre from audio inputs, and it outperformed the corresponding previous research with an accuracy of 90.32%. BMNet-5 is then tested for performance consistency using K-fold cross-validation with varying values of k. Finally, we apply the interpretable SHAP model to the proposed model for all genres of the Bangla music dataset; the resulting explainable outcome may offer a significant advantage.
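The chroma descriptions quoted above (12 pitch classes, octave equivalence, one coefficient per pitch class) can be illustrated with a simplified NumPy sketch that folds a single frame's power spectrum into 12 bins. This is an assumption-laden simplification, not the method of any paper listed here: it uses one FFT frame, rounds each frequency bin to the nearest equal-tempered semitone with A4 = 440 Hz, and omits the filter banks, tuning estimation, and frame-wise processing of library implementations such as `librosa.feature.chroma_stft` (or `librosa.feature.chroma_cens` for the chroma energy normalized statistics named in the title). The function name `chroma_vector` and the test tone are hypothetical:

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(signal, sr, fmin=27.5):
    """Fold one frame's power spectrum into 12 pitch-class bins (simplified)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    keep = freqs >= fmin  # drop DC and sub-audio bins (also avoids log2(0))
    # MIDI note number: A4 = 440 Hz is note 69; C notes are multiples of 12.
    midi = 69 + 12 * np.log2(freqs[keep] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    chroma = np.zeros(12)
    # Octave-equivalent bins accumulate into the same pitch class.
    np.add.at(chroma, pitch_class, power[keep])
    return chroma / chroma.max() if chroma.max() > 0 else chroma


sr = 22050
t = np.arange(sr) / sr  # one second of samples
# A4 (440 Hz) plus A5 (880 Hz): one octave apart, so both fall in pitch class A.
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
c = chroma_vector(tone, sr)
print(PITCH_CLASSES[int(np.argmax(c))])  # A
```

Because 440 Hz and 880 Hz are an octave apart, both contribute to the same bin, which is the octave equivalence described in the quotations.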
