Modeling Voice Pathology Detection Using Imbalanced Learning

Fan, Ziqi; Qian, Jinyang; Sun, Baoyin; Wu, Di; Xu, Yishen; Tao, Zhi

doi:10.1109/icsmd50554.2020.9261679

Cited by 6 publications

(2 citation statements)

References 15 publications


“…In computing vocal tract features, previous investigations have taken advantage of methods such as linear predictive cepstral coefficients (LPCCs) [12], perceptual linear prediction (PLP) [13] and mel-frequency cepstral coefficients (MFCCs) [14]. Regarding the classifier stage, several studies have explored conventional ML classifiers such as support vector machine (SVM) [4,15,16,17], random forest (RF) [18] and decision trees [16,19]. Due to recent advancements in deep learning, classical ML methods have been increasingly replaced by DL networks such as multilayer perceptron (MLP) [20], deep neural networks (DNNs) [21,22], long short-term memory (LSTM) networks [23,24], convolutional neural networks (CNNs) [25], combinations of CNN and MLP [4], and combinations of CNN and LSTM [26].…”

Section: Introduction (mentioning)

confidence: 99%

Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers

Javanmardi, Kadiri, Kodali et al. 2022

Interspeech 2022


The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four popularly used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) are derived in both the 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF) and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The popularly used HUPA database is considered in the pathology detection experiments. Experimental results revealed that using the CNN classifier with the 2-D feature representations yielded better accuracy compared to using the ML and DL classifiers with the 1-D feature representations. The best performance was achieved using the 2-D CNN classifier based on dynamic MFCCs that showed a detection accuracy of 81%.
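The distinction between static and dynamic features, and between the 2-D and 1-D forms, can be illustrated with a small sketch. It is not the authors' pipeline: it assumes that "dynamic" means static coefficients stacked with their delta and delta-delta (a common convention), and that the 1-D form is the per-coefficient mean over frames, which the abstract does not specify. The function names `delta`, `dynamic_2d` and `to_1d` are hypothetical, and the input is any frame-by-coefficient matrix rather than real MFCCs.

```python
def delta(frames, width=2):
    """First-order delta features via the standard regression formula.

    frames: time-major list of frames, each a list of coefficients.
    Edge frames are handled by repeating the first/last frame.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for c in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][c]
                behind = frames[max(t - n, 0)][c]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def dynamic_2d(frames, width=2):
    """2-D form: every frame holds static + delta + delta-delta values."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]


def to_1d(matrix):
    """1-D form (assumption): average each coefficient over all frames."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]


# Toy input: 6 frames, 2 coefficients (a linear ramp and a constant).
static = [[float(t), 1.0] for t in range(6)]
features_2d = dynamic_2d(static)   # 6 frames x 6 values, e.g. CNN input
features_1d = to_1d(features_2d)   # 6 values, e.g. SVM or RF input
```

In this sketch the 2-D matrix keeps the time axis that a CNN can exploit, while the 1-D vector discards it, which is one plausible reading of why the two forms are compared.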


“…In practical applications, the class-imbalanced data result from the insufficient number of samples in the pathological voice database, which also makes it difficult for the traditional VPD system to classify multiple pathological types. Given its importance, pathological voice diagnoses with imbalanced data have attracted the interest of researchers [10,11].…”

Section: Introduction (mentioning)

confidence: 99%

Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method

Fan, Zhou et al. 2021

Applied Sciences

Self Cite


The Massachusetts Eye and Ear Infirmary (MEEI) database is an international-standard training database for voice pathology detection (VPD) systems. However, there is a class-imbalanced distribution in normal and pathological voice samples and different types of pathological voice samples in the MEEI database. This study aimed to develop a VPD system that uses the fuzzy clustering synthetic minority oversampling technique algorithm (FC-SMOTE) to automatically detect and classify four types of pathological voices in a multi-class imbalanced database. The proposed FC-SMOTE algorithm processes the initial class-imbalanced dataset. A set of machine learning models was evaluated and validated using the resulting class-balanced dataset as an input. The effectiveness of the VPD system with FC-SMOTE was further verified by an external validation set and another pathological voice database (Saarbruecken Voice Database (SVD)). The experimental results show that, in the multi-classification of pathological voice for the class-imbalanced dataset, the method we propose can significantly improve the diagnostic accuracy. Meanwhile, FC-SMOTE outperforms the traditional imbalanced data oversampling algorithms, and it is preferred for imbalanced voice diagnosis in practical applications.
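The oversampling idea behind this abstract can be sketched with plain SMOTE, the baseline that FC-SMOTE extends. This is not the FC-SMOTE algorithm: the fuzzy clustering step is omitted because the abstract does not describe it. The function names `smote_oversample` and `balance`, the neighbour count `k` and the toy data are all illustrative assumptions.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE)."""
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance(majority, minority, k=3, seed=0):
    """Grow the minority class until it matches the majority class size."""
    n_new = max(0, len(majority) - len(minority))
    return list(minority) + smote_oversample(minority, n_new, k=k, seed=seed)


# Toy 2-D feature vectors: 10 majority samples versus 4 minority samples.
majority = [(0.0, 0.0)] * 10
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
balanced_minority = balance(majority, minority)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region the minority class already occupies. In a multi-class setting such as the one described above, the same step would be applied to each under-represented class in turn.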


Identification of Voice Disorders: A Comparative Study of Machine Learning Algorithms

Coelho, Shashirekha 2023

Lecture Notes in Computer Science
