A Survey on Signal Processing Based Pathological Voice Detection Techniques

Islam, Rumana; Tarique, Mohammed; Abdel-Raheem, Esam

doi:10.1109/access.2020.2985280

Cited by 66 publications (21 citation statements)

References 90 publications (83 reference statements)


“…The optimal classifier increases the recall to 0.83, the specificity to 0.95, the G value to 0.88, and the F1 value to 0.86. As an ensemble learning model, RF performs better than single classifiers in pathological voice classification, which is also reflected in the latest review paper [6]. Meanwhile, the same effect is shown in two other typical ensemble learning models (GBDT, XGBoost).…”

Section: Experimental Results and Analysis (mentioning, confidence: 63%)
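The snippet above reports recall, specificity, a "G value" and F1 without defining them. In imbalanced classification the G value usually denotes the G-mean, the geometric mean of recall (sensitivity) and specificity; the snippet does not state its definition, so that reading is an assumption here. A minimal sketch for the binary case, taking "pathological" as the positive class (also an assumption) and using made-up counts:

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Recall, specificity, G-mean and F1 from binary confusion-matrix counts.

    Positive class = pathological voice (assumption for this sketch).
    """
    recall = tp / (tp + fn)                   # pathological samples correctly flagged
    specificity = tn / (tn + fp)              # healthy samples correctly passed
    precision = tp / (tp + fp)
    g_mean = math.sqrt(recall * specificity)  # geometric mean of the two per-class rates
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity, "g_mean": g_mean, "f1": f1}

# Hypothetical counts, not taken from the cited study.
print(binary_metrics(tp=8, fn=2, tn=9, fp=1))
```

Unlike plain accuracy, the G-mean falls to 0 whenever one class is never recognised, which is why it is preferred for class-imbalanced data such as MEEI.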

“…In biomedical engineering, different features are extracted from signals to build VPD systems that automatically detect pathological voices. Most of these studies have experimented with the Massachusetts Eye and Ear Infirmary (MEEI) database [5], which has become one of the standard databases for VPD systems [6]. Nevertheless, in the past studies on voice pathology detection, many researchers ignored the class-imbalanced distribution of voice samples in the MEEI database.…”

Section: Introduction (mentioning, confidence: 99%)


Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method

Fan, Zhou, et al. 2021

Applied Sciences


The Massachusetts Eye and Ear Infirmary (MEEI) database is an international-standard training database for voice pathology detection (VPD) systems. However, there is a class-imbalanced distribution in normal and pathological voice samples and different types of pathological voice samples in the MEEI database. This study aimed to develop a VPD system that uses the fuzzy clustering synthetic minority oversampling technique algorithm (FC-SMOTE) to automatically detect and classify four types of pathological voices in a multi-class imbalanced database. The proposed FC-SMOTE algorithm processes the initial class-imbalanced dataset. A set of machine learning models was evaluated and validated using the resulting class-balanced dataset as an input. The effectiveness of the VPD system with FC-SMOTE was further verified by an external validation set and another pathological voice database (Saarbruecken Voice Database (SVD)). The experimental results show that, in the multi-classification of pathological voice for the class-imbalanced dataset, the method we propose can significantly improve the diagnostic accuracy. Meanwhile, FC-SMOTE outperforms the traditional imbalanced data oversampling algorithms, and it is preferred for imbalanced voice diagnosis in practical applications.
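The abstract does not describe the oversampling step itself. As background, here is a minimal sketch of classic SMOTE, which FC-SMOTE extends according to its name: each synthetic sample lies on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The fuzzy-clustering stage that distinguishes FC-SMOTE is not reproduced, and the function name, the `k=5` default and the Euclidean distance are assumptions of this sketch.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE (not FC-SMOTE): create n_new synthetic minority samples.

    minority: list of feature vectors (lists of floats) from one minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority-class neighbours of the chosen sample (Euclidean distance)
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): how far to move from base towards neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy 2-D minority class; real inputs would be acoustic feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority, n_new=3, k=2):
    print(point)
```

Run once per minority class until the class counts match, this interpolates new points between existing minority samples instead of copying them, as random oversampling does.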


“…For each frame, the mel-spectrogram was calculated using 64 mel-frequency bands, an FFT window length of 1024, a hop length of 64, an upper frequency bound of 16384 Hz and the HTK-formula (23) for conversion from Hertz to mel. The advantage of mel-spectrograms is that the center frequency and bandwidth of the chosen triangular filters roughly match the auditory critical band filters (24). Using the Python package librosa (25), each 500 ms frame resulted in a mel-spectrogram with 64 frequency points and 345 time frames.…”

Section: Methods (mentioning, confidence: 99%)
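The snippet's numbers are mutually consistent if the recordings were sampled at 44.1 kHz, a rate inferred here rather than stated: 500 ms is then 22,050 samples, and with centred framing (librosa's default) the frame count is 1 + 22050 // 64 = 345. The HTK formula it cites maps Hertz to mel as m = 2595 log10(1 + f/700). A small pure-Python check of both, where `SR` is the assumed rate:

```python
import math

SR = 44_100            # assumed sampling rate in Hz (not stated in the snippet)
FRAME_SECONDS = 0.5    # 500 ms analysis frame
HOP_LENGTH = 64        # samples between successive STFT frames
FMAX = 16_384          # upper mel-filter bound in Hz

def hz_to_mel_htk(f_hz):
    """HTK Hertz-to-mel conversion: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def n_centered_frames(n_samples, hop_length):
    """STFT frame count when frames are centred (signal padded by n_fft // 2 at both ends, even n_fft)."""
    return 1 + n_samples // hop_length

n_samples = int(SR * FRAME_SECONDS)
print(n_samples)                                  # 22050
print(n_centered_frames(n_samples, HOP_LENGTH))   # 345
print(FMAX < SR / 2)                              # True: fmax lies below the Nyquist frequency
print(hz_to_mel_htk(FMAX))                        # top of the mel axis, about 3600 mel
```

In librosa the same settings should correspond to `librosa.feature.melspectrogram(y=y, sr=44100, n_fft=1024, hop_length=64, n_mels=64, fmax=16384, htk=True)`, which returns a (64, 345) array for a 22,050-sample input.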

Efficient and Explainable Deep Neural Networks for Airway Symptom Detection in Support of Wearable Health Technology

Groh, Lei, Martignetti, et al. 2021

Preprint


Mobile health wearables are often embedded with small processors for signal acquisition and analysis. These embedded wearable systems are, however, limited by low available memory and computational power. Advances in machine learning, especially deep neural networks (DNNs), have been adopted for efficient and intelligent applications to overcome constrained computational environments. In this study, evolutionary optimized DNNs were analyzed to classify three common airway-related symptoms, namely coughs, throat clears and dry swallows. As opposed to typical microphone-acoustic signals, mechano-acoustic data signals, which did not contain identifiable speech information for better privacy protection, were acquired from laboratory-generated and publicly available datasets. The optimized DNNs had a low footprint of less than 150 kB and predicted airway symptoms of interest with 83.7% accuracy on unseen data. By performing explainable AI techniques, namely occlusion experiments and class activation maps, mel-frequency bands up to 8,000 Hz were found as the most important feature for the classification. We further found that DNN decisions were consistently relying on these specific features, fostering trust and transparency of proposed DNNs. Our proposed efficient and explainable DNN is expected to support edge computing on mechano-acoustic sensing wearables for remote, long-term monitoring of airway symptoms.
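An occlusion experiment masks part of the input and measures how much the model's output changes; large drops mark regions the model relies on. The abstract reports using occlusion experiments for its mel-band finding, and the sketch below applies the idea to bands of mel rows. `toy_score` is a hypothetical stand-in for a trained classifier's class score, and the band width of 8 rows and fill value of 0 are assumptions; the abstract does not give the authors' occlusion settings.

```python
import numpy as np

def occlusion_importance(spec, score_fn, band_width=8, fill_value=0.0):
    """Score drop when each band of `band_width` mel rows is replaced by `fill_value`.

    spec: mel-spectrogram of shape (n_mels, n_frames).
    score_fn: maps a spectrogram to a scalar class score.
    """
    baseline = score_fn(spec)
    drops = []
    for start in range(0, spec.shape[0], band_width):
        occluded = spec.copy()
        occluded[start:start + band_width, :] = fill_value  # mask one band across all frames
        drops.append(baseline - score_fn(occluded))
    return np.array(drops)

def toy_score(spec):
    """Hypothetical 'model' that only looks at the 16 lowest mel rows."""
    return float(spec[:16].mean())

rng = np.random.default_rng(0)
spec = rng.random((64, 345))          # stand-in for a 64-band, 345-frame mel-spectrogram
importance = occlusion_importance(spec, toy_score)
print(np.round(importance, 3))        # only the first two bands show a drop
```

Because `toy_score` reads only the lowest 16 rows, only the first two 8-row bands receive non-zero importance; with a real model the same loop would be run over held-out spectrograms and the drops compared across bands.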


“…The advantage of mel-spectrograms is that the center frequency and bandwidth of the chosen triangular filters roughly match the auditory critical band filters. [28] Using the Python package librosa, [29] each 500 ms frame resulted in a mel-spectrogram with 64 frequency points and 345 time frames.…”

Section: Data Preprocessing (mentioning, confidence: 99%)
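The triangular filters mentioned in this snippet can be built directly: place n_mels + 2 edge frequencies evenly on the mel scale, then give each filter a triangle rising from one edge to the next and falling to the one after. Because the edges are even in mel, the filters widen in Hertz as frequency rises, which is the rough match to auditory critical bands the snippet refers to. A NumPy sketch with HTK spacing and peak-1 triangles, assuming a 44.1 kHz rate; note that `librosa.filters.mel` area-normalises each filter by default (`norm='slaney'`), so its weights differ from these by a per-filter scale factor.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)  # HTK formula

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Peak-normalised triangular filters with HTK mel spacing.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)  # frequency of each FFT bin
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lower, centre, upper = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lower) / (centre - lower)
        falling = (upper - fft_freqs) / (upper - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle, peak 1 at centre
    return fb

fb = triangular_mel_filterbank(sr=44_100, n_fft=1024, n_mels=64, fmax=16_384)
print(fb.shape)  # (64, 513)
```

Multiplying this (64, 513) matrix by a power spectrogram of shape (513, n_frames) gives a (64, n_frames) mel-spectrogram.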

Efficient and Explainable Deep Neural Networks for Airway Symptom Detection in Support of Wearable Health Technology

Groh, Lei, Martignetti, et al. 2022

Advanced Intelligent Systems


Mobile health wearables are often embedded with small processors for signal acquisition and analysis. These embedded wearable systems are, however, limited by low available memory and computational power. Advances in machine learning, especially deep neural networks (DNNs), have been adopted for efficient and intelligent applications to overcome constrained computational environments. Herein, evolutionary algorithms are used to find novel DNNs that are accurate in classifying airway symptoms while allowing wearable deployment. As opposed to typical microphone‐acoustic signals, mechano‐acoustic data signals, which did not contain identifiable speech information for better privacy protection, are acquired from laboratory‐generated and publicly available datasets. The optimized DNNs had a low model file size of less than 150 kB and predicted airway symptoms of interest with 81.49% accuracy on unseen data. By performing explainable AI techniques, namely occlusion experiments and class activation maps, mel‐frequency bands up to 8,000 Hz are found as the most important feature for the classification. It is further found that DNN decisions are consistently relying on these specific features, fostering trust and transparency of the proposed DNNs. The proposed efficient and explainable DNN is expected to support edge computing on mechano‐acoustic sensing wearables for remote, long‐term monitoring of airway symptoms.
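The abstract says evolutionary algorithms searched for DNNs under a memory budget but gives no details of the search space, operators or fitness. The sketch below is a generic truncation-selection evolutionary loop over a hypothetical conv-net description (a list of channel widths), with the 150 kB limit enforced as a hard constraint. The architecture family, mutation operators, float32 size estimate and the "largest model that fits" fitness surrogate are all assumptions, not the authors' method.

```python
import random

BUDGET_BYTES = 150_000   # "< 150 kB", reading 1 kB as 1000 bytes (assumption)
BYTES_PER_PARAM = 4      # float32 weights, no quantisation (assumption)
N_CLASSES = 3            # coughs, throat clears, dry swallows

def param_count(channels, kernel=3):
    """Parameters of a hypothetical 2-D conv stack on a single-channel spectrogram,
    followed by global average pooling and one dense classification layer."""
    total, in_ch = 0, 1
    for out_ch in channels:
        total += kernel * kernel * in_ch * out_ch + out_ch   # conv weights + biases
        in_ch = out_ch
    return total + in_ch * N_CLASSES + N_CLASSES              # dense head

def fitness(channels):
    """Toy surrogate: prefer larger models that still fit the budget.
    A real search would train and validate each candidate instead."""
    size = param_count(channels) * BYTES_PER_PARAM
    return param_count(channels) if size <= BUDGET_BYTES else -1

def mutate(channels, rng):
    child = list(channels)
    i = rng.randrange(len(child))
    child[i] = max(4, child[i] + rng.choice([-8, 8]))         # widen or narrow one layer
    if rng.random() < 0.3 and len(child) < 6:
        child.append(rng.choice([8, 16, 32]))                  # occasionally add a layer
    return child

def evolve(generations=30, pop_size=8, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice([8, 16, 32]) for _ in range(rng.randint(1, 3))]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]                  # elitist truncation selection
        population = parents + [mutate(rng.choice(parents), rng) for _ in parents]
    return max(population, key=fitness)

best = evolve()
print(best, param_count(best) * BYTES_PER_PARAM, "bytes")
```

Replacing `fitness` with the validation accuracy of each trained candidate, while keeping the size check as a hard constraint, gives the general pattern of accuracy-driven search under a deployment budget; how the authors encoded and scored architectures is not stated in the abstract.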
