Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach

Fang, Shih Hau; Tsao, Yu; Hsiao, Min Jing; Chen, Ji Ying; Lai, Yeong‐Lin; Lin, Feng; Wang, Chi Te

doi:10.1016/j.jvoice.2018.02.003

Cited by 171 publications

(103 citation statements)

References 19 publications

“…In addition, a recent study confirms that a deep neural network model can be used to detect voice disorders with high accuracy based on voice samples [40]. Meanwhile, it has been confirmed that the ANN can incorporate heterogeneous data to achieve better classification and regression performance [41, 42].…”

Section: Discussion

mentioning

confidence: 99%

Demographic and Symptomatic Features of Voice Disorders and Their Potential Application in Classification Using Machine Learning Algorithms

Tsui, Tsao, Lin et al. 2018

Folia Phoniatr Logop

Self Cite

Background: Studies have used questionnaires of dysphonic symptoms to screen voice disorders. This study investigated whether the differential presentation of demographic and symptomatic features can be applied to computerized classification. Methods: We recruited 100 patients with glottic neoplasm, 508 with phonotraumatic lesions, and 153 with unilateral vocal palsy. Statistical analyses revealed significantly different distributions of demographic and symptomatic variables. Machine learning algorithms, including decision tree, linear discriminant analysis, K-nearest neighbors, support vector machine, and artificial neural network, were applied to classify voice disorders. Results: The results showed that demographic features were more effective for detecting neoplastic and phonotraumatic lesions, whereas symptoms were useful for detecting vocal palsy. When combining demographic and symptomatic variables, the artificial neural network achieved the highest accuracy of 83 ± 1.58%, whereas the accuracy achieved by other algorithms ranged from 74 to 82.6%. Decision tree analyses revealed that sex, age, smoking status, sudden onset of dysphonia, and 10-item voice handicap index scores were significant characteristics for classification. Conclusion: This study demonstrated a significant difference in demographic and symptomatic features between glottic neoplasm, phonotraumatic lesions, and vocal palsy. These features may facilitate automatic classification of voice disorders through machine learning algorithms.

“…Objective metrics obtained using various acoustic instruments have been investigated, and attempts have been made to correlate these with perceptual voice quality assessments [8][9][10][11][12]. A plethora of temporal, spectral, and cepstral metrics have been proposed to evaluate voice quality [13,14]. Commonly used features or vocal metrics include fundamental frequency (f0), loudness, jitter, shimmer, vocal formants, harmonic-to-noise ratio (HNR), spectral tilt (H1-H2, harmonic richness factor), maximum flow declination rate (MFDR), duty ratio, cepstral peak prominence (CPP), Mel-frequency cepstral coefficients (MFCCs), power spectrum ratio, and others [15][16][17][18][19]. Self-reported feelings of decreased vocal functionality have been used as a criterion for vocal fatigue in many previous studies [1,4,[20][21][22].…”

mentioning

confidence: 99%

Investigation of Vocal Fatigue Using a Dose-Based Vocal Loading Task

et al. 2020

Vocal loading tasks are often used to investigate the relationship between voice use and vocal fatigue in laboratory settings. The present study investigated the concept of a novel quantitative dose-based vocal loading task for vocal fatigue evaluation. Ten female subjects participated in the study. Voice use was monitored and quantified using an online vocal distance dose calculator during six consecutive 30-min long sessions. Voice quality was evaluated subjectively using the CAPE-V and SAVRa before, between, and after each vocal loading task session. Fatigue-indicative symptoms, such as cough, swallowing, and voice clearance, were recorded. Statistical analysis of the results showed that the overall severity, the roughness, and the strain ratings obtained from CAPE-V obeyed similar trends as the three ratings from the SAVRa. These metrics increased over the first two thirds of the sessions to reach a maximum, and then decreased slightly near the session end. Quantitative metrics obtained from surface neck accelerometer signals were found to obey similar trends. The results consistently showed that an initial adjustment of voice quality was followed by vocal saturation, supporting the effectiveness of the proposed loading task. These tools require specific vocal stimuli. For example, the CAPE-V requires the completion of three defined phonation tasks assessed through perceptual rating. This therefore limits the applicability of these tools in situations where the vocal stimuli are varied or unspecified. Many studies have investigated uncertainties in subjective judgment methodologies for voice quality evaluation. Kreiman and Gerratt investigated the source of listener disagreement in voice quality assessment using unidimensional rating scales, and found that no single metric from natural voice recordings allowed the evaluation of voice quality [6]. 
Kreiman also found that individual standards of voice quality, scale resolution, and voice attribute magnitude also significantly influenced intra-rater agreement [7]. Objective metrics obtained using various acoustic instruments have been investigated, and attempts have been made to correlate these with perceptual voice quality assessments [8][9][10][11][12]. A plethora of temporal, spectral, and cepstral metrics have been proposed to evaluate voice quality [13,14]. Commonly used features or vocal metrics include fundamental frequency (f0), loudness, jitter, shimmer, vocal formants, harmonic-to-noise ratio (HNR), spectral tilt (H1-H2, harmonic richness factor), maximum flow declination rate (MFDR), duty ratio, cepstral peak prominence (CPP), Mel-frequency cepstral coefficients (MFCCs), power spectrum ratio, and others [15][16][17][18][19]. Self-reported feelings of decreased vocal functionality have been used as a criterion for vocal fatigue in many previous studies [1,4,[20][21][22]. Standard self-administered questionnaires, such as the SAVRa and the Vocal Fatigue Index (VFI), have been used to identify individuals with vocal fatigue, and to characterize their sy...
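
Two of the perturbation metrics named in the passage above, jitter and shimmer, are commonly computed in their "local" form as the mean absolute difference between consecutive cycles divided by the mean value. The sketch below illustrates that common definition only; it is not taken from the cited study, and the function name and the sample period and amplitude values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by
    the mean value. Applied to glottal cycle periods this gives local
    jitter; applied to per-cycle peak amplitudes it gives local shimmer.
    (Hypothetical helper, not from the cited paper.)"""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_abs_diff / mean_value


# Illustrative values only: cycle periods in seconds, amplitudes in
# arbitrary units.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [1.0, 0.9, 1.0, 0.9]

jitter_local = local_perturbation(periods)
shimmer_local = local_perturbation(amplitudes)
print(f"jitter (local):  {100 * jitter_local:.2f}%")
print(f"shimmer (local): {100 * shimmer_local:.2f}%")
```

In practice the periods and amplitudes would come from a pitch-marking step on the recorded voice signal, which is outside the scope of this sketch.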

“…A subset of the corpus previously described in [4] has been supplied by the organizers of the challenge. The provided dataset has been divided into a training and testing partition for the purposes of performance evaluation.…”

Section: A Corpus

mentioning

confidence: 99%

ByoVoz Automatic Voice Condition Analysis System for the 2018 FEMH Challenge

Arias-Londoño, García, Moro-Velázquez et al. 2018

2018 IEEE International Conference on Big Data (Big Data)

This paper presents the methods and results used by the ByoVoz team for the design of an automatic voice condition analysis system, which was submitted to the 2018 Far East Memorial Hospital voice data challenge. The proposed methodology is based on a cascading scheme that first discriminates between pathological and normophonic voices, and then identifies the type of disorder. Using diverse feature selection techniques, a subset of complexity, spectral/cepstral, and perturbation characteristics was identified for the proposed tasks. Then, several generative classification methodologies based on Gaussian Mixture Models and Gradient Boosting were employed to provide decisions about the input voices in the binary classification, and one-vs-one classification systems based on Random Forests were used for the categorization according to the type of disorder. Using a 4-fold cross-validation approach on the training partition, a sensitivity of 0.93 and a specificity of 0.74 were obtained. Similarly, an unweighted average recall of 0.63 and an accuracy of 66% were obtained for the identification task. Using the scoring metric proposed in the challenge, the final score considering both detection and identification is 0.77.
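
The abstract above reports sensitivity and specificity for the detection stage and unweighted average recall (UAR) for the identification stage. The sketch below shows how these quantities are conventionally computed from label lists. The function names, class labels, and toy data are hypothetical, and the challenge's own combined scoring metric is not reproduced here because the abstract does not define it.

```python
from collections import defaultdict


def sensitivity_specificity(y_true, y_pred, positive="pathological"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally
    regardless of how many samples it has."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy detection labels (hypothetical, not challenge data).
det_true = ["pathological"] * 4 + ["normal"] * 2
det_pred = ["pathological"] * 3 + ["normal", "normal", "pathological"]
sens, spec = sensitivity_specificity(det_true, det_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")

# Toy identification labels with hypothetical class names.
id_true = ["neoplasm", "neoplasm", "neoplasm", "palsy"]
id_pred = ["neoplasm", "neoplasm", "palsy", "palsy"]
print(f"UAR={unweighted_average_recall(id_true, id_pred):.3f}")
```

UAR differs from plain accuracy when classes are imbalanced, which is presumably why the abstract reports both for the identification task.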
