Dysphonia is a common complaint: almost one in four children produces a pathological voice. A mobile-based screening system that pre-school workers could use to recognize children with dysphonic voices, so that they receive professional help as soon as possible, would be desirable. The goal of this research is to identify acoustic parameters that can distinguish the healthy voices of children from the voices of children with dysphonia. In addition, the possibility of automatic classification is examined. Two-sample t-tests were used to test the statistical significance of differences in the mean values of the acoustic parameters between healthy voices and those with dysphonia. A two-class classification was performed between the two groups using leave-one-out cross-validation with a support vector machine (SVM) classifier. Formant frequencies, mel-frequency cepstral coefficients (MFCCs), Harmonics-to-Noise Ratio (HNR), Soft Phonation Index (SPI) and frequency band energy ratios based on intrinsic mode functions, measured on different variations of phonemes, showed statistically significant differences between the groups. A high classification accuracy of 93% was achieved by SVM with linear and RBF kernels using only 8 acoustic parameters. Additional data is needed to build a more general model, but this research can serve as a reference point for classifying the voices of healthy children and children with dysphonia from continuous speech.
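The two analysis steps described above (a two-sample t-test per acoustic parameter, then leave-one-out cross-validation of a two-class classifier) can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions rather than the study's method: the HNR values are invented toy data, the t statistic uses the pooled-variance (Student) form because the abstract does not say which variant was used, and a nearest-centroid rule stands in for the SVM so that the example needs no external library. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Invented toy HNR values in dB; these are NOT data from the study.
healthy = [20.0, 22.0, 21.0, 23.0]
dysphonic = [12.0, 14.0, 13.0, 15.0]

t_stat, dof = two_sample_t(healthy, dysphonic)

X = [[v] for v in healthy + dysphonic]          # one feature per sample
y = [0] * len(healthy) + [1] * len(dysphonic)   # 0 = healthy, 1 = dysphonic
accuracy = leave_one_out_accuracy(X, y)
```

In practice the stand-in classifier would be replaced by an SVM from a machine-learning library, and each held-out prediction would be made by a model trained only on the remaining samples, exactly as in the loop above.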
Perceptual evaluation of the patient's voice is the most commonly used method in everyday clinical practice. We propose an automatic approach for predicting the severity of some types of organic and functional dysphonia. By means of an unsupervised learning method, we demonstrate that acoustic parameters measured on different phonetic classes are suitable for modelling the four-grade assessments of the specialists (RBH subjective scale from 0 to 3). In this study, the overall hoarseness H was examined. Four specialists were asked to determine the severity of dysphonia. A k-means cluster analysis was performed for the decisions of each specialist separately; the average accuracy of the four-grade classification was 0.46. The four-grade classification was surprisingly close to the subjective judgements. The severity of dysphonia was also estimated automatically: linear regression and RBF kernel regression models were compared, with the average rating of the four specialists used as the target in the experiments. Low RMSE and high correlation were obtained between the automatically predicted severity and the perceptual assessments. The lowest RMSE for H was 0.45, obtained by the model with the RBF kernel; however, a simpler linear model provided the highest correlation, 0.85, using only eight acoustic parameters.
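The evaluation described above (averaging the four specialists' ratings into a target, then scoring predictions by RMSE and correlation) can be sketched as follows. This is a minimal illustration under stated assumptions: the ratings and predictions are invented, the correlation is taken to be Pearson's r (the abstract does not name the measure), and the function names are hypothetical. The regression models themselves are not reproduced here.

```python
from math import sqrt
from statistics import mean


def rmse(predicted, target):
    """Root-mean-square error between predicted and target severity."""
    return sqrt(mean((p - t) ** 2 for p, t in zip(predicted, target)))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented H ratings (0 to 3) from four raters for five speakers; NOT study data.
ratings = [
    [0, 0, 1, 0],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 2, 3],
    [1, 0, 1, 1],
]

# Regression target: the mean rating of the four raters for each speaker.
target = [mean(r) for r in ratings]

# Invented output of some regression model for the same five speakers.
predicted = [0.5, 1.0, 2.0, 2.5, 1.0]

error = rmse(predicted, target)
correlation = pearson_r(predicted, target)
```

Reporting both measures is useful because they capture different things: RMSE reflects the absolute size of the prediction error on the 0 to 3 scale, while correlation reflects how well the predictions preserve the ordering and spread of the perceptual ratings.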