Detecting Mild Phonotrauma in Daily Life

Purpose: Many studies using machine learning (ML) in speech, language, and hearing sciences rely upon cross-validations with single data splitting. This study's first purpose is to provide quantitative evidence that would incentivize researchers to instead use the more robust data splitting method of nested k-fold cross-validation. The second purpose is to present methods and MATLAB code to perform power analysis for ML-based analysis during the design of a study. Method: First, the significant impact of different cross-validations on ML outcomes was demonstrated using real-world clinical data. Then, Monte Carlo simulations were used to quantify the interactions among the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, the dimensionality of the model, and the sample size. Four different cross-validation methods (single holdout, 10-fold, train–validation–test, and nested 10-fold) were compared based on the statistical power and confidence of the resulting ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome (5% significance) with 80% power. Statistical confidence of the model was defined as the probability of correct features being selected for inclusion in the final model. Results: ML models generated based on the single holdout method had very low statistical power and confidence, leading to overestimation of classification accuracy. Conversely, the nested 10-fold cross-validation method resulted in the highest statistical confidence and power while also providing an unbiased estimate of accuracy. The required sample size using the single holdout method could be 50% higher than what would be needed if nested k-fold cross-validation were used. Statistical confidence in the model based on nested k-fold cross-validation was as much as four times higher than the confidence obtained with the single holdout–based model. A computational model, MATLAB code, and lookup tables are provided to assist researchers with estimating the minimum sample size needed during study design. Conclusion: The adoption of nested k-fold cross-validation is critical for unbiased and robust ML studies in the speech, language, and hearing sciences. Supplemental Material: https://doi.org/10.23641/asha.25237045
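For readers unfamiliar with the technique, the following is a minimal Python sketch of nested k-fold cross-validation with feature selection in the inner loop. It is not the authors' MATLAB code: the synthetic data, the one-feature threshold classifier, and all function names (`make_data`, `make_folds`, `fit_threshold`, `nested_cv`) are assumptions made purely for illustration. The point it shows is that the feature is chosen using only the outer training rows, so each outer test fold remains untouched by model selection.

```python
import random
from statistics import mean

def make_data(n=120, n_noise=4, effect=2.0, seed=1):
    """Hypothetical data: feature 0 separates the classes, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(effect * label, 1.0)] +
                 [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y

def make_folds(rows, k, rng):
    """Shuffle row indices and deal them into k roughly equal folds."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def fit_threshold(X, y, rows, feature):
    """Toy classifier: threshold at the midpoint of the two class means."""
    m0 = mean(X[i][feature] for i in rows if y[i] == 0)
    m1 = mean(X[i][feature] for i in rows if y[i] == 1)
    return (m0 + m1) / 2, (1 if m1 >= m0 else -1)

def accuracy(X, y, rows, feature, model):
    thr, sign = model
    hits = sum((1 if sign * (X[i][feature] - thr) > 0 else 0) == y[i]
               for i in rows)
    return hits / len(rows)

def nested_cv(X, y, k_outer=10, k_inner=10, seed=0):
    rng = random.Random(seed)
    outer = make_folds(list(range(len(X))), k_outer, rng)
    scores, chosen = [], []
    for test_rows in outer:
        # Outer training set: every row outside the current test fold.
        train_rows = [i for fold in outer if fold is not test_rows for i in fold]
        inner = make_folds(train_rows, k_inner, rng)

        def inner_score(feature):
            # Inner loop: score a candidate feature on the training rows only.
            accs = []
            for val_rows in inner:
                fit_rows = [i for fold in inner if fold is not val_rows
                            for i in fold]
                model = fit_threshold(X, y, fit_rows, feature)
                accs.append(accuracy(X, y, val_rows, feature, model))
            return mean(accs)

        best = max(range(len(X[0])), key=inner_score)
        # Refit on the full outer training set, then score once on held-out rows.
        model = fit_threshold(X, y, train_rows, best)
        scores.append(accuracy(X, y, test_rows, best, model))
        chosen.append(best)
    return mean(scores), chosen

X, y = make_data()
acc, chosen = nested_cv(X, y)
print(f"nested CV accuracy estimate: {acc:.2f}")
print(f"outer folds selecting feature 0: {chosen.count(0)} of {len(chosen)}")
```

The share of outer folds that select the informative feature is only a loose analog of the abstract's "statistical confidence" (the probability of correct features being selected); the study's own definition and simulations are in its supplemental material.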

Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting

Ghasemzadeh,

Hillman,

Mehta

2024

Reducing Vocal Fatigue With Bone Conduction Devices: Comparing Forbrain and Sidetone Amplification

Nudelman,

Udd,

Åhlander

et al. 2023

Purpose: Altered auditory feedback research aims to identify methods to strengthen speakers' awareness of their own voicing behaviors, diminish their perception of vocal fatigue, and improve their voice production. This study aims to compare the effects of two bone conduction devices that provide altered auditory feedback. Method: Twenty participants (19–33 years old, age: M [SD] = 25.5 [3.85] years) completed a vocal loading task under three conditions: with a standard Forbrain device that provides filtered auditory feedback via bone conduction, with a modified Forbrain device that provides only sidetone amplification, and with no device (control). They rated their vocal fatigue on a visual analog scale every 2 min during the vocal loading task. Additionally, pre- and postloading voice samples were analyzed for acoustic voice parameters. Results: Across all participants, the use of bone conduction–altered auditory feedback devices resulted in lower vocal fatigue compared to the condition with no feedback. In the pre- and postloading voice samples, the sound pressure level decreased significantly during feedback conditions. During feedback conditions, spectral mean and standard deviation also decreased significantly, and spectral skew increased significantly. Conclusion: The results promote bone conduction as a possible preventative tool that may reduce self-reported vocal fatigue and compensatory voice production for healthy individuals without voice disorders.

Consistency of the Signature of Phonotraumatic Vocal Hyperfunction Across Different Ambulatory Voice Measures

Ghasemzadeh,

Hillman,

Mehta

2024

Purpose: Although different factors and voice measures have been associated with phonotraumatic vocal hyperfunction (PVH), it is unclear what percentage of individuals with PVH exhibit such differences during their daily lives. This study used a machine learning approach to quantify the consistency with which PVH manifests according to ambulatory voice measures. Analyses included acoustic parameters of phonation as well as temporal aspects of phonation and rest, with the goal of determining optimally consistent signatures of PVH. Method: Ambulatory neck-surface acceleration signals were recorded over 1 week from 116 female participants diagnosed with PVH and age-, sex-, and occupation-matched vocally healthy controls. The consistency of the manifestation of PVH was defined as the percentage of participants in each group that exhibited an atypical signature based on a target voice measure. Evaluation of each machine learning model used nested 10-fold cross-validation to improve the generalizability of findings. In Experiment 1, we trained separate logistic regression models based on the distributional characteristics of 14 voice measures and durations of voicing and resting segments. In Experiments 2 and 3, features of voicing and resting duration augmented the existing distributional characteristics to examine whether more consistent signatures would result. Results: Experiment 1 showed that the difference in the magnitude of the first two harmonics (H1–H2) exhibited the most consistent signature (69.4% of participants with PVH and 20.4% of controls had an atypical H1–H2 signature), followed by spectral tilt over eight harmonics (73.6% of participants with PVH and 32.1% of controls had an atypical spectral tilt signature) and estimated sound pressure level (SPL; 66.9% of participants with PVH and 27.6% of controls had an atypical SPL signature). Additionally, 77.6% of participants with PVH had atypical resting duration, and 68.9% had atypical voicing duration. Experiments 2 and 3 showed that augmenting the best-performing voice measures with univariate features of voicing or resting durations yielded only incremental improvement in the classifier's performance. Conclusions: Females with PVH were more likely to use more abrupt vocal fold closure (lower H1–H2), phonate louder (higher SPL), and take shorter vocal rests. They were also less likely to use higher fundamental frequency during their daily activities. The difference in the voicing duration signature between participants with PVH and controls had a large effect size, providing strong empirical evidence regarding the role of voice use in the development of PVH.
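As a small illustration of the consistency metric defined in the Method (the percentage of participants in a group that exhibit an atypical signature), the Python sketch below computes it from per-participant classifier decisions. The function name `consistency` and the toy flags are hypothetical and are not data from the study; in the study, each decision would come from a held-out prediction under nested 10-fold cross-validation.

```python
def consistency(atypical_flags):
    """Percentage of participants in one group flagged as atypical."""
    if not atypical_flags:
        raise ValueError("group has no participants")
    return 100.0 * sum(atypical_flags) / len(atypical_flags)

# Hypothetical held-out classifier decisions (True = atypical signature).
pvh_flags = [True, True, True, False, True]
control_flags = [False, False, True, False, False]

pvh_pct = consistency(pvh_flags)
control_pct = consistency(control_flags)
print(f"PVH: {pvh_pct:.1f}% atypical, controls: {control_pct:.1f}% atypical")
```

A consistent signature in this sense is one where the first percentage is high and the second is low, as in the H1–H2 result reported above.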