A model-based inverse filtering scheme is proposed for an accurate, non-invasive estimation of the aerodynamic source of voiced sounds at the glottis. The approach, referred to as subglottal impedance-based inverse filtering (IBIF), takes as input the signal from a lightweight accelerometer placed on the skin over the extrathoracic trachea and yields estimates of glottal airflow and its time derivative, offering important advantages over traditional methods that deal with the supraglottal vocal tract. The proposed scheme is based on mechano-acoustic impedance representations from a physiologically based transmission line model and a lumped skin surface representation. A subject-specific calibration protocol is used to account for individual adjustments of subglottal impedance parameters and mechanical properties of the skin. Preliminary results for sustained vowels with various voice qualities show that the subglottal IBIF scheme yields estimates comparable to those of current aerodynamics-based methods of clinical vocal assessment. A mean absolute error of less than 10% was observed for two glottal airflow measures (maximum flow declination rate and amplitude of the modulation component) that have been associated with the pathophysiology of some common voice disorders caused by faulty and/or abusive patterns of vocal behavior (i.e., vocal hyperfunction). The proposed method further advances the ambulatory assessment of vocal function based on the neck acceleration signal, which has previously been limited to the estimation of phonation duration, loudness, and pitch. Subglottal IBIF is also suitable for other ambulatory applications in speech communication, for which further evaluation is underway.
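As an illustration of the two glottal airflow measures named in this abstract, the following minimal Python sketch computes them from a sampled airflow waveform. It assumes the commonly used definitions (maximum flow declination rate as the magnitude of the steepest negative slope of the flow, and amplitude of the modulation component as the peak-to-peak flow excursion). The function names, the first-difference derivative, and the toy waveform are illustrative assumptions, not the authors' implementation.

```python
def mfdr(flow, fs):
    """Maximum flow declination rate (assumed definition): magnitude of the
    most negative first difference of the flow, scaled by the sampling rate."""
    slopes = [(b - a) * fs for a, b in zip(flow, flow[1:])]
    return -min(slopes)


def ac_flow(flow):
    """Amplitude of the modulation (AC) component, taken here as the
    peak-to-peak excursion of the flow waveform."""
    return max(flow) - min(flow)


def percent_error(estimate, reference):
    """Absolute error of an estimate relative to a reference, in percent."""
    return 100.0 * abs(estimate - reference) / abs(reference)


# Toy glottal pulse (arbitrary units): slow rise, fast fall.
toy_flow = [0, 1, 2, 3, 1, 0]
toy_fs = 10  # hypothetical sampling rate, samples per second

print(mfdr(toy_flow, toy_fs))    # steepest decline is (1 - 3) * 10, so 20
print(ac_flow(toy_flow))         # 3
print(percent_error(9.0, 10.0))  # error of an estimate against a reference
```

In practice the flow derivative would be computed on a properly calibrated and low-pass filtered signal, and `percent_error` would be averaged over many tokens to obtain a mean absolute error.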
Different source-related factors can lead to vocal fold instabilities and bifurcations referred to as voice breaks. Nonlinear coupling in phonation suggests that changes in acoustic loading can also be responsible for this unstable behavior. However, no in vivo visualization of tissue motion during these acoustically induced instabilities has been reported. Simultaneous recordings of laryngeal high-speed videoendoscopy, acoustics, aerodynamics, electroglottography, and neck skin acceleration are obtained from a participant consistently exhibiting voice breaks during pitch glide maneuvers. Results suggest that acoustically induced and source-induced instabilities can be distinguished at the tissue level. Differences in vibratory patterns are described through kymography and phonovibrography; measures of glottal area, open/speed quotient, and amplitude/phase asymmetry; and empirical orthogonal function decomposition. Acoustically induced tissue instabilities appear abruptly and exhibit irregular vocal fold motion after the bifurcation point, whereas source-induced ones show a smoother transition. These observations are also reflected in the acoustic and acceleration signals. Added aperiodicity is observed after the acoustically induced break, and harmonic changes appear prior to the bifurcation for the source-induced break. Both types of breaks appear to be subcritical bifurcations due to the presence of hysteresis and amplitude changes after the frequency jumps. These results are consistent with previous studies and the nonlinear source-filter coupling
Measurements of body sounds on the skin surface have been widely used in the medical field and continue to be a topic of current research, ranging from the diagnosis of respiratory and cardiovascular diseases to the monitoring of voice dosimetry. These measurements are typically made using light-weight accelerometers and/or air-coupled microphones attached to the skin. Although normally neglected, air-borne sounds generated by the subject or other sources of background noise can easily corrupt such recordings, which is particularly critical in the recording of voiced sounds on the skin surface. In this study, the sensitivity of commonly used bioacoustic sensors to air-borne sounds was evaluated and compared with their sensitivity to tissue-borne body sounds. To delineate the sensitivity to each pathway, the sensors were first tested in vitro and then on human subjects. The results indicated that, in general, the air-borne sensitivity is sufficiently high to significantly corrupt body sound signals. In addition, the air-borne and tissue-borne sensitivities can be used to discriminate between these components. Although the study is focused on the evaluation of voiced sounds on the skin surface, an extension of the proposed methods to other bioacoustic applications is discussed.
A time-domain model of sound wave propagation in the branching airways of the subglottal system is presented. The model is formulated as an extension to an acoustic transmission-line modeling scheme originally developed for simulating the supraglottal system in the time domain during speech production [Maeda (1982). Speech Commun. 1, 199-229; Mokhtari et al. (2008). Speech Commun. 50, 179-190]. The approach allows for predictions of time-varying acoustic pressure and volume velocity at any point along the various generations of subglottal airways from trachea to alveoli. In addition, the model can be configured so that its overall structure simulates different geometric forms, including airways that branch in a symmetric or asymmetric pattern. Three subglottal configurations, two symmetric and one asymmetric, were represented based on reported anatomical dimensions of the subglottal airways. Estimates of the acoustic input impedances of these subglottal configurations revealed resonant characteristics similar to those found in previous studies. Simulations of voiced sound propagation into the subglottal airways, achieved by coupling the subglottal model to a two-mass vocal fold model and a supraglottal tract configured for different vowels, yielded predictions of time-domain sound pressure waveforms below the vocal folds that compare favorably to previous measurements in human subjects.
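To give a feel for what an acoustic input impedance is, the sketch below evaluates the textbook input impedance of a single uniform, lossless tube terminated by a load. This is a deliberately simplified frequency-domain stand-in, not the branching time-domain model described in the abstract. The function name, the default sound speed and air density, and the example dimensions are all assumptions chosen for illustration.

```python
import math


def tube_input_impedance(freq_hz, length_m, area_m2,
                         load_impedance=0j, c=350.0, rho=1.14):
    """Input impedance of a lossless uniform tube (standard transmission-line
    formula). c (m/s) and rho (kg/m^3) are assumed values for warm, humid air.
    load_impedance=0 models an ideal open end."""
    z0 = rho * c / area_m2                    # characteristic impedance
    kl = 2.0 * math.pi * freq_hz / c * length_m
    t = math.tan(kl)
    return z0 * (load_impedance + 1j * z0 * t) / (z0 + 1j * load_impedance * t)


# Hypothetical tube: 17.5 cm long, 2 cm^2 cross-section, ideal open end.
# At 250 Hz the electrical length kl equals pi/4, so |Z_in| is close to Z0.
z_in = tube_input_impedance(250.0, 0.175, 2e-4)
print(abs(z_in))
```

With an open end the impedance grows without bound as the frequency approaches c / (4 * length), the quarter-wavelength resonance. A realistic subglottal model adds losses, yielding walls, and branching, which this sketch omits.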
Inverse filtering of oral airflow using closed-phase linear prediction is expected to preserve the effects of source-filter interactions in the glottal airflow pulse. Under incomplete glottal closure, the glottal airflow estimation is more challenging due to a lowered glottal impedance, increased subglottal coupling, and a violated all-pole assumption. To account for these effects, a model-based inverse filtering scheme allowing for coupling between glottis and upper and lower airways was developed. Acoustic transmission in the tracts used a frequency-domain transmission line. A linearized, time-varying expression was used for the glottal impedance, along with a dipole representation. Synthetic vowel sounds and actual recordings were used to evaluate the proposed scheme. Subject-specific model parameters were obtained from simultaneous aerodynamic, acoustic, and high-speed videoendoscopic recordings of normal subjects uttering vowels with various degrees of glottal closure. Results illustrated that, even under incomplete glottal closure, the airflow entering the vocal tract preserved source-filter interactions and was comparable to that obtained using closed-phase linear prediction. The scheme also yielded an uncoupled glottal airflow that exhibited a clear pulse de-skewing, making it proportional to the glottal area. Cases with larger glottal gaps exhibited lower mean impedances and less pulse skewing, with airflow estimates proportional to the transglottal pressure drop.
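The basic idea of linear-prediction inverse filtering can be sketched in a few lines of Python. Note the simplification: this example fits an all-pole model with the autocorrelation method (Levinson-Durbin recursion) over a whole frame, whereas closed-phase analysis restricts the fit to the closed portion of each glottal cycle. The function names and the one-pole test signal are assumptions for illustration only.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of a finite frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """All-pole predictor polynomial A(z) = 1 + a1 z^-1 + ... via the
    Levinson-Durbin recursion (autocorrelation method)."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                   # remaining prediction error
    return a


def inverse_filter(x, a):
    """Apply the FIR filter A(z) to x, removing the modeled resonances."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


# Toy signal: an impulse passed through a one-pole filter with pole at 0.5.
signal = [0.5 ** n for n in range(64)]
coeffs = lpc(signal, 1)                      # close to [1.0, -0.5]
residual = inverse_filter(signal, coeffs)    # close to a single impulse
print(coeffs)
print(residual[:4])
```

Here the residual approximates the excitation that drove the filter. In voice analysis the same operation is applied to oral airflow or the speech signal to approximate the glottal source, under the all-pole assumption that the abstract notes is violated when glottal closure is incomplete.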