An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing), and to replace (impute) the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low SNRs these techniques fail, because too many time frames may contain few, if any, reliable features. In this paper we introduce a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing. The method, dubbed sparse imputation, can impute missing features using larger time windows such as entire words. Using an overcomplete dictionary of clean speech exemplars, the method finds the sparsest combination of exemplars that jointly approximate the reliable features of a noisy utterance. That linear combination of clean speech exemplars is used to replace the missing features. Recognition experiments on noisy isolated digits show that sparse imputation outperforms conventional imputation techniques at SNR = -5 dB when using an ideal 'oracle' mask. With error-prone estimated masks sparse imputation performs slightly worse than the best conventional technique.
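The imputation step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a solver, so a greedy matching pursuit is used here as a simple stand-in for finding a sparse combination of exemplars. The function name `sparse_impute`, its parameters, and the toy vectors are all hypothetical.

```python
import math

def sparse_impute(noisy, reliable, exemplars, n_iter=10, tol=1e-9):
    """Toy sketch: fit a sparse combination of clean exemplars to the
    reliable entries of `noisy`, then use it to fill in the unreliable ones.
    Greedy matching pursuit is an assumed stand-in for the sparse solver."""
    rel = [i for i, r in enumerate(reliable) if r]
    # Norm of each exemplar restricted to the reliable entries only.
    norms = [math.sqrt(sum(e[i] ** 2 for i in rel)) for e in exemplars]
    residual = [noisy[i] for i in rel]
    weights = [0.0] * len(exemplars)
    for _ in range(n_iter):
        # Pick the exemplar most correlated with the current residual.
        best, best_corr = None, 0.0
        for k, e in enumerate(exemplars):
            if norms[k] == 0.0:
                continue
            corr = sum(e[i] * r for i, r in zip(rel, residual)) / norms[k]
            if abs(corr) > abs(best_corr):
                best, best_corr = k, corr
        if best is None or abs(best_corr) < tol:
            break
        # Add that exemplar's contribution and update the residual.
        step = best_corr / norms[best]
        weights[best] += step
        residual = [r - step * exemplars[best][i] for i, r in zip(rel, residual)]
    # Full-length clean estimate from the same linear combination.
    estimate = [sum(w * e[d] for w, e in zip(weights, exemplars))
                for d in range(len(noisy))]
    # Keep reliable features, replace missing ones by the estimate.
    imputed = [noisy[d] if reliable[d] else estimate[d] for d in range(len(noisy))]
    return imputed, weights

# Hypothetical toy data: three exemplars of four features each.
exemplars = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
noisy = [2.0, 0.0, 9.0, 9.0]           # last two entries are corrupted
reliable = [True, True, False, False]  # an 'oracle' mask for this toy case
imputed, weights = sparse_impute(noisy, reliable, exemplars)
```

Only the reliable entries enter the fit, while the reconstruction uses the full-length exemplars, which is what lets the method fill in features over a window longer than a single frame.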
It appears that temperature instabilities are a major obstacle hindering the use of semiconductor strain gauge pressure transducers in speech research, especially when absolute pressure data are mandatory. In this paper a simple and reliable method for an in vivo calibration of this kind of transducer is described. The most important error source, the drift of the zero pressure level due to temperature changes, is discussed, and an estimation of the measurement accuracy which can be obtained is given. Moreover, some registrations of subglottal, supraglottal, and transglottal pressure are presented. It is shown that the pressure recordings allow us to obtain estimates of the volume flow in the trachea and pharynx. Analysis of those waveforms appears to lead to new insights into the physical processes underlying voice production. Specifically, an independent glottal contribution to the skewing of the glottal flow pulses is identified.
When subglottal pressure signals which are recorded during normal speech production are spectrally analyzed, the frequency of the first spectral maximum appears to deviate appreciably from the first resonance frequency which has been reported in the literature and which stems from measurements of the acoustic impedance of the subglottal system. It is postulated that this is caused by the spectrum of the excitation function. This hypothesis is corroborated by a modeling study. Using an extended version of the well-known two-mass model of the vocal folds that can account for a glottal leak, it is shown that under realistic physiological assumptions glottal flow waveforms are generated whose spectral properties cause a downward shift of the location of the first spectral maximum in the subglottal pressure signals. The order of magnitude of this effect is investigated for different glottal settings and with a subglottal system that is modeled according to the impedance measurements reported in the literature. The outcomes of this modeling study show that the location of the first spectral maximum of the subglottal pressure may deviate appreciably from the natural frequency of the subglottal system. As a consequence, however, the comfortable assumption that in normal speech the glottal excitation function is constant and zero during the "closed glottis interval" has to be called into question.
In existing research on syntactic alternations such as the dative alternation (give her the apple vs. give the apple to her), the linguistic data is often analysed with the help of logistic regression models. In this article, we evaluate the use of logistic regression for this type of research, and present two different approaches: Bayesian Networks and Memory-based learning. For the Bayesian Network, we use the higher-level semantic features suggested in the literature, while we limit ourselves to lexical items in the memory-based approach. We evaluate the suitability of the three approaches by applying them to a large data set (>11,000 instances) extracted from the British National Corpus, and comparing their quality in terms of classification accuracy, their interpretability in the context of linguistic research, and their actual classification of individual cases. Our main finding is that the classifications are very similar across the three approaches, also when employing lexical items instead of the higher-level features, because most of the alternation is determined by the verb and the length of the two objects (here: her and the apple).
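To make the memory-based approach on lexical items concrete, the sketch below classifies a dative construction by majority vote among its nearest stored instances. It is an illustration under stated assumptions only: the overlap distance, the value of `k`, the feature layout (verb, recipient, theme), and the four stored instances are invented for this example and are not taken from the British National Corpus data or from the article's actual setup.

```python
from collections import Counter

def overlap_distance(a, b):
    """Number of feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def classify(instance, memory, k=3):
    """Majority label among the k stored instances closest to `instance`."""
    nearest = sorted(memory, key=lambda item: overlap_distance(instance, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy memory: (verb, recipient, theme) -> construction.
memory = [
    (("give", "her", "book"), "double_object"),
    (("give", "him", "apple"), "double_object"),
    (("send", "committee", "letter"), "prepositional"),
    (("send", "office", "report"), "prepositional"),
]

label_give = classify(("give", "her", "apple"), memory)
label_send = classify(("send", "office", "letter"), memory)
```

Because the instances carry only lexical items, the verb dominates the vote in this toy case, in line with the abstract's finding that the verb determines much of the alternation.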
Acoustic backing-off was recently proposed as an operationalisation of missing feature theory for increased recognition robustness. Acoustic backing-off effectively removes the detrimental influence of outlier values from the local decisions in the Viterbi algorithm without any kind of explicit outlier detection. In the context of connected digit recognition over telephone lines, it is shown that with more than 30% of the static mel-frequency cepstral coefficients disturbed, acoustic backing-off is capable of reducing the word error rate by one order of magnitude. Furthermore, our results indicate that the effectiveness of acoustic backing-off is optimal when dispersion of distortions due to acoustic feature transformations is minimal.
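One way to limit the influence of outliers on local Viterbi decisions without detecting them explicitly is to mix each Gaussian likelihood with a small constant floor, so that no single feature value can drive the local score arbitrarily low. The sketch below shows that idea. The abstract above does not give the formula, so the mixture form, the function names, and the values of `eps` and `floor_density` are assumptions made for illustration.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def backoff_loglik(x, mean, var, eps=0.1, floor_density=1e-3):
    """Log of a Gaussian mixed with a constant floor (assumed form).
    For values near the mean the Gaussian term dominates; for outliers
    the floor term bounds the penalty."""
    g = math.exp(gauss_logpdf(x, mean, var))
    return math.log((1.0 - eps) * g + eps * floor_density)

# Hypothetical values: one typical observation and one gross outlier.
typical_plain = gauss_logpdf(0.0, 0.0, 1.0)
typical_backoff = backoff_loglik(0.0, 0.0, 1.0)
outlier_plain = gauss_logpdf(50.0, 0.0, 1.0)
outlier_backoff = backoff_loglik(50.0, 0.0, 1.0)
```

With these settings the score for a typical value barely changes, while the outlier's penalty is capped by the floor instead of growing quadratically with its distance from the mean.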