B. Cranen scite author profile

An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing), and to replace (impute) the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low SNRs these techniques fail, because too many time frames may contain few, if any, reliable features. In this paper we introduce a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing. The method, dubbed sparse imputation, can impute missing features using larger time windows such as entire words. Using an overcomplete dictionary of clean speech exemplars, the method finds the sparsest combination of exemplars that jointly approximate the reliable features of a noisy utterance. That linear combination of clean speech exemplars is used to replace the missing features. Recognition experiments on noisy isolated digits show that sparse imputation outperforms conventional imputation techniques at SNR = -5 dB when using an ideal 'oracle' mask. With error-prone estimated masks sparse imputation performs slightly worse than the best conventional technique.
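The imputation step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name a solver, so a greedy orthogonal matching pursuit is swapped in here as one simple way to look for a sparse combination. The function names `omp` and `sparse_impute`, the dictionary size, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def omp(A, y, n_nonzero):
    """Greedy orthogonal matching pursuit: find a sparse x with A @ x ~= y."""
    n_atoms = A.shape[1]
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0.0] = 1.0            # avoid dividing by zero for empty atoms
    used = np.zeros(n_atoms, dtype=bool)
    support = []
    coef = np.zeros(0)
    residual = y.astype(float)
    for _ in range(min(n_nonzero, n_atoms)):
        corr = np.abs(A.T @ residual) / norms
        corr[used] = 0.0                 # never pick the same exemplar twice
        k = int(np.argmax(corr))
        if corr[k] <= 1e-12:             # nothing left to explain
            break
        used[k] = True
        support.append(k)
        # Refit all selected exemplars jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n_atoms)
    if support:
        x[support] = coef
    return x


def sparse_impute(y, reliable, A, n_nonzero=5):
    """Keep reliable features of y; replace the rest by a sparse exemplar mix."""
    # Fit the sparse weights using only the rows marked reliable.
    x = omp(A[reliable], y[reliable], n_nonzero)
    estimate = A @ x                     # full clean-speech estimate
    return np.where(reliable, y, estimate), x


# Synthetic demo (hypothetical data, not speech features).
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 200))       # 200 hypothetical clean exemplars
clean = 0.7 * A[:, 3] + 0.3 * A[:, 17]
reliable = np.arange(60) % 3 != 0        # every third feature is "missing"
noisy = np.where(reliable, clean, clean + 5.0)
imputed, x = sparse_impute(noisy, reliable, A, n_nonzero=2)
print("exemplars used:", np.flatnonzero(x))
print("max error on missing features:", np.abs(imputed - clean)[~reliable].max())
```

The key design point mirrored from the abstract is that the weights are fitted on the reliable features only, while the resulting combination is evaluated on all features to fill in the missing ones.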

Pressure measurements during speech production using semiconductor miniature pressure transducers: Impact on models for speech production

Cranen, Boves. 1985

It appears that temperature instabilities are a major obstacle hindering the use of semiconductor strain gauge pressure transducers in speech research, especially when absolute pressure data are mandatory. In this paper a simple and reliable method for an in vivo calibration of this kind of transducer is described. The most important error source, the drift of the zero pressure level due to temperature changes, is discussed, and an estimation of the measurement accuracy which can be obtained is given. Moreover, some registrations of subglottal, supraglottal, and transglottal pressure are presented. It is shown that the pressure recordings allow us to obtain estimates of the volume flow in the trachea and pharynx. Analysis of those waveforms appears to lead to new insights into the physical processes underlying voice production. Specifically, an independent glottal contribution to the skewing of the glottal flow pulses is identified.

On subglottal formant analysis

Cranen, Boves. 1987

When subglottal pressure signals which are recorded during normal speech production are spectrally analyzed, the frequency of the first spectral maximum appears to deviate appreciably from the first resonance frequency which has been reported in the literature and which stems from measurements of the acoustic impedance of the subglottal system. It is postulated that this is caused by the spectrum of the excitation function. This hypothesis is corroborated by a modeling study. Using an extended version of the well-known two-mass model of the vocal folds that can account for a glottal leak, it is shown that under realistic physiological assumptions glottal flow waveforms are generated whose spectral properties cause a downward shift of the location of the first spectral maximum in the subglottal pressure signals. The order of magnitude of this effect is investigated for different glottal settings and with a subglottal system that is modeled according to the impedance measurements reported in the literature. The outcomes of this modeling study show that the location of the first spectral maximum of the subglottal pressure may deviate appreciably from the natural frequency of the subglottal system. As a consequence, however, the comfortable assumption that in normal speech the glottal excitation function is constant and zero during the "closed glottis interval" has to be called into question.

Choosing alternatives: Using Bayesian Networks and memory-based learning to study the dative alternation

Theijssen, Bosch, Cranen et al. 2013

In existing research on syntactic alternations such as the dative alternation (give her the apple vs. give the apple to her), the linguistic data is often analysed with the help of logistic regression models. In this article, we evaluate the use of logistic regression for this type of research, and present two different approaches: Bayesian Networks and memory-based learning. For the Bayesian Network, we use the higher-level semantic features suggested in the literature, while we limit ourselves to lexical items in the memory-based approach. We evaluate the suitability of the three approaches by applying them to a large data set (>11,000 instances) extracted from the British National Corpus, and comparing their quality in terms of classification accuracy, their interpretability in the context of linguistic research, and their actual classification of individual cases. Our main finding is that the classifications are very similar across the three approaches, also when employing lexical items instead of the higher-level features, because most of the alternation is determined by the verb and the length of the two objects (here: her and the apple).

Acoustic backing-off as an implementation of missing feature theory

Veth, Cranen, Boves. 2001. Speech Communication

Acoustic backing-off was recently proposed as an operationalisation of missing feature theory for increased recognition robustness. Acoustic backing-off effectively removes the detrimental influence of outlier values from the local decisions in the Viterbi algorithm without any kind of explicit outlier detection. In the context of connected digit recognition over telephone lines, it is shown that with more than 30% of the static mel-frequency cepstral coefficients disturbed, acoustic backing-off is capable of reducing the word error rate by one order of magnitude. Furthermore, our results indicate that the effectiveness of acoustic backing-off is optimal when dispersion of distortions due to acoustic feature transformations is minimal.
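The abstract does not spell out how outliers lose their influence on the local Viterbi scores. One way to get that behaviour, assumed here purely for illustration, is to mix each per-coefficient Gaussian likelihood with a small uniform "floor" so that no single coefficient can contribute an arbitrarily large penalty. The function names, the mixing weight `eps`, and the `value_range` of the uniform component are hypothetical choices, not values from the paper.

```python
import numpy as np


def gaussian_loglik(x, mean, std):
    """Per-coefficient log-likelihood under a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z ** 2 - np.log(std * np.sqrt(2.0 * np.pi))


def backed_off_loglik(x, mean, std, eps=0.05, value_range=20.0):
    """Log of (1 - eps) * Gaussian + eps * Uniform(width=value_range).

    eps and value_range are illustrative assumptions. The uniform term
    bounds the score from below, so outliers cannot dominate a frame.
    """
    log_gauss = gaussian_loglik(x, mean, std) + np.log1p(-eps)
    log_floor = np.log(eps / value_range)
    return np.logaddexp(log_gauss, log_floor)


# One frame with three well-behaved coefficients and one gross outlier.
frame = np.array([0.1, -0.3, 0.2, 50.0])
mean = np.zeros(4)
std = np.ones(4)
print("plain frame score:     ", gaussian_loglik(frame, mean, std).sum())
print("backed-off frame score:", backed_off_loglik(frame, mean, std).sum())
```

With the plain Gaussian the single outlier dominates the frame score; with the mixture its contribution is capped near the log of the floor, and no explicit outlier detector is needed.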


B. Cranen

Compressive Sensing for Missing Data Imputation in Noise Robust Speech Recognition
