L. Welling scite author profile

This paper presents methods for speaker adaptive modeling using vocal tract normalization (VTN) along with experimental tests on three databases. We propose a new training method for VTN: By using single-density acoustic models per HMM state for selecting the scale factor of the frequency axis, we avoid the problem that a mixture-density tends to learn the scale factors of the training speakers and thus cannot be used for selecting the scale factor. We show that using single Gaussian densities for selecting the scale factor in training results in lower error rates than using mixture densities. For the recognition phase, we propose an improvement of the well-known two-pass strategy: By using a nonnormalized acoustic model for the first recognition pass instead of a normalized model, lower error rates are obtained. In recognition tests, this method is compared with a fast variant of VTN. The two-pass strategy is an efficient method, but it is suboptimal because the scale factor and the word sequence are determined sequentially. We found that for telephone digit string recognition this suboptimality reduces the VTN gain in recognition performance by 30% relative. In summary, on the German spontaneous speech task Verbmobil, the WSJ task and the German telephone digit string corpus SieTill, the proposed methods for VTN reduce the error rates significantly.

Index Terms: Speaker adaptive modeling and training, speaker adaptive recognition, speech recognition, vocal tract (length) normalization.
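The scale-factor selection described in this abstract can be illustrated with a small maximum-likelihood grid search. This is only a minimal sketch under simplifying assumptions, not the authors' implementation: it scores whole frames against one diagonal-covariance Gaussian instead of per-state HMM densities, it warps a plain spectrum vector with a linear rescaling of the frequency axis, and the names `warp_spectrum`, `gaussian_log_likelihood`, and `select_scale_factor` are hypothetical.

```python
import math


def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis by alpha (assumed warping function)."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        src = min(k * alpha, n - 1)      # position on the original axis, clipped
        lo = int(math.floor(src))
        hi = min(lo + 1, n - 1)
        frac = src - lo
        # Linear interpolation between the two neighbouring bins.
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of x under a single diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def select_scale_factor(frames, mean, var, alphas):
    """Return the candidate alpha whose warped frames score highest."""
    def total(alpha):
        return sum(
            gaussian_log_likelihood(warp_spectrum(f, alpha), mean, var)
            for f in frames
        )
    return max(alphas, key=total)


# Toy usage: a reference spectral bump and a "speaker" whose axis is compressed.
reference = [math.exp(-((k - 10) / 4.0) ** 2) for k in range(32)]
speaker_frame = warp_spectrum(reference, 1.1)
chosen = select_scale_factor(
    [speaker_frame], reference, [1.0] * 32, [0.8, 0.9, 1.0, 1.1, 1.2]
)
```

In the toy usage, the search picks the candidate that best undoes the compression applied to the speaker frame. A single Gaussian is used for scoring because, as the abstract argues, a mixture density can absorb the speakers' scale factors and then no longer discriminates between candidate warps.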

Improved methods for vocal tract normalization

Welling, Kanthak, Ney

1999

Formant estimation for speech recognition

Welling, Ney

1998

IEEE Trans. Speech Audio Process.

This paper presents a new method for estimating formant frequencies. The formant model is based on a digital resonator. Each resonator represents a segment of the short-time power spectrum. The complete spectrum is modeled by a set of digital resonators connected in parallel. An algorithm based on dynamic programming produces both the model parameters and the segment boundaries that optimally match the spectrum. We used this method in experimental tests that were carried out on the TI digit string data base. The main results of the experimental tests are: 1) the presented approach produces reliable estimates of formant frequencies across a wide range of sounds and speakers; and 2) the estimated formant frequencies were used in a number of variants for recognition. The best set-up resulted in a string error rate of 4.2% on the adult corpus of the TI digit string data base.
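The dynamic-programming step mentioned in this abstract, which chooses segment boundaries jointly with per-segment model parameters, can be sketched as follows. This is a hedged illustration, not the paper's algorithm: the per-segment resonator fit is replaced by a stand-in cost (squared deviation from the segment mean), and `segment_cost` and `best_segmentation` are hypothetical names.

```python
def segment_cost(values, start, end):
    """Stand-in cost: squared deviation of values[start:end] from their mean.

    In the paper, this role is played by the match between a digital
    resonator and the spectrum segment; that fit is not reproduced here.
    """
    seg = values[start:end]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def best_segmentation(values, num_segments):
    """Split values into num_segments contiguous segments of minimum total cost."""
    n = len(values)
    inf = float("inf")
    # best[k][j]: minimum cost of covering values[:j] with k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):    # i is the start of the last segment
                cost = best[k - 1][i] + segment_cost(values, i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i
    # Trace the stored split points back from the end of the sequence.
    bounds = [n]
    j = n
    for k in range(num_segments, 0, -1):
        j = back[k][j]
        bounds.append(j)
    bounds.reverse()
    return bounds, best[num_segments][n]


# Toy usage: three flat regions should be recovered as three segments.
boundaries, total_cost = best_segmentation([1, 1, 1, 5, 5, 9, 9, 9], 3)
```

The recursion evaluates every admissible start of the last segment, so boundaries and segment costs are optimized together rather than one after the other.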

The RWTH large vocabulary continuous speech recognition system

Ney, Welling, Ortmanns et al.

A model for efficient formant estimation

Welling, Ney

This paper presents a new method for estimating formant frequencies. The formant model is based on a digital resonator. Each resonator represents a segment of the short-time power spectrum. The complete spectrum is modeled by a set of digital resonators connected in parallel. An algorithm based on dynamic programming produces both the model parameters and segment boundaries that optimally match the spectrum. The main results of this paper are: 1) Modeling formants by digital resonators allows a reliable estimation of formant frequencies. 2) Digital resonators can be used efficiently in connection with dynamic programming. 3) A recognition test with formant frequencies results in a string error rate of 4.8% on the adult corpus of the TI digit string database.

Speaker adaptive modeling by vocal tract normalization
