Murat Saraçlar scite author profile

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.
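The perceptron update described in this abstract can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not in the paper: it reranks a short n-best list instead of intersecting automata with word lattices, it uses only unigram and bigram counts as features, and it omits weight averaging. The names `ngram_features`, `train_perceptron`, and `rerank` are hypothetical.

```python
from collections import Counter

def ngram_features(words, n_max=2):
    """Count unigram and bigram features of one hypothesis (toy feature set)."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats[tuple(padded[i:i + n])] += 1
    return feats

def score(weights, feats, base_score):
    """Baseline recognizer score plus the weighted n-gram feature counts."""
    return base_score + sum(weights.get(f, 0.0) * c for f, c in feats.items())

def rerank(weights, hyps):
    """Return the index of the best hypothesis; each one is (words, base_score)."""
    return max(range(len(hyps)),
               key=lambda i: score(weights, ngram_features(hyps[i][0]), hyps[i][1]))

def train_perceptron(data, epochs=2):
    """data: list of (hyps, oracle_index), oracle = lowest-error hypothesis."""
    weights = {}
    for _ in range(epochs):
        for hyps, oracle in data:
            best = rerank(weights, hyps)
            if best == oracle:
                continue  # no mistake, no update
            # Promote the oracle's features, demote the wrongly chosen ones.
            for f, c in ngram_features(hyps[oracle][0]).items():
                weights[f] = weights.get(f, 0.0) + c
            for f, c in ngram_features(hyps[best][0]).items():
                weights[f] = weights.get(f, 0.0) - c
    # Only features touched by an update and left nonzero are kept,
    # which is why the resulting feature set stays small.
    return {f: w for f, w in weights.items() if w != 0.0}
```

In this sketch, features shared by the oracle and the wrongly chosen hypothesis cancel out, so only the n-grams that distinguish the two end up with nonzero weight. That mirrors, in a toy setting, the feature selection effect the abstract attributes to the perceptron.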


Stochastic pronunciation modelling from hand-labelled phonetic corpora

Riley, Byrne, Finke et al. 1999

Speech Communication


Morph-based speech recognition and modeling of out-of-vocabulary words across languages

Creutz, Hirsimäki, Kurimo et al. 2007

ACM Trans. Speech Lang. Process.


We explore the use of morph-based language models in large-vocabulary continuous-speech recognition systems across four so-called morphologically rich languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high out-of-vocabulary (OOV) rates, whereas the morph models can recognize previously unseen word forms by concatenating morphs. It is shown that the morph models perform fairly well on OOVs without compromising the recognition accuracy on in-vocabulary words. The Arabic experiment is the only exception: here the standard word model outperforms the morph model. Differences in the datasets and the amount of data are discussed as a plausible explanation.
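The coverage argument in this abstract can be made concrete with a small sketch. It assumes a hand-picked toy morph inventory for Turkish rather than morphs learned by Morfessor, and the functions `segment` and `oov_rate` are hypothetical names introduced only for illustration.

```python
def segment(word, morphs):
    """Return one segmentation of `word` into known morphs, or None."""
    # best[i] holds a segmentation of word[:i], or None if none exists.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            if best[start] is not None and word[start:end] in morphs:
                best[end] = best[start] + [word[start:end]]
                break
    return best[len(word)]

def oov_rate(tokens, is_covered):
    """Fraction of tokens that the given vocabulary cannot represent."""
    return sum(1 for t in tokens if not is_covered(t)) / len(tokens)

# Toy data (assumed for illustration, not taken from the paper).
word_vocab = {"ev", "evler"}
morph_vocab = {"ev", "ler", "imiz", "de"}
test_tokens = ["ev", "evler", "evde", "evlerimizde"]

word_oov = oov_rate(test_tokens, lambda t: t in word_vocab)
morph_oov = oov_rate(test_tokens, lambda t: segment(t, morph_vocab) is not None)
```

With these toy vocabularies, the word forms "evde" and "evlerimizde" are out of vocabulary for the word model but can be built by concatenating known morphs, so the morph-based OOV rate is lower. A real system would score morph sequences with an n-gram model instead of only checking coverage.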


Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus

2008



Murat Saraçlar

Discriminative n-gram language modeling

Discriminative language modeling with conditional random fields and the perceptron algorithm
