We propose a method of incorporating a non-probabilistic grammar into large vocabulary continuous speech recognition (LVCSR). Our basic assumption is that the utterances to be recognized are grammatical to a sufficient degree, which enables us to decrease the word error rate by favouring grammatical phrases. We use a parser and a handcrafted grammar to identify grammatical phrases in word lattices produced by a speech recognizer. This information is then used to rescore the word lattice. We measured the benefit of our method by extending an LVCSR baseline system (based on hidden Markov models and a 4-gram language model) with our rescoring component. We achieved a statistically significant reduction in word error rate compared to the baseline system.
A polyglot text-to-speech synthesis system that can read mixed-lingual text aloud must first derive the correct pronunciation. This is achieved with an accurate morpho-syntactic analyzer that works simultaneously as a language detector, followed by a phonological component that performs various phonological transformations. The result of these symbol-processing steps is a complete phonological description of the speech to be synthesized. The subsequent processing step, i.e. prosody control, has to generate numerical values for the physical prosodic parameters from this description, a task that is very different from the preceding ones. This article presents appropriate solutions to both types of tasks, namely a particular rule-based approach for the phonological component and a statistical or machine learning approach to prosody control.
In forensic casework, automatic speaker verification (SV) is applied to determine the likelihood ratio of a suspect being versus not being the speaker of an incriminating speech recording. For that purpose, the likelihood of the anti-speaker has to be estimated from the speech of an adequate number of other speakers. In many cases, speech signals of such an anti-speaker population are not available, and it is generally too expensive to make an appropriate collection. This paper presents a practical procedure for forensic SV that is based on a text-dependent SV system; instead of an anti-speaker population, a special speech database is used to calibrate the valuation scale for an individual case.
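For orientation, the likelihood ratio referred to in this abstract is conventionally written as below. The symbols are assumptions introduced here for illustration and are not taken from the paper: E denotes the evidence (the incriminating recording compared with the suspect's speech), H_s the hypothesis that the suspect is the speaker, and H_a the hypothesis that some other speaker (the anti-speaker) produced the recording.

```latex
\mathrm{LR} = \frac{p(E \mid H_s)}{p(E \mid H_a)}
```

A value above 1 supports the suspect-as-speaker hypothesis and a value below 1 supports the alternative. The denominator is the term that normally requires an anti-speaker population.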