Abstract. This work describes classification of speech from native and non-native speakers, enabling accent-dependent automatic speech recognition. In addition to the acoustic signal, lexical features from transcripts of the speech data can also provide significant evidence of a speaker's accent type. Subsets of the Fisher corpus, covering diverse accents, were used for these experiments. Relative to human-audited judgments, accent classifiers that exploited acoustic and lexical features achieved up to 84.5% classification accuracy. Compared to a system trained only on native speakers, using this classifier in a recognizer with accent-specific acoustic and language models resulted in a 16.5% improvement for the non-native speakers and a 7.2% improvement overall.
We describe the large-vocabulary automatic speech recognition system developed for Modern Standard Arabic by the SRI/Nightingale team and used for the 2007 GALE evaluation as part of the speech translation system. We show how system performance is affected by different development choices, ranging from text processing and lexicon to decoding system architecture design. Word error rate results are reported on broadcast news and conversational data from the GALE development and evaluation test sets.
This paper describes a simple method for significantly improving Tandem features used to train acoustic models for large-vocabulary speech recognition. The linear activations at the outputs of an MLP classifier were modified according to known reference labels: where necessary, the activation of the output unit corresponding to the correct phone label was increased in order to make an accurate classification. This technique was inspired by another experiment that determined a lower error bound on ASR performance within the Tandem framework. By simulating an idealized classifier with forward-backward phone posterior probabilities, we observed a best-case scenario in which nearly all errors were eliminated. Although this performance is not practically achievable, the experiment demonstrated the validity of the Tandem processing approach and suggested that considerable gains are possible by improving the MLP phone classifier.
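The activation adjustment described above can be illustrated with a minimal sketch. The abstract only states that the correct unit's activation was increased where necessary; the exact rule used here (raise it to the best competing activation plus a fixed `margin`) and the names `boost_correct_activations` and `margin` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def boost_correct_activations(
    activations: Sequence[Sequence[float]],
    labels: Sequence[int],
    margin: float = 1.0,  # hypothetical parameter, not taken from the paper
) -> List[List[float]]:
    """Return a copy of the frame-level linear MLP activations in which
    the output unit of the reference phone wins in every frame."""
    if len(activations) != len(labels):
        raise ValueError("need exactly one reference label per frame")
    corrected = []
    for frame, label in zip(activations, labels):
        row = list(frame)  # copy so the input is left untouched
        # Highest activation among the competing (incorrect) output units.
        best_other = max(v for i, v in enumerate(row) if i != label)
        # Only modify frames that are misclassified (or tied).
        if row[label] <= best_other:
            row[label] = best_other + margin
        corrected.append(row)
    return corrected


# Two frames, three phone classes; the second frame is misclassified.
frames = [[2.0, 0.5, -1.0], [0.0, 0.25, 1.5]]
labels = [0, 1]
fixed = boost_correct_activations(frames, labels)
```

In this toy input the first frame is already classified correctly and is left unchanged, while in the second frame the activation of the reference unit is raised above its strongest competitor. Because the adjustment needs reference labels, a rule of this kind can only be applied to labeled training data.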