Verbmobil, a German research project, aims at machine translation of spontaneous speech input. The ultimate goal is the development of a portable machine translator that will allow people to negotiate in their native language. Within this project, the University of Karlsruhe has developed a speech recognition engine that has been evaluated on a yearly basis during the project and shows very promising word accuracy results on large-vocabulary spontaneous speech. In this paper we introduce the Janus Speech Recognition Toolkit underlying the speech recognizer. The main new contributions to the acoustic modeling part of our 1996 evaluation system (speaker normalization, channel normalization, and polyphonic clustering) are discussed and evaluated. Besides the acoustic models, we delineate the different language models used in our evaluation system: word trigram models interpolated with class-based models and a separate spelling language model were applied. As a result of using the toolkit and integrating all these parts into the recognition engine, the word error rate on the German Spontaneous Scheduling Task (GSST) was decreased from 30% in 1995 to 13.8% in 1996.
unseen test text was determined through cross-validation on all available text data. As a desirable baseline, word accuracy was also tested on a closed-vocabulary scenario, yielding a performance of 66.9%.
Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages.
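The idea of placing each language at a point in a continuous vector space can be illustrated with a deliberately simple, non-neural stand-in: a normalized character n-gram count vector per language. The sample sentences and the n-gram featurization below are illustrative assumptions, not the paper's character-based neural model or its Bible corpus; they only show how related languages end up closer in such a space than unrelated ones.

```python
from collections import Counter
from math import sqrt

def lang_vector(text, n=2):
    """Continuous language representation: L2-normalized character
    n-gram counts (a simple stand-in for learned language vectors)."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    norm = sqrt(sum(c * c for c in grams.values()))
    return {g: c / norm for g, c in grams.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    return sum(u[g] * v.get(g, 0.0) for g in u)

# Tiny illustrative samples (invented for this sketch).
samples = {
    "spanish":    "la casa es grande y la puerta es blanca",
    "portuguese": "a casa e grande e a porta e branca",
    "german":     "das haus ist gross und die tuer ist weiss",
}
vecs = {lang: lang_vector(t) for lang, t in samples.items()}

# Genetically related languages lie closer together in the space.
sim_es_pt = cosine(vecs["spanish"], vecs["portuguese"])
sim_es_de = cosine(vecs["spanish"], vecs["german"])
print("es-pt:", round(sim_es_pt, 3), " es-de:", round(sim_es_de, 3))
```

In the paper the vectors are instead learned jointly with a neural language model, but the geometric intuition is the same: distance in the vector space reflects relatedness.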
The Turkish language belongs to the Turkic family. All members of this family are close to one another in terms of linguistic structure; typological similarities include vowel harmony, verb-final word order, and agglutinative morphology. This latter property causes very fast vocabulary growth, resulting in a large number of out-of-vocabulary words. In this paper we describe our first experiments with a speaker-independent LVCSR engine for Modern Standard Turkish. First results on our Turkish speech recognition system are presented. The currently best system shows very promising results, achieving a 16.9% word error rate. To overcome the OOV problem we propose a morpheme-based approach and the Hypothesis Driven Lexical Adaptation approach. The final Turkish system is integrated into the multilingual recognition engine of the GlobalPhone project.

1. INTRODUCTION

For languages like English, many large-vocabulary continuous speech recognition engines have been evaluated on several different tasks. Recently, interest in LVCSR systems for Asian languages like Chinese, Japanese, and Korean has increased. Furthermore, projects like SQALE [1] focus on transferring the evaluation paradigms and training methods to languages spoken in Europe, such as French, German, and Spanish. However, so far there have been no attempts at Turkish Large Vocabulary Continuous Speech Recognition (LVCSR). This has several reasons: First, there is a lack of speech databases and text corpora for the Turkish language, and knowledge sources like pronunciation dictionaries are not yet available. Second, Turkish is very different from Indo-European languages because its morphology is agglutinative and suffixing. This means that inflection, derivation, and other relationships between words in a sentence are expressed by constantly concatenating suffixes to the word stem. Therefore, the vocabulary growth rate is very high, resulting in a large number of out-of-vocabulary (OOV) words. As a consequence, poor recognition results are achieved when using Turkish words as dictionary units. The following example illustrates the morphological structure of the Turkish language: Osman-li-laS-tir-ama-yabil-ecek-ler-imiz-den-miS-siniz (English: behaving as if you were of those whom we might consider not converting into Ottoman, see [2]). It can easily be seen that agglutination results in a word length that is exceptional for the Turkish language. In this paper we present our experiments on Modern Standard Turkish, which is the most widespread language of the Turkic family, spoken by about 65 million speakers. For all experiments we use the Turkish part of our GlobalPhone database, which is briefly introduced in the first section of this paper. The second section describes important properties of the Turkish language and the resulting problems for LVCSR. The paper concludes by presenting several recognition experiments and results.

2. GLOBALPHONE DATABASE

The Turkish LVCSR presented here is evaluated in the framework of the GlobalPhone project. The database of this project currently consists of 15 langu...
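The vocabulary explosion caused by agglutinative suffixing can be made concrete with a short sketch. The stems and suffix slots below are illustrative (and ignore vowel harmony); they are not from the GlobalPhone lexicon. The point is only arithmetic: a handful of suffix slots multiplies a few stems into many distinct surface forms, so a vocabulary of bare stems misses almost all of them.

```python
from itertools import product

# Illustrative Turkish-like stems and suffix slots (hypothetical
# examples; vowel harmony is ignored for simplicity).
stems = ["ev", "goz", "okul"]        # house, eye, school
plural = ["", "ler"]                 # optional plural suffix
possessive = ["", "im", "in"]        # my, your
case = ["", "de", "den", "e", "i"]   # locative, ablative, dative, accusative

# Agglutination: every combination of stem + suffixes is a distinct word.
forms = {s + p + o + c for s, p, o, c in product(stems, plural, possessive, case)}
print(len(stems), "stems expand to", len(forms), "surface forms")

# A recognizer vocabulary holding only the bare stems misses every
# inflected form; each missed form is an out-of-vocabulary (OOV) word.
vocab = set(stems)
oov = [f for f in forms if f not in vocab]
print("OOV rate with stem-only vocabulary: %.1f%%" % (100 * len(oov) / len(forms)))
```

With real Turkish morphology the combinatorics are far larger, which is why morpheme-based units are attractive as dictionary entries.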
One of the most prevalent problems of large-vocabulary speech recognition systems is the large number of out-of-vocabulary words. This is especially the case for automatically transcribing broadcast news in languages other than English that have a large number of inflections and compound words. We introduce a set of techniques to decrease the number of out-of-vocabulary words during recognition by using linguistic knowledge about morphology and a two-pass recognition approach, in which the first pass serves only to dynamically adapt the recognition dictionary to the speech segment to be recognized. A second recognition run is then carried out on the adapted vocabulary. With the proposed techniques we were able to reduce the OOV rate by more than 40%, thereby also improving recognition results by an absolute 5.8%, from 64% word accuracy to 69.8%.
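The two-pass idea can be sketched in a few lines. Everything below is a hypothetical toy (the word lists, the prefix-based notion of a shared stem, and the stand-in for a decoder are all assumptions, not the paper's system): pass 1 decodes with a small static vocabulary, the hypotheses pull morphologically related entries from a large background lexicon, and pass 2 decodes with the adapted vocabulary.

```python
# Hypothetical background lexicon, e.g. built from large text corpora.
background_lexicon = {
    "regierung", "regierungen", "regierungskrise",
    "wahl", "wahlen", "wahlkampf", "wahlergebnis",
}
static_vocab = {"regierung", "wahl"}  # small first-pass vocabulary

def first_pass(utterance, vocab):
    """Stand-in for the first decoding pass: map each spoken word to the
    longest in-vocabulary word that prefixes it (a crude stem match)."""
    hyp = []
    for word in utterance:
        matches = [v for v in vocab if word.startswith(v)]
        if matches:
            hyp.append(max(matches, key=len))
    return hyp

def adapt_vocabulary(hypothesis, vocab, lexicon):
    """Add every background-lexicon entry that shares a stem (here
    approximated by a common prefix) with a first-pass hypothesis word."""
    adapted = set(vocab)
    for word in hypothesis:
        adapted |= {w for w in lexicon if w.startswith(word)}
    return adapted

utterance = ["wahlergebnis", "regierungskrise"]  # words actually spoken

oov_before = [w for w in utterance if w not in static_vocab]
adapted = adapt_vocabulary(first_pass(utterance, static_vocab),
                           static_vocab, background_lexicon)
oov_after = [w for w in utterance if w not in adapted]

print("OOV before adaptation:", oov_before)
print("OOV after adaptation: ", oov_after)
```

In the real system the adaptation operates on a full pronunciation dictionary and first-pass lattices rather than toy word lists, but the control flow (decode, adapt, re-decode) is the same.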