In this paper, we present a 2-way speech-to-speech translation system for English and Iraqi colloquial Arabic, the dialect of Arabic spoken by ordinary people in Iraq. The application domain of the system is military force protection, including municipal services surveys, detainee screening, and descriptions of people, houses, vehicles, etc. The system uses statistical speech recognition, and a combination of pre-recorded questions and statistical machine translation with speech synthesis to translate the speech recognition output. We present evaluation results, along with an analysis of the gap between Iraqi-to-English and English-to-Iraqi translation performance.

Index Terms: Speech-to-speech translation, Iraqi Arabic

INTRODUCTION

In this paper, we present a 2-way speech-to-speech translation (S2S) system for English and Iraqi colloquial Arabic. The system facilitates communication between an English speaker and a speaker of Iraqi colloquial Arabic, the dialect spoken by ordinary people in Iraq, which differs considerably from the Modern Standard Arabic (MSA) used in writing and broadcast news. In particular, the pronunciations of words, and sometimes even the words themselves, differ between the two, making Iraqi Arabic in effect a low-resource language. The application domain of the system is military force protection, including checkpoints, municipal services surveys, and questions about people, buildings, vehicles, etc.

The S2S system described in this paper is being developed as part of DARPA's TRANSTAC program. The systems being developed in this program are broadly classified as either "1.5-way" or "2-way". The 2-way systems seek, in principle, to translate any utterance, in either direction, typically by using broad-coverage statistical machine translation (SMT) components trained on large parallel corpora.
Examples are the systems developed by IBM [1], SRI [2], and CMU [3]. The 1.5-way systems, by contrast, use a task-directed approach to make the communication problem easier: they specify a fixed set of English questions with pre-recorded foreign-language translations, together with a constrained set of foreign-language answers that can be translated into English. Example questions are "How old is he?" and "Is the roof of the house tiled or concrete?". Example systems are a system developed earlier by BBN [4] and a system developed by Sehda [5].

In spite of its more limited coverage, the 1.5-way approach does have advantages. In particular, with a 1.5-way system the English-speaking user always knows exactly what the system has said to the Iraqi speaker, and furthermore knows that it was spoken accurately, fluently, and intelligibly. In a statistical 2-way system, by contrast, machine translation or speech synthesis errors are always possible, even when the speech recognition itself is completely correct. The system described here is designed to be a hybrid between the two approaches, combining...
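The core mechanism of the 1.5-way approach, mapping a recognized English utterance onto a fixed question inventory with pre-recorded translations, can be sketched as follows. This is a minimal illustration, not the actual system: the question set, audio file names, and the token-overlap matching score are all hypothetical, and a real system would use a far more robust matcher.

```python
# Hypothetical sketch: map a recognized English utterance to the closest
# question in a fixed 1.5-way inventory; fall back (return None) when no
# supported question is close enough.
QUESTIONS = {
    "how old is he": "q_age.wav",  # pre-recorded Iraqi translation (dummy name)
    "is the roof of the house tiled or concrete": "q_roof.wav",
}

def tokens(s):
    return set(s.lower().split())

def match_question(utterance, threshold=0.5):
    """Return the pre-recorded audio for the best-matching fixed question,
    or None if nothing scores above the threshold."""
    best, best_score = None, 0.0
    u = tokens(utterance)
    for question, audio in QUESTIONS.items():
        q = tokens(question)
        score = len(u & q) / len(u | q)  # Jaccard similarity of word sets
        if score > best_score:
            best, best_score = audio, score
    return best if best_score >= threshold else None
```

With this scheme, a confident match plays fluent pre-recorded audio, while a failed match can be handed to the broad-coverage SMT path, which is the essence of the hybrid design described above.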
In this paper we present a speech-to-speech translation system configured for translingual communication in English and colloquial Iraqi on a mobile, handheld device. The end-to-end system employs a medium/large-vocabulary n-gram speech recognition engine for recognizing English and colloquial Iraqi, a question canonicalizer for mapping a recognized English question or command to one of the questions supported in the system, a concept translation engine for translating recognized Iraqi text, and a text-to-speech synthesis engine for playing back the English translation of the Iraqi speech to the English speaker. In addition to describing the system architecture and the functionality of the components, we present optimization techniques that enable low-latency, real-time speech recognition on low-power hardware platforms.
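The data flow through the components named above can be sketched as a simple dispatch on translation direction. All components below are stubs invented for illustration; the real engines are statistical models, and only the routing logic corresponds to the architecture described.

```python
# Illustrative pipeline sketch with stub components. Only the routing
# mirrors the described architecture: English questions go through the
# canonicalizer to pre-recorded audio; Iraqi speech goes through concept
# translation and text-to-speech.

def recognize(audio, language):
    # n-gram speech recognition engine (stub: treat input as transcript)
    return audio

def canonicalize(text):
    # question canonicalizer: map to a supported question id (stub)
    return "q_age" if "old" in text else "q_unsupported"

def play_prerecorded(question_id):
    # play the pre-recorded Iraqi translation (stub)
    return f"[play {question_id}.wav]"

def concept_translate(text):
    # concept translation engine for recognized Iraqi text (stub)
    return f"[English translation of: {text}]"

def synthesize(english_text):
    # text-to-speech synthesis engine (stub)
    return f"[speak: {english_text}]"

def translate_turn(audio, direction):
    """Route one utterance through the handheld pipeline."""
    if direction == "en->iq":
        return play_prerecorded(canonicalize(recognize(audio, "en")))
    return synthesize(concept_translate(recognize(audio, "iq")))
```

The asymmetry in the two branches reflects the system design: the English-to-Iraqi direction is constrained to supported questions, while the Iraqi-to-English direction runs open translation followed by synthesis.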
We report on recent improvements in our English/Iraqi Arabic speech-to-speech translation system. User-interface improvements include a novel parallel approach to user confirmation that makes confirmation cost-free in terms of dialog duration. Automatic speech recognition improvements include the incorporation of state-of-the-art techniques in feature transformation and discriminative training. Machine translation improvements include a novel combination of multiple alignments derived from various pre-processing techniques, such as Arabic segmentation and English word compounding, higher-order n-grams for the target language model, and the use of context in the form of semantic classes and part-of-speech tags.
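The target language model mentioned above scores candidate English translations by how likely their word sequences are. A minimal count-based n-gram sketch, assuming add-alpha smoothing (the actual system's training data, order, and smoothing method are not specified here), looks like this:

```python
from collections import defaultdict

# Minimal sketch of a count-based target-side n-gram language model.
# Illustrative only: a deployed system trains a much higher-order model
# on large corpora with more sophisticated smoothing.

def train_ngrams(sentences, n=3):
    """Collect n-gram and (n-1)-gram context counts from tokenized text."""
    counts, context_counts = defaultdict(int), defaultdict(int)
    for sentence in sentences:
        toks = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(len(toks) - n + 1):
            gram = tuple(toks[i:i + n])
            counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return counts, context_counts

def prob(counts, context_counts, gram, vocab_size, alpha=1.0):
    # add-alpha smoothing so unseen n-grams still get non-zero probability
    return (counts[gram] + alpha) / (context_counts[gram[:-1]] + alpha * vocab_size)
```

Raising the model order n lets the language model condition on longer target-side context, which is the motivation for the "higher-order n-grams" improvement, at the cost of sparser counts and a greater reliance on smoothing.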