In this paper, we present a 2-way speech-to-speech translation system for English and Iraqi colloquial Arabic, the dialect of Arabic spoken by ordinary people in Iraq. The application domain of the system is military force protection, including municipal services surveys, detainee screening, and descriptions of people, houses, vehicles, etc. The system uses statistical speech recognition. To translate the recognition output, it combines prerecorded questions with statistical machine translation followed by speech synthesis. We present evaluation results, along with an analysis of the gap between Iraqi-to-English and English-to-Iraqi translation performance.
Index Terms: Speech-to-speech translation, Iraqi Arabic
INTRODUCTION
In this paper, we present a 2-way speech-to-speech translation (S2S) system for English and Iraqi colloquial Arabic. The system facilitates communication between an English speaker and a speaker of Iraqi colloquial Arabic, the dialect of Arabic spoken by ordinary people in Iraq. This dialect differs considerably from the Modern Standard Arabic (MSA) used in writing and broadcast news: the pronunciations of words, and sometimes even the words themselves, differ between the two, making Iraqi Arabic in effect a low-resource language. The application domain of the system is military force protection, including checkpoints, municipal services surveys, and questions about people, buildings, vehicles, etc.
The S2S system described in this paper is being developed as part of DARPA's TRANSTAC program. The systems being developed in this program are broadly classified as either "1.5-way" or "2-way". The 2-way systems seek, in principle, to translate any utterance in either direction, typically by using broad-coverage statistical machine translation (SMT) components trained on large parallel corpora. Examples are the systems developed by IBM [1], SRI [2], and CMU [3]. The 1.5-way systems, by contrast, take a task-directed approach that makes the communication problem easier. They specify a fixed set of English questions with prerecorded foreign-language translations, together with a constrained set of foreign-language answers that can be translated into English. Example questions are "How old is he?" and "Is the roof of the house tiled or concrete?" Examples of such systems include one developed earlier by BBN [4] and one developed by Sehda [5].
In spite of its more limited coverage, the 1.5-way approach does have advantages.
In particular, with a 1.5-way system the English-speaking user always knows exactly what the system has said to the Iraqi speaker, and also knows that it was spoken to that speaker accurately, fluently, and intelligibly. In a statistical 2-way system, by contrast, machine translation or speech synthesis errors are always possible, even when the speech recognition itself is completely correct. The system described here is designed to be a hybrid between the two approaches, combining...