Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages , we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages .
IF (Interchange Format), the interlingua used by the C-STAR consortium, is a speech-act based interlingua for task-oriented dialogue. IF was designed as a practical interlingua that could strike a balance between expressivity and simplicity. If it is too simple, components of meaning will be lost and coverage of unseen data will be low. On the other hand, if it is too complex, it cannot be used with a high degree of consistency by collaborators on different continents. In this paper, we suggest methods for evaluating the coverage of IF and the consistency with which it was used in the C-STAR consortium.
We describe the development of a robust and flexible Thai Speech Recognizer as integrated into our English-Thai Speech-to-Speech translation system. We focus on the discussion of the rapid deployment of ASR for Thai under limited time and data resources, including rapid data collection issues, acoustic model bootstrap, and automatic generation of pronunciations. Issues relating to the translation and overall system will be reported elsewhere.
Abstract. JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneousconversation in a limited domain. In this paper we describe our methodology for evaluating translation performance. Our current focus is on end-to-end evaluations -the evaluation of the translation capabilities of the system as a whole. The main goal of our end-to-end evaluation procedure is to determine translation accuracy on a test set of previously unseen dialogues. Other goals include evaluating the effectiveness of the system in conveying the domain relevant information and in detecting and dealing appropriately with utterances (or portions of utterances) that are out-of-domain. End-toend evaluations are performed in order to verify the general coverage of our knowledge sources, guide our development efforts, and to track our improvement over time. We discuss our evaluation procedures, the criteria used for assigning scores to translations produced by the system, and the tools developed for performing this task. Our most recent Spanish-to-English performance evaluation results are presented as an example.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.