Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages , we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages .
This demonstration involves two-way automatic speechto-speech translation on a consumer off-the-shelf PDA. This work was done as part of the DARPA-funded Babylon project, investigating better speech-to-speech translation systems for communication in the field. The development of the Speechalator software-based translation system required addressing a number of hard issues, including a new language for the team (Egyptian Arabic), close integration on a small device, computational efficiency on a limited platform, and scalable coverage for the domain.
In this paper, we discuss the design of a database of recorded and transcribed read and spontaneous speech of semi-fluent, strongly-accented non-native speakers of English. While many speech applications work best with a recognizer that expects native-like usage, others could benefit from a speech recognition component that is forgiving of the sorts of errors that are not a barrier to communication; in order to train such a recognizer a database of non-native speech is needed. We examine how collecting data from non-native speakers must necessarily differ from collection from native speakers, and describe work we did to develop an appropriate scenario, recording setup, and optimal surroundings during recording.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.