In this paper, we describe an experimental speech translation system built on small, PC-based hardware with a multi-modal user interface. Two major problems for users of an automatic speech translation device are speech recognition errors and language translation errors. In this paper we focus on developing techniques to overcome these problems. The techniques include a new language translation approach based on example sentences, simplified expression rules, and a multi-modal user interface that shows possible speech recognition candidates retrieved from the example sentences. The combination of the proposed techniques can provide accurate language translation even when the speech recognition result contains errors. We propose using keyword classes and the dependencies between keywords to detect misrecognized keywords and to search the example expressions. A suitable example expression is then chosen via a touch panel or push buttons. The language translation module outputs the corresponding expression in the other language, which should always be grammatically correct. Simplified translated expressions are realized by speech-act-based simplification rules, so the system avoids various redundant expressions. A simple comparison study showed that the proposed method produces output roughly 2 to 10 times faster than a conventional translation device.
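The example-based retrieval step described above can be illustrated with a minimal sketch. The data, keyword sets, and overlap scoring below are hypothetical simplifications (the paper's keyword classes and dependency checks are not reproduced here); the point is that keywords surviving a recognition error can still retrieve a grammatically correct target-language expression.

```python
# Hypothetical example-sentence database:
# (keyword set, source sentence, pre-translated target expression).
# The translations are illustrative placeholders, not the system's data.
EXAMPLES = [
    ({"ticket", "airport"}, "I want a ticket to the airport.",
     "Kuukou made no kippu ga hoshii desu."),
    ({"reserve", "room", "tonight"}, "I'd like to reserve a room for tonight.",
     "Konya heya wo yoyaku shitai desu."),
    ({"where", "station"}, "Where is the station?",
     "Eki wa doko desu ka?"),
]

def retrieve(recognized_words, top_n=2):
    """Rank example sentences by keyword overlap with the (possibly
    partly misrecognized) speech recognition output."""
    candidates = []
    for keywords, source, target in EXAMPLES:
        score = len(keywords & set(recognized_words))
        if score > 0:
            candidates.append((score, source, target))
    # Highest keyword overlap first; the user then confirms a candidate
    # on the touch panel, and its stored translation is output directly.
    candidates.sort(key=lambda c: -c[0])
    return candidates[:top_n]

# Even if one word were misrecognized, the remaining keywords
# still retrieve the matching example sentence.
hits = retrieve({"where", "is", "the", "station"})
print(hits[0][1])  # -> "Where is the station?"
```

Because the output is a stored translation of a whole example sentence rather than a word-by-word rendering of the recognition result, recognition errors degrade retrieval ranking but not the grammaticality of the final output.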
For mobile use, we are developing a multi-lingual, example-sentence-driven speech translation system that has a multi-modal input interface for retrieving sentences. This paper discusses the characteristics of each input mode, the synergistic effect obtained by combining the modes, and the results of evaluations that show the difference between system performance in the laboratory and in the real world. As the evaluation criteria, we adopted the retrieval time and the retrieval precision of the sentence. When all of the modes were available, the precision within 30 seconds was 86.8% for a closed test set and 76.8% for an open test set. When the retrieval was completed with only one operation, the average time was 10.3 seconds for the closed set. The precision was 12.0% higher than the maximum precision obtained when only one of the modes was available. The results show that a synergistic effect of the combined modes certainly exists and that all the modes are necessary to improve the system's usability.