Machine translation refers to a fully automated process that translates a user's input text into a target language. To improve the accuracy of machine translation, studies usually exploit not only the input text itself but also various background knowledge related to the text, such as visual information or prior knowledge. In this paper, we propose a multimodal neural machine translation system that uses both texts and their related images to translate Korean image captions into English. The experimental data consist of unlabeled images accompanied only by bilingual captions. To train the system with a supervised learning approach, we propose a weak-labeling method that selects a keyword from an image caption using feature selection methods; the keywords are used to roughly determine an image label. We also introduce an improved feature selection method that uses sentence clustering to select keywords that reflect the characteristics of the image captions more accurately. We found that our multimodal system outperforms a text-only neural machine translation baseline. Furthermore, the additional images have a positive impact on the issue of under-translation, where some words in a source sentence are translated incorrectly or not translated at all.

INDEX TERMS Human-computer interaction, multi-layer neural network, natural language processing, image classification, multimodal neural machine translation, weak label.
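The weak-labeling idea above can be illustrated with a minimal sketch: score each caption token by TF-IDF over the caption corpus and take the highest-scoring token as the keyword. This is only a toy stand-in for the paper's feature selection methods; the function name and the tiny corpus are illustrative assumptions, not from the paper.

```python
# Toy weak labeling: pick the highest-TF-IDF token of each caption
# as its keyword. Stands in for the paper's feature selection step.
import math
from collections import Counter

def tfidf_keywords(captions):
    """Return one keyword per caption: its highest-TF-IDF token."""
    tokenized = [c.lower().split() for c in captions]
    n_docs = len(tokenized)
    # Document frequency of each token across all captions.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # TF-IDF score; ties resolve to the earliest token in the caption.
        best = max(tokens, key=lambda t: tf[t] * math.log(n_docs / df[t]))
        keywords.append(best)
    return keywords
```

In a real pipeline the selected keywords would then be mapped to coarse image labels for supervised training of the image branch.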
The analysis of speech acts is important for dialogue understanding systems because the speech act of an utterance is closely associated with the user's intention. This paper proposes a speech act classification model that effectively uses a two-layer hierarchical structure generated from the adjacency-pair information of speech acts. Adding this hierarchical information to speech act classification has two advantages: improved classification accuracy and reduced running time in the testing phase. As a result, the model achieves higher performance than models that do not use the hierarchical structure and runs faster, because Support Vector Machine classifiers can be efficiently arranged on the two-layer hierarchy.
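The two-layer arrangement can be sketched as follows: a single top-level classifier first picks a coarse group (here, first vs. second part of an adjacency pair), and only the classifier for that group is then consulted, so each utterance touches two small classifiers instead of one large multi-class one. The toy cue-word scorers below stand in for the paper's SVMs, and all labels and cue words are illustrative assumptions.

```python
# Toy two-layer hierarchical classifier. Keyword-overlap scorers
# stand in for the SVM classifiers arranged on the hierarchy.

def make_scorer(cues):
    """Return a classifier that scores each label by cue-word overlap."""
    def classify(utterance):
        tokens = set(utterance.lower().split())
        return max(cues, key=lambda label: len(tokens & cues[label]))
    return classify

# Layer 1: coarse group (first vs. second part of an adjacency pair).
top = make_scorer({
    "first_part":  {"can", "could", "what", "where", "please"},
    "second_part": {"yes", "no", "sure", "sorry", "here"},
})

# Layer 2: one fine-grained classifier per coarse group.
fine = {
    "first_part":  make_scorer({"question": {"what", "where"},
                                "request":  {"can", "could", "please"}}),
    "second_part": make_scorer({"accept": {"yes", "sure"},
                                "reject": {"no", "sorry"}}),
}

def predict(utterance):
    group = top(utterance)                 # one top-level decision...
    return group, fine[group](utterance)   # ...then one group-local decision
```

Because only the classifiers along one path of the hierarchy are evaluated per utterance, testing-time cost grows with the depth of the tree rather than the total number of speech act labels.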
Word sense disambiguation (WSD) is the task of determining a reasonable sense of a word in a particular context. Although recent studies have demonstrated progress through neural language models, current approaches can still determine the senses of only a limited set of words in a few domains. It is therefore necessary to move toward a highly scalable process that can address the many senses occurring across various domains. This paper introduces a new large WSD dataset that is automatically constructed from the Oxford Dictionary, which is widely used as a standard source for the meanings of words. We propose a new WSD model that determines the sense of a word individually, in accordance with its part of speech in the context. In addition, we introduce a hybrid sense prediction method that classifies less frequently used senses separately to achieve reasonable performance. We have conducted comparative experiments demonstrating that the proposed method is more reliable than the baseline approaches. We also investigated the adaptation of the method to a realistic environment using news articles.

INDEX TERMS Computational and artificial intelligence, English vocabulary learning, natural language processing, neural networks, word sense disambiguation.

YOONSEOK HEO received the B.S. and M.S. degrees in computer science (major in natural language generation) from Sogang University. He is currently pursuing the Ph.D. degree with the Department of Computer Science, Sogang University. He worked as a Researcher with Gachon University in 2018. He is interested in spoken dialogue systems, machine translation, question answering, machine reading comprehension, and named entity recognition. His current research focuses on exploiting multimodal resources for machine translation and addressing large-scale open-domain texts for machine reading comprehension.
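The hybrid idea of handling rare senses separately can be sketched as a simple routing rule: senses attested at least a threshold number of times in training go through the main (frequent-sense) path, and anything rarer falls back to a separate path. The function names, the threshold, and the toy sense inventory are illustrative assumptions, not the paper's actual model.

```python
# Toy hybrid sense prediction: frequent senses use the main path,
# rare senses are routed to a separate fallback path.
from collections import Counter

def build_hybrid_predictor(training_senses, min_count=2):
    counts = Counter(training_senses)
    frequent = {s for s, c in counts.items() if c >= min_count}

    def predict(candidate_senses):
        """Pick a sense from the candidates for one word occurrence."""
        # Main path: prefer a frequent sense, most attested first.
        for sense in sorted(candidate_senses, key=lambda s: -counts[s]):
            if sense in frequent:
                return sense, "frequent-model"
        # Fallback path: separate handling for rarely seen senses.
        return candidate_senses[0], "rare-model"
    return predict
```

In the paper's setting the two paths would be learned classifiers rather than lookups; the point of the sketch is only the frequency-based routing.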