The information humans use for natural language comprehension is typically perceptual, such as text, sound, and images. In recent years, language models that learn semantics from a single perceptual source (text) have gradually developed into multimodal language models that learn semantics from multiple perceptual sources. Sound is a perceptual modality beyond text whose effectiveness has been demonstrated by many related works; however, how best to incorporate such perceptual information still requires further research. This paper therefore proposes a language model that trains on dual perceptual information synchronously to enhance word representations. The representations are trained in a synchronized manner that adopts an attention model to exploit both textual and phonetic perceptual information in unsupervised learning tasks. On that basis, the two streams of perceptual information are processed simultaneously, which resembles the cognitive process of human language understanding. Experimental results show that our approach achieves superior results on text classification and word similarity tasks across data sets in four languages.

INDEX TERMS Information representation, multi-layer neural network, natural language processing, unsupervised learning.
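The attention-based fusion of the two perceptual sources can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the scoring scheme (a learned query vector scoring each modality's embedding, followed by a softmax-weighted sum) and all names here are assumptions for exposition.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_modalities(text_emb, phon_emb, query):
    """Attention-weighted fusion of a word's text and phonetic embeddings.

    Each modality embedding is scored against a (hypothetical) learned
    query vector; the softmax of the scores gives per-modality attention
    weights, and the fused word representation is the weighted sum of
    the two embeddings.
    """
    modalities = np.stack([text_emb, phon_emb])  # shape (2, d)
    scores = modalities @ query                  # shape (2,)
    weights = softmax(scores)                    # attention over modalities
    return weights @ modalities                  # shape (d,) fused vector

# Toy example with random embeddings of dimension 8.
rng = np.random.default_rng(0)
d = 8
text_emb = rng.normal(size=d)
phon_emb = rng.normal(size=d)
query = rng.normal(size=d)
fused = fuse_modalities(text_emb, phon_emb, query)
```

In a trained model the query (and the embeddings themselves) would be learned end-to-end from the unsupervised objective; the point of the sketch is only that both modalities contribute to one word vector, with the mixture decided by attention.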