Natural language processing (NLP) is a research direction spanning linguistics, computer science, and related fields. Word vector representation maps words into a real-valued vector space and is a core technology underlying many current NLP tasks. This paper surveys several typical word vector representation methods and studies their linguistic and mathematical principles. We first describe the process of mapping words to vectors, that is, encoding natural language information into word vectors according to semantics. Second, we analyze the information-carrying capacity of several typical methods, including the co-occurrence matrix, Word2Vec, GloVe, and ELMo. Third, building on an analysis of the principles of these methods, we reproduce the specific process of generating word vectors using SVD decomposition, neural networks, and other techniques. Finally, using word similarity calculation and text sentiment classification tasks, we compare the performance of word vectors trained by the various methods. Experiments verify that different word vector generation methods emphasize different aspects of linguistic information and therefore perform differently across tasks.
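The count-based pipeline mentioned above (co-occurrence matrix, SVD decomposition, word similarity) can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's actual experimental setup: the toy corpus, the window size of 1, and the dimension k = 2 are all assumptions chosen only to keep the example small.

```python
import numpy as np

# Hypothetical toy corpus for illustration only.
corpus = [
    "i like deep learning",
    "i like nlp",
    "i enjoy flying",
]

# Build the vocabulary and a symmetric word-word co-occurrence matrix
# with a context window of 1 on each side (an assumed setting).
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 1), min(len(words), i + 2)):
            if j != i:
                X[idx[w], idx[words[j]]] += 1

# Truncated SVD: keep the top-k singular directions as word vectors.
U, S, Vt = np.linalg.svd(X)
k = 2
vectors = U[:, :k] * S[:k]

# Word similarity as cosine similarity between the reduced vectors.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(vectors[idx["like"]], vectors[idx["enjoy"]])
```

With a symmetric window, X is symmetric, and the rank-k factors give each word a dense low-dimensional vector whose pairwise cosine similarities approximate distributional similarity on the toy data.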