Word embedding models, which map words into a high-dimensional vector space, have become an increasingly important tool. They are widely used to extract semantic and syntactic features for sentiment analysis. However, word embeddings alone are insufficient for sentiment analysis tasks because they do not encode sentiment information. As a result, they do not fully meet the needs of sentiment analysis applications that depend on recognizing the polarity of a sentence. In this paper, we propose a sentiment embedding model (Word2Sent) to address these weaknesses of existing word embedding models for sentiment analysis applications. We built the model on the Continuous Bag-of-Words (CBOW) model and the SentiWordNet lexicon to learn a sentiment embedding for each word from its surrounding context words. It preserves semantic and syntactic features while implicitly capturing sentiment features. In addition, it can predict sentiment features with a much lower embedding dimension than traditional models. The proposed method improves sentiment classification performance and lowers computational complexity. Both the accuracy and processing-time results indicate that the proposed model is particularly promising.
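The abstract does not spell out the Word2Sent training objective. The sketch below assumes one plausible reading of "CBOW plus SentiWordNet": the averaged context vectors are trained to predict the lexicon polarity class of the center word, and each word's context-side vector serves as its low-dimensional sentiment embedding. The small `LEXICON` dictionary is a hypothetical stand-in for SentiWordNet scores. The names `train_sentiment_cbow` and `predict_polarity` are illustrative and are not the authors' code.

```python
import math
import random

# Hypothetical toy lexicon standing in for SentiWordNet polarity labels
# (assumption: each word is reduced to one of three classes).
# 0 = negative, 1 = neutral, 2 = positive; unlisted words are neutral.
LEXICON = {"good": 2, "great": 2, "bad": 0, "awful": 0}
NUM_CLASSES = 3


def polarity_class(word):
    """Return the lexicon polarity class of a word (neutral if unlisted)."""
    return LEXICON.get(word, 1)


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_sentiment_cbow(sentences, dim=3, window=2, lr=0.1, epochs=300, seed=0):
    """CBOW-style training in which the mean of the context vectors predicts
    the polarity class of the center word (an assumed objective, not the
    paper's published one). Returns (word_vectors, output_weights)."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    emb = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(NUM_CLASSES)]
    for _ in range(epochs):
        for sent in sentences:
            for i, center in enumerate(sent):
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                ctx = [sent[j] for j in range(lo, hi) if j != i]
                if not ctx:
                    continue
                # Hidden layer: average of the context word vectors (as in CBOW).
                h = [sum(emb[w][d] for w in ctx) / len(ctx) for d in range(dim)]
                probs = softmax([sum(o[d] * h[d] for d in range(dim)) for o in out])
                target = polarity_class(center)
                # Cross-entropy gradient with respect to the logits.
                err = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
                # Gradient with respect to h, computed before `out` is updated.
                grad_h = [sum(err[k] * out[k][d] for k in range(NUM_CLASSES))
                          for d in range(dim)]
                for k in range(NUM_CLASSES):
                    for d in range(dim):
                        out[k][d] -= lr * err[k] * h[d]
                # Each context occurrence receives its share of the averaged gradient.
                for w in ctx:
                    for d in range(dim):
                        emb[w][d] -= lr * grad_h[d] / len(ctx)
    return emb, out


def predict_polarity(emb, out, context):
    """Predict the polarity class of a center word from its context words."""
    dim = len(out[0])
    h = [sum(emb[w][d] for w in context) / len(context) for d in range(dim)]
    logits = [sum(o[d] * h[d] for d in range(dim)) for o in out]
    return max(range(len(logits)), key=lambda k: logits[k])


if __name__ == "__main__":
    corpus = [["fun", "good", "fun"], ["dull", "bad", "dull"]]
    vectors, weights = train_sentiment_cbow(corpus, window=1)
    print(predict_polarity(vectors, weights, ["fun"]))   # context seen around "good"
    print(predict_polarity(vectors, weights, ["dull"]))  # context seen around "bad"
```

Because the target is the center word's polarity, a word's vector ends up encoding the sentiment of the words it tends to surround. In the toy corpus, `fun` acquires a positive direction only because it co-occurs with `good`. Conversely, `good`'s own vector drifts toward neutral, because its only neighbour is unlisted. The abstract does not state whether Word2Sent keeps context-side or center-side vectors as the final embedding.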
Deep learning models for Natural Language Processing problems generally rely on pre-trained distributed word representations. However, distributed representations are usually learned from contextual information alone, which prevents them from capturing all important word characteristics. Sentiment analysis suffers from this problem because sentiment information is ignored while word embeddings are learned. Sentiment analysis performance can therefore degrade, since two words with similar vectors may have opposite sentiment orientations (for example, "good" and "bad" often appear in the same contexts). This paper introduces a novel model called Continuous Sentiment Contextualized Vectors (CSCV) to address this problem. The proposed model learns a word's sentiment embedding from its surrounding context words. It uses the Continuous Bag-of-Words (CBOW) model to handle context and sentiment lexicons to identify sentiment. Existing pre-trained vectors are then combined with the resulting sentiment vectors using Principal Component Analysis (PCA) to enhance their quality. The experiments show that: (1) CSCV vectors can be used to enhance any pre-trained word vectors; (2) the resulting vectors substantially alleviate the problem of similar words with opposite polarities; (3) sentiment classification performance improves with this approach.
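The abstract says only that pre-trained vectors and CSCV sentiment vectors are combined "using PCA". The sketch below assumes the simplest version of that step: concatenate each word's two vectors, mean-center them, and project onto the top-k principal components. It uses plain power iteration with deflation so that it runs on the standard library alone. The function names (`concat_vectors`, `top_components`, `pca_combine`) are hypothetical, and the paper's actual combination step may differ.

```python
import math
import random


def concat_vectors(pretrained, sentiment):
    """Concatenate each word's pre-trained vector with its sentiment vector.

    Only words present in both tables are kept (an assumption; how CSCV
    handles words missing from either table is not stated in the abstract)."""
    return {w: pretrained[w] + sentiment[w] for w in pretrained if w in sentiment}


def center_and_covariance(rows):
    """Mean-center the rows and return (centered_rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def top_components(cov, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_combine(pretrained, sentiment, k):
    """Concatenate the two vector tables, then project onto the top-k principal components."""
    combined = concat_vectors(pretrained, sentiment)
    words = sorted(combined)
    centered, cov = center_and_covariance([combined[w] for w in words])
    comps = top_components(cov, k)
    return {w: [sum(p[j] * row[j] for j in range(len(row))) for p in comps]
            for w, row in zip(words, centered)}
```

With k smaller than the concatenated dimension, this step both merges the two sources and reduces dimensionality. In practice, a library routine such as scikit-learn's `PCA` would replace the hand-written power iteration.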