2020
DOI: 10.1109/tkde.2019.2913379
Cross-Domain Sentiment Encoding through Stochastic Word Embedding

Abstract: Sentiment analysis is an important topic concerning the identification of feelings, attitudes, emotions, and opinions from text. To automate such analysis, a large amount of example text needs to be manually annotated for model training, which is laborious and expensive. The cross-domain technique is a key solution for reducing this cost by reusing annotated reviews across domains. However, its success largely relies on learning a robust common representation space across domains. In recent years, signi…

Cited by 36 publications (19 citation statements)
References 29 publications (61 reference statements)
“…With the development of NLP, sentiment analysis has received growing attention from researchers, and many efforts have been made in word embeddings. Jiang et al. [5] proposed a bag-of-words text representation based on sentiment topic words, combining a deep neural network, sentiment topic words, and context information; it performed well in sentiment analysis. Rezaeinia et al. [6] proposed a refined word-embedding method based on part-of-speech (POS) tagging and sentiment lexicons, which improved the performance of pre-trained word embeddings in sentiment analysis. Pham et al. [7] proposed a joint model of multiple convolutional neural networks (CNNs) that operates on Word2Vec and GloVe embeddings as well as one-hot character vectors, achieving good performance in aspect-level sentiment classification. Zhou et al. [8] constructed a text representation model combining TF-IDF and LDA-based topic features for sentiment analysis, which reduced the dimensionality of the word-vector space relative to the traditional representation model. Han et al. [9] built a hybrid neural network combining CNNs and long short-term memory (LSTM) for document representation, incorporating user and product information. Devlin et al. [10] proposed the BERT model for text representation, which better reflects the modifying relationships between words in a text and performed well in sentiment analysis tasks. Liu et al. [11] incorporated latent topic information from a neural topic model into word-level semantic representations to address data sparsity, and presented a new topic-word attention mechanism that explores word semantics from the perspective of topic-word association. Li et al. [12] proposed a framework that integrates different levels of prior knowledge into word embeddings, improving sentiment analysis performance. Xu et al. [13] proposed an improved word representation that integrates sentiment information into the traditional TF-IDF algorithm to generate weighted word vectors, achieving a higher F1 score. Peters et al. [14] proposed a deep-learning-based text representation model for English that captures grammatical, semantic, and sentiment features by training on a large sentiment text corpus. Hao et al. [15] proposed a cross-domain sentiment classification method using stochastic embeddings that preserves similarity structure in the embedding space and achieved good results in sentiment analysis. Usama et al. [16] merged multilevel features from different layers of the same network and from different network architectures to improve sentiment analysis accuracy. Majumder et al. [17] demonstrated the correlation between sarcasm detection and sentiment classification and proposed a multitask learning framework that improves both tasks. Ma et al. [18] proposed Sentic...…”
Section: Related Work
mentioning, confidence: 99%
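Among the approaches surveyed above, the sentiment-weighted TF-IDF idea attributed to Xu et al. [13] is simple enough to sketch. The snippet below is an illustrative reconstruction, not the authors' exact formula: each term's TF-IDF score is scaled by a sentiment weight drawn from a toy lexicon (all words and weights are hypothetical), so sentiment-bearing words dominate the document vector.

```python
import math
from collections import Counter

def sentiment_weighted_tfidf(docs, lexicon):
    """TF-IDF scores scaled per term by a sentiment weight.

    docs:    list of tokenized documents (lists of words)
    lexicon: word -> sentiment strength in [0, 1]; words absent from
             the lexicon get weight 1.0, i.e. plain TF-IDF.
    Returns one sparse vector (dict: word -> score) per document.
    """
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {}
        for w, c in tf.items():
            idf = math.log(n / df[w]) + 1.0     # smoothed IDF
            senti = 1.0 + lexicon.get(w, 0.0)   # boost sentiment-bearing words
            vec[w] = (c / len(doc)) * idf * senti
        vectors.append(vec)
    return vectors

# Toy corpus and toy sentiment lexicon (hypothetical values).
docs = [["good", "battery", "life"], ["bad", "battery"], ["good", "screen"]]
lexicon = {"good": 0.9, "bad": 0.8}
vecs = sentiment_weighted_tfidf(docs, lexicon)
```

With these toy weights, "good" and "bad" outscore the neutral word "battery" in their documents even when raw term frequencies are equal, which is the intended effect of folding sentiment strength into the weighting.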
“…t-SNE, similar to transformers, examines the proximity of words to each other [27]. Hao et al. tackle cross-domain sentiment alignment by applying stochastic word embedding [28].…”
Section: Topic Modeling as a Part of Natural Language Processing
mentioning, confidence: 99%
“…Furthermore, Yu et al. [34] presented a new way to refine word embeddings for sentiment analysis using intensity scores from sentiment lexicons. Moreover, Hao et al. [35] applied a novel stochastic embedding technique for cross-domain sentiment classification, preserving similarity in the embedding space. Finally, Ali et al. [36] proposed a system that retrieved transport-related content from social networks, represented the documents with word-embedding techniques, and achieved effective sentiment classification with 93% accuracy.…”
Section: Background and Related Work
mentioning, confidence: 99%
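The citation statements above repeatedly describe the cited paper's core property: a stochastic word embedding that preserves similarity structure across the embedding space. The paper's actual method is not reproduced here; as a rough sketch of that general idea, the following Johnson–Lindenstrauss-style example projects vectors through a random (stochastic) Gaussian matrix and checks that cosine similarity survives the projection. All dimensions, seeds, and vectors are illustrative assumptions.

```python
import math
import random

def random_projection(vec, rows):
    """Project `vec` with a fixed random Gaussian matrix (one row per
    output dimension); the 1/sqrt(k) scale keeps norms comparable."""
    k = len(rows)
    return [sum(g * x for g, x in zip(row, vec)) / math.sqrt(k)
            for row in rows]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

random.seed(0)
d, k = 300, 128                                  # original / reduced dims (arbitrary)
R = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]

# Two similar "word vectors": v is a noisy copy of u.
u = [random.gauss(0, 1) for _ in range(d)]
v = [0.8 * x + 0.2 * random.gauss(0, 1) for x in u]

before = cosine(u, v)                            # similarity in the original space
after = cosine(random_projection(u, R),
               random_projection(v, R))          # similarity after projection
```

The point of the sketch is only that a random projection approximately preserves pairwise similarity (the Johnson–Lindenstrauss property), which is the structural guarantee the citing papers attribute to the stochastic embedding approach.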