A heterogeneous stacking ensemble based sentiment analysis framework using multiple word embeddings

Subba, Basant; Kumari, Simpy

doi:10.1111/coin.12478

Cited by 18 publications

(3 citation statements)

References 40 publications

“…Alexandridis et al. [27] used various language models to represent social media texts and Greek language text classifiers, using word embedding implemented by the GloVe model, to detect the polarity of opinions expressed on social media. The GloVe model has also been used in sentiment analysis models, often associated with a recurrent neural network module like long short-term memory (LSTM) or GRU [6], [28], [29].…”

Section: Glove (mentioning)

confidence: 99%

Effect of word embedding vector dimensionality on sentiment analysis through short and long texts

Chiny, Chihab, Lahcen, et al. 2023

IJ-AI

Word embedding has become the most popular method of lexical description in a given context in the natural language processing domain, especially through the word to vector (Word2Vec) and global vectors (GloVe) implementations. Since GloVe is a pre-trained model that provides access to word mapping vectors on many dimensionalities, a large number of applications rely on its prowess, especially in the field of sentiment analysis. However, in the literature, we found that in many cases, GloVe is implemented with arbitrary dimensionalities (often 300d) regardless of the length of the text to be analyzed. In this work, we conducted a study that identifies the effect of the dimensionality of word embedding mapping vectors on short and long texts in a sentiment analysis context. The results suggest that as the dimensionality of the vectors increases, the performance metrics of the model also increase for long texts. In contrast, for short texts, we recorded a threshold at which dimensionality does not matter.
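To make the role of vector dimensionality concrete, the sketch below turns a tokenized text into a single feature vector by averaging its word vectors. Mean pooling is an assumption chosen for brevity and is not necessarily what the cited study used. The tables `TOY_2D` and `TOY_4D` and the helper `mean_pool` are hypothetical stand-ins for pre-trained embeddings of two different dimensionalities (for example 50d versus 300d).

```python
from typing import Dict, List

def mean_pool(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of all tokens found in `table`.

    Tokens missing from the table are skipped; a text with no known
    tokens maps to the zero vector of length `dim`.
    """
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Hypothetical toy tables; real pre-trained vectors would be loaded from disk.
TOY_2D = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0]}
TOY_4D = {
    "good": [1.0, 0.0, 0.5, 0.0],
    "bad": [-1.0, 0.0, -0.5, 0.0],
    "movie": [0.0, 1.0, 0.0, 0.5],
}

doc = "a good movie".split()
low_dim = mean_pool(doc, TOY_2D, 2)   # 2 features per text
high_dim = mean_pool(doc, TOY_4D, 4)  # 4 features per text
```

The downstream classifier sees one fixed-length vector per text, so changing the embedding dimensionality changes the number of input features regardless of how long the text is.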

“…In most cases, ensemble learning methods take one of three popular forms, namely bagging [17], boosting [18], and stacking [19]. Much research on ensemble learning has centered on homogeneous ensembles, even though heterogeneous ensembles could prove more efficient when combining pre-trained models that are often readily available, such as [20, 21].…”

Section: Introduction (mentioning)

confidence: 99%

Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis

Saleh, Mostafa, Alharbi, et al. 2022

Sensors

Sentiment analysis became a hot research topic a decade ago because of its increasing importance in analyzing people's opinions extracted from social media platforms. Although Arabic accounts for a significant share of the content on social media platforms, sentiment analysis of that content is still limited due to the language's complex morphological structures and its variety of dialects. Traditional machine learning and deep neural algorithms have been used in a variety of studies to predict sentiment. There is therefore a need to change current mechanisms in order to increase the accuracy of sentiment prediction. This paper proposes an optimized heterogeneous stacking ensemble model for enhancing the performance of Arabic sentiment analysis. The proposed model combines three different pre-trained Deep Learning (DL) models, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), with three meta-learners, Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM), in order to enhance the model's performance in predicting Arabic sentiment. The proposed model is compared with RNN, LSTM, GRU, and five regular ML techniques, Decision Tree (DT), LR, K-Nearest Neighbor (KNN), RF, and Naive Bayes (NB), on three benchmark Arabic datasets. Parameters of the Machine Learning (ML) and DL models are optimized using grid search and KerasTuner, respectively. Accuracy, precision, recall, and F1-score are used to evaluate the performance of the models and validate the results. The results show that the proposed ensemble model achieves the best performance on each dataset compared with the other models.
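The stacking idea described in this abstract can be sketched in a few lines: heterogeneous base learners produce out-of-fold scores, and a meta-learner is trained on those scores. The code below is a minimal illustration under loud assumptions. The two base learners (`fit_centroid`, `fit_stump`) are deliberately simple stand-ins for the RNN, LSTM, and GRU models named above, the meta-learner is a hand-written logistic regression, and the data is synthetic. None of the names come from the cited paper.

```python
import math
import random

def fit_centroid(X, y):
    """Stand-in base learner 1: score by relative distance to class centroids."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5  # near 1.0 when close to class 1
    return predict

def fit_stump(X, y):
    """Stand-in base learner 2: threshold feature 0 midway between class means."""
    m0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
    thr, sign = (m0 + m1) / 2, (1 if m1 >= m0 else -1)
    return lambda x: 1.0 if sign * (x[0] - thr) > 0 else 0.0

def fit_logistic(Z, y, lr=0.1, epochs=200):
    """Meta-learner: logistic regression trained with plain SGD."""
    w, b = [0.0] * len(Z[0]), 0.0
    def score(z):
        return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
    for _ in range(epochs):
        for z, t in zip(Z, y):
            g = score(z) - t  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
            b -= lr * g
    return score

def fit_stack(X, y, base_fitters, k=3):
    """Train base learners per fold, then fit the meta-learner on out-of-fold scores."""
    n = len(X)
    Z = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(base_fitters):
            model = fit(X_tr, y_tr)
            for i in held_out:
                Z[i][j] = model(X[i])  # score from a model that never saw sample i
    meta = fit_logistic(Z, y)
    bases = [fit(X, y) for fit in base_fitters]  # refit on all data for inference
    return lambda x: 1 if meta([m(x) for m in bases]) >= 0.5 else 0

def make_toy_data(seed=0, n=12):
    """Synthetic 2-d points: class 1 near (2, 2), class 0 near (-2, -2)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 if label else -2.0
        X.append([center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)])
        y.append(label)
    return X, y

X, y = make_toy_data()
stacked = fit_stack(X, y, [fit_centroid, fit_stump])
```

Training the meta-learner on out-of-fold scores, rather than on scores from models that have already seen the sample, is what keeps the second level from simply memorizing the base learners' training fit.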

“…Different distributional semantics models have been developed to generate embeddings, and these have proved to adequately capture the semantic properties of words, as long as sufficiently large corpora are used (Joulin et al., 2016; Mikolov et al., 2013). Several application scenarios have been tested, including: classification of Twitter streams (Khatua, Khatua & Cambria, 2019; Zhang & Luo, 2019), plagiarism detection (Tien et al., 2019), opinion mining on social networks (Nguyen & Le Nguyen, 2018; Rida-E-Fatima et al., 2019), recommendation systems (Chamberlain et al., 2020; Baek & Chung, 2021), mapping of scientific domain keywords (Hu et al., 2019), tracking emerging scientific keywords (Dridi et al., 2019), optimization of queries for Information Retrieval (Roy et al., 2019; Hofstätter et al., 2019) or sentiment analysis (Santosh Kumar, Yadav & Dhavale, 2021; Subba & Kumari, 2022). However, the great majority of such models have been developed for English corpora.…”

Section: Introduction (mentioning)

confidence: 99%

Improving word embeddings in Portuguese: increasing accuracy while reducing the size of the corpus

Pinto, Viana, Teixeira, et al. 2022

PeerJ Computer Science

The subjectiveness of multimedia content description has a strong negative impact on tag-based information retrieval. In our work, we propose enhancing available descriptions by adding semantically related tags. To meet this objective, we use a word embedding technique based on the Word2Vec neural network, parameterized and trained using a new dataset built from online newspapers. A large number of news stories were scraped and pre-processed to build this dataset. Our target language is Portuguese, one of the most spoken languages worldwide. The results achieved significantly outperform similar existing solutions developed for different languages, including Portuguese. Contributions also include an online application and an API available for external use. Although the presented work has been designed to enhance multimedia content annotation, it can be used in several other application areas.
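Adding semantically related tags from an embedding model usually comes down to a nearest-neighbor lookup in vector space. The sketch below ranks candidate tags by cosine similarity. The vectors in `TOY_VECTORS` are invented for illustration and are not output of the cited Word2Vec model, and `related_tags` is a hypothetical helper, not part of the paper's application or API.

```python
import math
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def related_tags(tag: str, vectors: Dict[str, List[float]], top_n: int = 2) -> List[str]:
    """Return the `top_n` tags most similar to `tag`, or [] if `tag` is unknown."""
    if tag not in vectors:
        return []
    query = vectors[tag]
    scored = [(cosine(query, vec), other) for other, vec in vectors.items() if other != tag]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_n]]

# Invented 3-d vectors: two sports-like tags and two politics-like tags.
TOY_VECTORS = {
    "futebol": [0.9, 0.1, 0.0],
    "golo": [0.8, 0.2, 0.1],
    "governo": [0.1, 0.8, 0.4],
    "ministro": [0.0, 0.9, 0.3],
}

suggestions = related_tags("futebol", TOY_VECTORS, top_n=1)
```

With these toy vectors, the tag closest to "futebol" is "golo", so an item tagged only "futebol" would be enriched with "golo".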
