2020
DOI: 10.1145/3405843
The Impact of Weighting Schemes and Stemming Process on Topic Modeling of Arabic Long and Short Texts

Abstract: In this article, first a comprehensive study of the impact of term weighting schemes on topic modeling performance (i.e., LDA and DMM) on Arabic long and short texts is presented. We investigate six term weighting methods: the word count method (standard topic models), TFIDF, PMI, BDC, CLPB, and CEW. Moreover, we propose a novel combination term weighting scheme, namely CmTLB. We utilize the mTFIDF, which takes into account the missing terms and the number of documents in which the term appears whe…
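To make the weighting step concrete, the sketch below computes plain TF-IDF weights (the most familiar of the six compared schemes) over a toy tokenized corpus. The corpus, and the idea of substituting these weights for raw counts before topic modeling, are illustrative assumptions; the paper's mTFIDF and CmTLB variants are not reproduced here.

```python
# Minimal sketch of one of the compared weighting schemes (plain TF-IDF),
# shown only to illustrate how a term weighting step replaces raw word
# counts before topic modeling. The corpus and tokenization are toy
# assumptions, not the paper's Arabic data or its mTFIDF/CmTLB variants.
import math
from collections import Counter

corpus = [
    ["economy", "market", "growth", "market"],
    ["football", "match", "goal", "goal", "goal"],
    ["market", "trade", "economy"],
]

n_docs = len(corpus)
# Document frequency: number of documents containing each term.
df = Counter()
for doc in corpus:
    df.update(set(doc))

def tfidf(doc):
    """Return {term: tf-idf weight} for one tokenized document."""
    tf = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

weighted_corpus = [tfidf(doc) for doc in corpus]
for weights in weighted_corpus:
    print({t: round(w, 3) for t, w in weights.items()})
```

In a weighted topic model, per-term weights like these would stand in for the integer counts that standard LDA and DMM use when estimating topic-word distributions.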

Cited by 15 publications (6 citation statements). References 27 publications.
“…Data preprocessing: Data preprocessing is considered as an essential step in machine learning and data mining ([25]; [26]; [27]; [28]). The reviews usually contain incomplete sentences, much noise, and weak wording such as words without application with high repetition, imperfect words, and incorrect grammar.…”
Section: Data Pre-processing
Citation type: mentioning, confidence: 99%
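As a rough illustration of the kind of cleanup the quoted passage motivates, the sketch below normalizes a noisy review by stripping URLs and punctuation and collapsing exaggerated character repetition. The regular expressions and the example review are assumptions for demonstration, not the cited authors' pipeline.

```python
# Illustrative sketch of basic review cleanup (not the cited authors' exact
# pipeline): lowercase, strip URLs and punctuation, and collapse characters
# repeated more than twice, a common source of noisy "weak wording".
import re

def clean_review(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)      # drop URLs
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)     # "soooo" -> "soo"
    text = re.sub(r"[^\w\s]", " ", text)           # drop punctuation
    return re.sub(r"\s+", " ", text).strip()       # normalize whitespace

print(clean_review("Sooo goooood!!! visit http://example.com NOW..."))
# -> "soo good visit now"
```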
“…In most natural language processing applications, words are used as features. The most popular word vector representations are distributed representation and one-hot representation [27,47]. However, the one-hot representation has various problems, such as the too-large vector dimension, the sparsity of the word vector, and ignoring the word semantic association.…”
Section: Embedding (Word Representation)
Citation type: mentioning, confidence: 99%
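The contrast drawn in the quoted passage can be sketched as follows; the toy vocabulary, the 4-dimensional embedding size, and the random (untrained) embedding values are all illustrative assumptions.

```python
# Sketch contrasting one-hot and distributed word representations
# (toy vocabulary and embedding size; values are random, not trained).
import numpy as np

vocab = ["economy", "market", "growth", "football", "goal"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Vocabulary-sized and sparse: a single 1 and |V|-1 zeros."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# Distributed representation: each word maps to a dense low-dimensional
# vector; in practice these rows are learned (e.g., by word2vec), here random.
rng = np.random.default_rng(0)
embedding_dim = 4
embeddings = rng.normal(size=(len(vocab), embedding_dim))

def embed(word):
    return embeddings[index[word]]

print(one_hot("market"))   # [0. 1. 0. 0. 0.] -> grows with vocabulary size
print(embed("market"))     # 4 dense values, independent of vocabulary size
```

The one-hot vector's dimension equals the vocabulary size and carries no notion of semantic similarity, which is exactly the sparsity and semantics problem the quoted passage attributes to it.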
“…Text clustering techniques mostly aim to create clusters of text documents based on their intrinsic content. Before clustering starts, the text documents should be processed with pre-processing methods such as tokenization [21], stop-word removal, and stemming [22]. The text documents are thereby converted into the required format.…”
Section: Pre-processing
Citation type: mentioning, confidence: 99%
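A minimal sketch of the three named steps (tokenization, stop-word removal, stemming) is shown below; the English stop-word list and the crude suffix-stripping stemmer are placeholders for demonstration only, not the light or root stemmers evaluated for Arabic in the cited work.

```python
# Toy pre-processing pipeline for the steps named above: tokenization,
# stop-word removal, and stemming. The stop-word list and suffix rules are
# illustrative only; real Arabic light/root stemmers are far more involved.
import re

STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and"}
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The clusters are formed from the intrinsic contents of documents"))
# -> ['cluster', 'form', 'from', 'intrinsic', 'content', 'document']
```

The output token lists are what a clustering or topic-modeling step would then convert into its required vector format.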