A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter

Alharthi, Reem; Alhothali, Areej; Moria, Kawthar

doi:10.1016/j.is.2021.101740

Cited by 21 publications

(10 citation statements)

References 16 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…LSTM with the Word2Vec model achieves an F1-score of 98.03% for word segmentation in the Arabic language (Almuhareb et al 2019 ). Neural network-based word embedding efficiently models a word and its context and has become one of the most widely used methods of word distribution representation (N.H. Phat and Anh 2020 )(Alharthi et al 2021 ).…”

Section: Review On Text Analytics Word Embedding Application and Deep...mentioning

confidence: 99%

“… Craja et al ( 2020 ) Annual report analysis for fraud detection EDGAR database LR, RF, SVM, XGB, ANN, HAN Word2Vec HAN achieves an accuracy of 84.57% 2. Alharthi et al ( 2021 ) Arabic text low-quality content classification Twitter dataset CNN, LSTM Word2Vec, AraVec LSTM achieves an accuracy of 98% 3. Kozlowski et al ( 2020 ) French social media tweet analysis for crisis management French dataset SVM, CNN fastText, BERT, French FlauBert FlauBert achieves a micro F1-score of 85.4% 4.…”

Section: Appendix Amentioning

confidence: 99%

See 1 more Smart Citation

Impact of word embedding models on text analytics in deep learning environment: a review

2023

View full text Add to dashboard Cite

The selection of word embedding and deep learning models for better outcomes is vital. Word embeddings are an n-dimensional distributed representation of a text that attempts to capture the meanings of the words. Deep learning models utilize multiple computing layers to learn hierarchical representations of data. The word embedding technique represented by deep learning has received much attention. It is used in various natural language processing (NLP) applications, such as text classification, sentiment analysis, named entity recognition, topic modeling, etc. This paper reviews the representative methods of the most prominent word embedding and deep learning models. It presents an overview of recent research trends in NLP and a detailed understanding of how to use these models to achieve efficient results on text analytics tasks. The review summarizes, contrasts, and compares numerous word embedding and deep learning models and includes a list of prominent datasets, tools, APIs, and popular publications. A reference for selecting a suitable word embedding and deep learning approach is presented based on a comparative analysis of different techniques to perform text analytics tasks. This paper can serve as a quick reference for learning the basics, benefits, and challenges of various word representation approaches and deep learning models, with their application to text analytics and a future outlook on research. It can be concluded from the findings of this study that domain-specific word embedding and the long short term memory model can be employed to improve overall text analytics task performance.

show abstract

Section: Review On Text Analytics Word Embedding Application and Deep...mentioning

confidence: 99%

Section: Appendix Amentioning

confidence: 99%

Impact of word embedding models on text analytics in deep learning environment: a review

2023

View full text Add to dashboard Cite

show abstract

“…CNN works by shaping input data into a two-dimensional matrix, like time series or image pixels. Another standard layer in DL is the long short-term memory (LSTM), which learns long-term dependencies in sequential data such as text and time series [16].…”

Section: A Deep Learningmentioning

confidence: 99%

“…The study's findings showed that using CNN-based deep learning models is suitable for various large datasets and binary classification of a specific task. Alharthi et al [16] presented a real-time model to identify low-quality tweets and accounts that make these contents on Twitter. They extracted the Arabic dataset from Twitter using Twitter API and then applied both CNN and LSTM techniques.…”

Section: B Arabic Text Classificationmentioning

confidence: 99%

Arabic Rumor Detection Using Contextual Deep Bidirectional Language Modeling

et al. 2022

View full text Add to dashboard Cite

In today's world, news outlets have changed dramatically; newspapers are obsolete, and radio is no longer in the picture. People look for news online and on social media, such as Twitter and Facebook. Social media contributors share information and trending stories before verifying their truthfulness, thus, spreading rumors. Early identification of rumors from social media has attracted many researchers. However, a relatively smaller number of studies focused on other languages, such as Arabic. In this study, an Arabic rumor detection model is proposed. The model was built using transformer-based deep learning architecture. According to the literature, transformers are neural networks with outstanding performance in natural language processing tasks. Two transformers-based models, AraBERT and MARBERT, were employed, tested, and evaluated using three recently developed Arabic datasets. These models are extensions to the BERT, Bidirectional Encoder Representations from Transformers, a deep learning model that uses transformer architecture to learn the text representations and leverages the attention mechanism. We have also mitigated the challenges introduced by the imbalanced training datasets by employing two sampling techniques. The experimental results of our proposed approaches achieved 0.97 accuracy. This result demonstrated the effectiveness of the proposed method and outperformed other existing Arabic rumor detection methods.

show abstract

“…Authors used various classi er algorithms as RF, Decision Trees (DT), Decision Table, Random Tree, KStar, Bayes Net and Simple Logistic, Content . They calculated classification, Accuracy-94.5% FPR-4.1% FNR-6.6% for spam dataset[11].Alharthi,Alhothali and Moria collected 10,000 Arabic tweets dataset for prediction.Authors used machine learning algorithms with Long Short Term Memory and word embedding feature representation.The system classification accuracy depends on tweet length and evaluated values 0.97,0.98,0.95 for Accuracy, Precision and Recall respectively[12].Liu, Pang and Wang used 97,839 Restaurant with 31,317 Hotel review dataset for classi cation. They used Machine Learning techniques, Bi-LSTM and multimodal neural network model to analyze the effective features to improve the performance, Recall-0.80 Precision-0.82 F1-score-0.81[13].Saidani, Adi and Allili organized dataset from, Enron Corpus consisting of 2,893 messages with 2,412…”

mentioning

confidence: 99%

Measure the Performance by Analysis of Different Boosting algorithms on Various Patterns of Phishing datasets

Pandey

Singh

Pal

et al. 2022

Preprint

View full text Add to dashboard Cite

The internet has become an important component of our daily lives. The most common internet service is web surfing. Many individuals use their browser to do things like online shopping, bill payment, cell phone recharge, and banking transactions. Customers confront different security dangers, such as cyber crime, as a result of the widespread usage of this service. Cyber phishing is a type of web threat that entices users to connect with a false website. The main goal of this research paper is to prevent the user's sensitive information. The proposed model is developed in three steps in step1 we choose a dataset to train on, and then test classifiers on the dataset. In step2 we have applied the three classifiers finding phishing detection accuracy and finally step3 after completing all of the predictions, we discovered that XGBoost outperformed AdaBoost and Gradient boosting machine learning algorithms.

show abstract

A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter

Cited by 21 publications

References 16 publications

Impact of word embedding models on text analytics in deep learning environment: a review

Impact of word embedding models on text analytics in deep learning environment: a review

Arabic Rumor Detection Using Contextual Deep Bidirectional Language Modeling

Measure the Performance by Analysis of Different Boosting algorithms on Various Patterns of Phishing datasets

Contact Info

Product

Resources

About