2023
DOI: 10.7717/peerj-cs.1248
How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models

Abstract: Online propaganda is a mechanism for influencing the opinions of social media users, and a growing menace to public health, democratic institutions, and civil society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several feature models are explored to build a robust model, including part-of-speech, LIWC, word uni-gram, Embeddings from Language Models (ELMo), FastText, word2vec, latent semantic analysis (LSA), and char tri-gram feature m…


Cited by 12 publications (6 citation statements) | References 33 publications
“…The BERT model was introduced by Devlin et al (2018) at Google Lab and it has proven its significance for a variety of text-mining tasks in several application domains ( Malik, Imran & Mamdouh, 2023 ). The benefits of BERT include faster development, automated feature generation, reduced data requirements, and improved performance.…”
Section: Framework Methodology
confidence: 99%
“…TF-IDF is a statistical approach for evaluating the significance of a particular word within a larger collection of documents. The technique is commonly used in NLP and information retrieval (IR) tasks ( Malik, Imran & Mamdouh, 2023 ). It is a weighting scheme: the weight of a word in a document is proportional to its frequency of occurrence in that document and inversely proportional to its frequency across all documents.…”
Section: Framework Methodology
confidence: 99%
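The TF-IDF weighting described in the statement above can be sketched in a few lines of plain Python. This is an illustrative toy implementation with made-up tokens, not the cited paper's code; real pipelines typically use a library implementation such as scikit-learn's TfidfVectorizer.

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF weight: frequency of `term` in `doc`, scaled down
    by how many documents in `corpus` contain the term."""
    tf = doc.count(term) / len(doc)                 # term frequency in this document
    df = sum(1 for d in corpus if term in d)        # document frequency across the corpus
    idf = math.log(len(corpus) / df)                # assumes term occurs in at least one doc
    return tf * idf

# Toy corpus of tokenized "documents" (hypothetical example data)
corpus = [
    ["propaganda", "spreads", "online"],
    ["social", "media", "news"],
    ["online", "news", "media"],
]

# "propaganda" occurs in only one document, so it outweighs
# the more common "online" even though both appear once here.
rare_weight = tf_idf("propaganda", corpus[0], corpus)
common_weight = tf_idf("online", corpus[0], corpus)
```

A word appearing in every document gets idf = log(1) = 0, so ubiquitous words are discounted entirely, which is the behavior the quoted statement describes.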
“…First, traditional NLP models and commonly used LLMs in CSS research often lack reasoning capabilities [116]. For instance, LLMs like BERT-based models, which are extensively used in HCI research that analyzes large volumes of social media data [27,60], are typically fine-tuned for specific discrete downstream tasks (e.g., classification). While these pretrained language models have shown promise in performing discrete analyses, some emerging HCI research [116,117] demonstrates the additional value of prompting LLMs to perform multi-step reasoning for a more comprehensive analysis.…”
Section: Harnessing Large Language Models In Computational Social Science
confidence: 99%
“…The word2vec word embedding model has shown state-of-the-art performance in many classification tasks in the NLP domain [36][37][38]. Word2vec supports two methods for generating word embeddings: skip-gram and CBOW.…”
Section: Word2vec
confidence: 99%
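The difference between the two word2vec training objectives named above — skip-gram predicts the context from the target word, while CBOW predicts the target from its context — can be sketched by generating the training examples each method sees. This is a minimal illustration with an invented sentence, not the cited implementation; libraries like gensim handle this internally.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram training data: one (target, context_word) pair per
    context word inside the window around each target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW training data: one (context_words, target) example per
    position; the whole window is used to predict the target."""
    examples = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, target))
    return examples

# Hypothetical tokenized sentence
sent = ["detect", "propaganda", "on", "social", "media"]
```

For the word "propaganda" at position 1 with window 2, skip-gram emits the pairs ("propaganda", "detect"), ("propaganda", "on"), ("propaganda", "social"), whereas CBOW emits a single example whose input is the context list ["detect", "on", "social"].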