Text Coherence Analysis based on Misspelling Oblivious Word Embeddings and Deep Neural Network

Wadud, Md. Anwar Hussen; Rashadul, Md.

doi:10.14569/ijacsa.2021.0120124

Cited by 12 publications

(3 citation statements)

References 24 publications


“…Feature engineering of textual data is also known as vectorization, where words within at text document are encoded as binary numbers of numeric or floating-point vectors. In this study, Word2Vec [35], TF-IDF [36], and BERT [37] feature extraction methods were used on the textual datasets.…”

Section: Feature Engineering (mentioning)

confidence: 99%

Deep-BERT: Transfer Learning for Classifying Multilingual Offensive Texts on Social Media

Wadud, Mridha, Shin et al. 2023

Computer Systems Science and Engineering


Offensive messages on social media have recently been frequently used to harass and criticize people. In recent studies, many promising algorithms have been developed to identify offensive texts. Most algorithms analyze text in a unidirectional manner, whereas a bidirectional method can maximize performance results and capture semantic and contextual information in sentences. In addition, there are many separate models for identifying offensive texts in either monolingual or multilingual settings, but there are few models that can detect both monolingual and multilingual offensive texts. In this study, a detection system has been developed for both monolingual and multilingual offensive texts by combining a deep convolutional neural network and bidirectional encoder representations from transformers (Deep-BERT) to identify offensive posts on social media that are used to harass others. This paper explores a variety of ways to deal with multilingualism, including collaborative multilingual and translation-based approaches. Deep-BERT is then tested on the Bengali and English datasets with different bidirectional encoder representations from transformers (BERT) pre-trained word-embedding techniques; the proposed Deep-BERT outperformed all existing offensive text classification algorithms, reaching an accuracy of 91.83%. The proposed model is a state-of-the-art model that can classify both monolingual and multilingual offensive texts.


“…While CNNs and RNNs perform well with context-free features, textual content features with contextual information provide a better representation of words and yield better classification results. Recently, different language models have gained popularity in different NLP tasks [22][23][24]. A few studies have used embeddings from language model (ELMO), such as Bidirectional Encoder Representations from Transformers (BERT), that have outperformed several baseline methods in fake news detection [25,26].…”

Section: Introduction (mentioning)

confidence: 99%

AugFake-BERT: Handling Imbalance through Augmentation of Fake News Using BERT to Enhance the Performance of Fake News Classification

et al. 2022

Self Cite


Fake news detection techniques are a topic of interest due to the vast abundance of fake news data accessible via social media. The present fake news detection system performs satisfactorily on well-balanced data. However, when the dataset is biased, these models perform poorly. Additionally, manual labeling of fake news data is time-consuming, though we have enough fake news traversing the internet. Thus, we introduce a text augmentation technique with a Bidirectional Encoder Representation of Transformers (BERT) language model to generate an augmented dataset composed of synthetic fake data. The proposed approach overcomes the issue of minority class and performs the classification with the AugFake-BERT model (trained with an augmented dataset). The proposed strategy is evaluated with twelve different state-of-the-art models. The proposed model outperforms the existing models with an accuracy of 92.45%. Moreover, accuracy, precision, recall, and f1-score performance metrics are utilized to evaluate the proposed strategy and demonstrate that a balanced dataset significantly affects classification performance.


“…This method is based on the study of latent semantic analysis (LSA), a method that compares units of textual information and determines their semantic relationship. In the following years, several coherence analysis methods were proposed by various researchers; however, no method has proved to be perfect [8].…”

Section: Introduction (mentioning)

confidence: 99%

Transformer based Model for Coherence Evaluation of Scientific Abstracts: Second Fine-tuned BERT

Gutierrez-Choque, Medina-Mamani, Castro-Gutierrez et al. 2022

IJACSA


Coherence evaluation is a problem in natural language processing whose complexity lies mainly in the analysis of the semantics and context of the words in the text. Fortunately, the Bidirectional Encoder Representation from Transformers (BERT) architecture can capture the aforementioned variables and represent them as embeddings for fine-tuning. The present study proposes a Second Fine-Tuned model based on BERT to detect inconsistent sentences (coherence evaluation) in scientific abstracts written in English/Spanish. For this purpose, two formal methods for the generation of inconsistent abstracts have been proposed: Random Manipulation (RM) and K-means Random Manipulation (KRM). Six experiments were performed, showing that performing Second Fine-Tuned improves the detection of inconsistent sentences with an accuracy of 71%. This happens even if the new retraining data are of a different language or a different domain. It was also shown that using several methods for generating inconsistent abstracts and mixing them when performing Second Fine-Tuned does not provide better results than using a single technique.
