Performance analysis of Word Embeddings for Cyberbullying Detection

Pericherla, Subbaraju; Ilavarasan, E.

doi:10.1088/1757-899x/1085/1/012008

Cited by 15 publications

(10 citation statements)

References 6 publications


“…There was consistent outperformance by SBERT when the embeddings were used as the input across the classifiers, which was expected since it worked on the semantic textual similarity (STS) benchmark. The performance of static word embedding (i.e., Word2Vec, GloVe, fastText) was not as optimal as the contextual embeddings from the language models (i.e., RoBerta, XLNet, Albert) when coupled with classifiers for cyberbullying detection [87].…”

Section: F Features Used In Automated Cyberbullying Detection (mentioning)

confidence: 98%

Cyberbullying Detection in Social Networks: A Comparison Between Machine Learning and Transfer Learning Approaches

Teng

Varathan

2023

IEEE Access


Information and Communication Technologies fueled social networking and facilitated communication. However, cyberbullying on the platform had detrimental ramifications. The user-dependent mechanisms like reporting, blocking, and removing bullying posts online are manual and ineffective. Bag-of-words text representation without metadata limited cyberbullying post text classification. This research developed an automatic system for cyberbullying detection with two approaches: Conventional Machine Learning and Transfer Learning. This research adopted AMiCA data encompassing a significant amount of cyberbullying context and a structured annotation process. Textual, sentiment and emotional, static and contextual word embeddings, psycholinguistics, term lists, and toxicity features were used in the conventional Machine Learning approach. This study was the first to use toxicity features to detect cyberbullying. This study is also the first to use the latest psycholinguistics features from the Linguistic Inquiry and Word Count (LIWC) 2022 tool, as well as Empath's lexicon, to detect cyberbullying. The contextual embeddings of ggeluBert, tnBert, and DistilBert have alike performance, however DistilBert embeddings were elected for higher F-measure. Textual features, DistilBert embeddings, and toxicity features that struck new benchmark were the top three unique features when fed individually. The model's performance was boosted to F-measure of 64.8% after feeding with a combination of textual, sentiment, DistilBert embeddings, psycholinguistics, and toxicity features to the Logistic Regression model that outperforms Linear SVC with faster training time and efficient handling of high-dimensionality features. Transfer Learning approach was by fine-tuning optimized version Pre-trained Language Models namely, DistilBert, DistilRoBerta, and Electra-small which were found to have speedier training computation than their base form. The fine-tuned DistilBert resulted with the highest F-measure of 72.42%, surpassing CML. Our research concluded that Transfer Learning was the best for uplifted performance and lesser effort as feature engineering and resampling were omitted.



“…An important factor in classifying texts, according to the machine learning models is to digitize them [23]. To train any machine learning classifiers, the input data needs to be in numerical format [24]. By applying various feature extraction techniques, every text information needs to be converted into a numerical representation.…”

Section: Feature Extraction (mentioning)

confidence: 99%

Cyberbullying Detection using Ensemble Method

Puthenveedu
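
The citation statement above notes that text must be converted into a numerical representation before a classifier can be trained. As a minimal sketch of one such feature extraction technique, the block below computes TF-IDF vectors in plain Python. The function names, the whitespace tokenizer, the smoothed IDF formula, and the three sample messages are all assumptions made for illustration; they are not taken from the citing paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines clean more.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
        for tok in vocabulary
    }
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Raw term count times IDF; tokens absent from the document get 0.
        vectors.append([counts[tok] * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


# Hypothetical messages, invented for this example.
messages = ["you are so stupid", "you are so kind", "see you tomorrow"]
vocabulary, vectors = tfidf_vectors(messages)
```

Each message becomes a fixed-length list of floats, one position per vocabulary term, which is the kind of numerical input a classifier expects. A token that occurs in every message (here "you") receives the lowest weight.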


“…Feature extraction is one of the crucial parts in machine learning modelling because the performance of classifier depends on features used during the classification process [63]. The current study chose four types of features in the category of content-based features only for a comparison of the model performance evaluation before we proceed and integrate another feature category (e.g.…”

Section: Feature Extraction (mentioning)

confidence: 99%

“…In NLP tasks, machine learning algorithms depend on word embedding. Thus, we also made word embedding one of the features to convert the text for each message in the data set to numeric form [63]. We used pre-trained Word2vec…”

Section: Word Embedding (mentioning)

confidence: 99%

Implementation of Hyperparameter Optimisation and Over-Sampling in Detecting Cyberbullying Using Machine Learning Approach

Ali

Mohd

Fauzi

et al. 2021

MJCS


Online social networks have become a necessity to everyone around the world. Particularly, online social networks have enabled us to connect to one another regardless of time, for as long as we have social media and social networking as platforms for broadcasting information and communicating, respectively. However, this evolution has resulted in people possibly committing various cybercrimes, such as cyberbullying. To address this issue, machine learning can be utilised to counter cyberbullying in online social networks. Thus, this study proposed a framework with a set of features consisting of word and character term frequency–inverse document frequency and word embedding by using Word2vec and six types of list terms: profane words, proper nouns, negation words, ‘allness’ term, diminisher words and intensifier words. These features were divided into four groups before being fed into the linear support vector classifier to train our model using ASKfm as data set in hyperparameter tuning and over-sampling environment. Results indicated that the proposed framework provided significant outcomes, in which the highest percentage of area under curve is 99.24% and F-measure is 97.38% as performed by our trained model.
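
The abstract above describes using Word2vec word embeddings to turn each message into numeric form, but it does not say how word vectors are combined into a message-level feature. The sketch below assumes one common choice, averaging the vectors of the known tokens. The embedding table is a hypothetical three-dimensional toy, not real Word2vec output, and the function name is invented for illustration.

```python
def average_embedding(tokens, embeddings, dim):
    """Average the vectors of tokens found in `embeddings`.

    Tokens missing from the table are skipped; if none are known,
    a zero vector of length `dim` is returned.
    """
    known = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy vectors; real pre-trained Word2vec vectors are much
# longer (often a few hundred dimensions) and are loaded from a model file.
TOY_EMBEDDINGS = {
    "you": [0.25, 0.5, 0.0],
    "are": [0.0, 0.25, 0.25],
    "stupid": [1.0, -0.5, 0.5],
}

message_vector = average_embedding("you are stupid".split(), TOY_EMBEDDINGS, dim=3)
unknown_vector = average_embedding(["zzz"], TOY_EMBEDDINGS, dim=3)
```

Averaging gives every message a vector of the same length regardless of how many words it contains, so the result can be concatenated with other feature groups before classification. It discards word order, which is one reason contextual embeddings can behave differently from static ones.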
