Cyberbullying identification in twitter using support vector machine and information gain based feature selection

Purnamasari, Ni Made Gita Dwi; Fauzi, Muhammad Ali; Indriati, Indriati; Dewi, Liana Shinta

doi:10.11591/ijeecs.v18.i3.pp1494-1500

Cited by 20 publications

(16 citation statements)

References 16 publications


“…Machine learning (ML) based approaches with different feature selection methods are widely used in cyberbullying tweet classification. Purnamasari et al. [26] utilized SVM and an Information Gain (IG) based feature selection method for detecting cyberbullying events in tweets. Muneer & Fati [11] used various classifiers, namely AdaBoost (ADB), Light Gradient Boosting Machine (LGBM), SVM, RF, Stochastic Gradient Descent (SGD), Logistic Regression (LR), and MNB, for identifying cyberbullying events in tweets.…”

Section: Related Work (mentioning)

confidence: 99%

“…The input dataset and the data annotation are described in sections III-A and III-B. Two baseline cyberbullying models based on deep learning, namely Bi-LSTM [21], RNN [21], and three baseline cyberbullying models based on machine learning models, namely, SVM [26], Multinomial Naive Bayes (MNB) [11], and RF [11] are used for the comparison with the proposed DEA-RNN model. These models have been selected from state-of-the-art cyberbullying detection in social media.…”

Section: IV. Experimental Analysis (mentioning)

confidence: 99%


DEA-RNN: A Hybrid Deep Learning Approach for Cyberbullying Detection in Twitter Social Media Platform

et al. 2022


Cyberbullying (CB) has become increasingly prevalent on social media platforms. With the popularity and widespread use of social media by individuals of all ages, it is vital to make social media platforms safer from cyberbullying. This paper presents a hybrid deep learning model, called DEA-RNN, to detect CB on the Twitter social media network. The proposed DEA-RNN model combines Elman-type Recurrent Neural Networks (RNN) with an optimized Dolphin Echolocation Algorithm (DEA) for fine-tuning the Elman RNN's parameters and reducing training time. We evaluated DEA-RNN thoroughly using a dataset of 10000 tweets and compared its performance to that of state-of-the-art algorithms such as Bi-directional Long Short-Term Memory (Bi-LSTM), RNN, SVM, Multinomial Naive Bayes (MNB), and Random Forests (RF). The experimental results show that DEA-RNN was superior in all scenarios and outperformed the considered existing approaches in detecting CB on the Twitter platform. DEA-RNN was more efficient in scenario 3, where it achieved an average of 90.45% accuracy, 89.52% precision, 88.98% recall, 89.25% F1-score, and 90.94% specificity.
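The five scores reported in the abstract above are all derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification scores from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives, with "cyberbullying" assumed to be the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only (not results from the paper).
scores = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Specificity is reported alongside recall because it measures how well the non-bullying class is recognized, which accuracy alone can hide when the classes are imbalanced.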


“…It is evident that in many similar cases [25], random forest is a good model. However, SVM has been proven to work better in some special cases, such as fault classification in smart distribution networks [26], ozone prediction [27], cyberbullying identification [28], and harmonic source identification [29]. In Table 2, we can see that CNN performed well.…”

Section: Fig. 3. Comparative Visualization of Accelerometer and Gyroscope Data Points in Different Activities (mentioning)

confidence: 99%

Leveraging Sensor Fusion and Sensor-Body Position for Activity Recognition for Wearable Mobile Technologies

Alam, Das, Tasjid et al. 2021

Int. J. Interact. Mob. Technol.


Smart devices like smartphones and smartwatches have made this world smarter. These wearable devices are created through complex research methodologies to make them more usable and interactive with its user. Various interactive mobile applications such as augmented reality (AR), virtual reality (VR) or mixed reality (MR) applications solely depend on the in-built sensors of the smart devices. A lot of facilities can be taken from these devices with sensors such as accelerometer and gyroscope. Different physical activities such as walking, jogging, sitting, etc., can be important for analysis like health state prediction and duration of exercise by using those sensors based on artificial intelligence. In this paper, we have implemented machine learning and deep learning algorithms to detect and recognize eight activities namely, walking, jogging, standing, walking upstairs, walking downstairs, sitting, sitting-in-a-car and cycling; with a maximum of 99.3% accuracy. A few activities are almost similar in action, such as sitting and sitting-in-a-car, but difficult to distinguish; which makes it more challenging to predict tasks. In this paper, we have hypothesized that with more sensors (sensor fusion) and data collection points (sensor-body positions) a wide range of activities can be recognized and the recognition accuracies can be increased. Finally, we showed that the combination of all the sensors data of both pocket/waist and wrist can be used to recognize a wide range of activities accurately. The possibility of using the proposed methodologies for futuristic mobile technologies is quite significant. The adaptation of most recent deep learning algorithms such as convolutional neural network (CNN) and bi-directional Long Short Time Memory (Bi-LSTM) demonstrated high credibility of the methods presented as experimentation.


“…The input variables that maximize the information gain are selected, which in turn minimizes the entropy and best splits the dataset into groups for efficient classification. Information gain has also been used effectively in several studies on Twitter sentiment classification [29]-[30]. Information gain is biased toward input features with a higher number of distinct values. Equation (1) gives the formula for IG calculation [31]:…”

Section: Information Gain (mentioning)

confidence: 99%
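The excerpt above stops before its equation (1), so the formula used by the citing paper is not shown here. As an illustration only, the sketch below uses the common textbook definition, IG(Y; X) = H(Y) - H(Y | X), with entropy measured in bits. The function names and the toy tweet features are hypothetical and do not come from any of the cited papers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group)
                      for group in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 1 = bullying tweet, 0 = non-bullying tweet.
labels = [1, 1, 0, 0]
features = {
    "contains_insult": [1, 1, 0, 0],    # perfectly separates the classes
    "contains_greeting": [1, 0, 1, 0],  # carries no class information
}

# Rank features by information gain, highest first.
ranked = sorted(features,
                key=lambda name: information_gain(features[name], labels),
                reverse=True)
print(ranked)
```

Keeping only the top-ranked features before training a classifier is the general idea behind IG-based feature selection. The bias toward features with many distinct values arises because finer splits tend to lower the conditional entropy term.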

Analyzing impact of number of features on efficiency of hybrid model of lexicon and stack based ensemble classifier for twitter sentiment analysis using WEKA tool

Rani, Gill, Gulia 2021

IJEECS


Twitter is used by millions of people across the world, so the data collected from Twitter can be highly valuable for research and helpful in decision support. In this paper, the ‘Twitter US Airline data’ from the Kaggle data repository is used for sentiment classification of customers’ reviews. The current research aims to implement various machine learning classifiers, stack-based ensemble classifiers, and hybrids of a lexicon classifier with other classifiers. Eleven different classification models are implemented for feature sets of different sizes. All 11 models are also re-implemented with the sentiment score of the lexicon-based classifier added as one of the features in the feature set. Results are analyzed by varying the number of input feature variables used in the classification. Four feature sets of different sizes, with 301, 501, 701, and 1301 features, are used to analyze the variations in the final findings. Chi-Square and Information Gain techniques are used for feature selection. The results show that an increase in the number of features increases the accuracy up to 701 features. After that, accuracy is stable or decreases as the feature set grows. Also, the cost of adding the sentiment score of the lexicon classifier to the input feature set is nominal, but the results improve consistently. WEKA and R Studio tools are used for analysis and implementation. Accuracy and Kappa are used for representing and comparing the efficiency of models.
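The abstract above reports Kappa alongside accuracy. Assuming this refers to Cohen's kappa as commonly reported by tools such as WEKA, the statistic corrects observed agreement for the agreement expected by chance. The sketch below is a minimal illustration with hypothetical labels, not the paper's implementation.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    if n == 0:
        raise ValueError("need at least one labelled example")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that two independent raters with these
    # marginal label frequencies would pick the same class.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both label sets contain a single identical class
    return (observed - expected) / (1 - expected)


# Hypothetical labels for illustration: 1 = positive, 0 = negative sentiment.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")
```

In this toy case three of four predictions are correct, but half of that agreement is expected by chance, so kappa is lower than accuracy.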

