Sarang Shaikh scite author profile

The growth of social media and information sharing has brought major benefits to humanity. However, it has also given rise to a variety of challenges, including the spread of hate speech messages. To address this emerging issue on social media sites, recent studies have employed a variety of feature engineering techniques and machine learning algorithms to automatically detect hate speech messages on different datasets. However, to the best of our knowledge, no study has compared these feature engineering techniques and machine learning algorithms to determine which combination performs best on a standard, publicly available dataset. Hence, the aim of this paper is to compare the performance of three feature engineering techniques and eight machine learning algorithms on a publicly available dataset with three distinct classes. The experimental results showed that bigram features used with the support vector machine algorithm performed best, with 79% overall accuracy. Our study holds practical implications and can be used as a baseline in the area of automatic hate speech detection. Moreover, the results of the different comparisons can serve as state-of-the-art reference points against which future research on automated text classification techniques can be compared.
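To make the term "bigram features" concrete, here is a minimal sketch of word-bigram extraction in plain Python. The lowercasing, the whitespace tokenization, and the sample sentence are assumptions made for illustration; the paper's actual preprocessing and its support vector machine training step are not reproduced here.

```python
from collections import Counter


def word_bigrams(text):
    """Return adjacent word pairs from a lowercased, whitespace-split text.

    Tokenization by whitespace is an illustrative assumption.
    """
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def bigram_counts(text):
    """Count how often each word bigram occurs in the text."""
    return Counter(word_bigrams(text))


# Hypothetical sample sentence, not taken from the paper's dataset.
sample = "you are not welcome here you are not"
counts = bigram_counts(sample)
print(counts.most_common(2))
```

Each distinct bigram would become one dimension of a feature vector, with the count (or a weighted variant of it) as the value passed to a classifier.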

Towards Improved Classification Accuracy on Highly Imbalanced Text Dataset Using Deep Neural Language Models

Shaikh, Daudpota, Imran, et al. 2021

Applied Sciences

Data imbalance is a frequently occurring problem in classification tasks where the number of samples in one category exceeds the number in others. Quite often, the minority class data is of great importance, representing concepts of interest, and is challenging to obtain in real-life scenarios and applications. Consider a dataset of bank loan customers: the majority of instances belong to the non-defaulter class and only a small number of customers are labeled as defaulters, yet in such highly imbalanced datasets, accuracy on the defaulter label matters more than accuracy on the non-defaulter label. A lack of sufficient data samples across all class labels results in data imbalance, which causes poor classification performance when training the model. Synthetic data generation and oversampling techniques such as SMOTE and AdaSyn can address this issue for statistical data, yet such methods suffer from overfitting and substantial noise. While such techniques have proved useful for synthetic numerical and image data generation using GANs, the effectiveness of approaches proposed for textual data, which can retain grammatical structure, context, and semantic information, has yet to be evaluated. In this paper, we address this issue by assessing text sequence generation algorithms coupled with grammatical validation on domain-specific, highly imbalanced datasets for text classification. We exploit the recently proposed GPT-2 and LSTM-based text generation models to introduce balance in highly imbalanced text datasets. The experiments presented in this paper on three highly imbalanced datasets from different domains show that the performance of the same deep neural network models improves by up to 17% when the datasets are balanced using generated text.
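The bank loan example can be illustrated with a small worked calculation. The sketch below assumes a hypothetical split of 95 non-defaulters and 5 defaulters (these numbers are not from the paper) and a trivial model that always predicts the majority class. It shows why overall accuracy can look high while performance on the minority label is poor.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        return 0.0
    return sum(p == positive for p in relevant) / len(relevant)


# Hypothetical imbalanced dataset: 95 non-defaulters, 5 defaulters.
y_true = ["non-defaulter"] * 95 + ["defaulter"] * 5
# A trivial model that always predicts the majority class.
y_pred = ["non-defaulter"] * 100

overall = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, "defaulter")
print(f"accuracy={overall:.2f}, defaulter recall={minority_recall:.2f}")
```

Under these assumptions the model scores 95% accuracy while identifying none of the defaulters, which is why per-class metrics matter on imbalanced data.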

Evaluating Polarity Trend Amidst the Coronavirus Crisis in Peoples’ Attitudes toward the Vaccination Drive

et al. 2021

It has been more than a year since the coronavirus (COVID-19) engulfed the whole world, disrupting daily routines, bringing down economies, and killing two million people across the globe at the time of writing. The pandemic brought the world together in a joint effort to find a cure and work toward developing a vaccine. The much-anticipated first batch of vaccines started rolling out by the end of 2020, and many countries began their vaccination drives early on, while others were still waiting for a successful trial. Social media, meanwhile, was bombarded with both positive and negative stories about vaccine development and the evolving coronavirus situation. Many people were looking forward to the vaccines, while others were cautious because of side effects and conspiracy theories, resulting in mixed emotions. This study explores users' tweets concerning the COVID-19 vaccine and the sentiments expressed on Twitter. It evaluates the polarity trend and its shift from the start of the coronavirus outbreak to the vaccination drive across six countries. The findings suggest that people in neighboring countries have shown quite similar attitudes toward vaccination, in contrast to their different reactions to the coronavirus outbreak.

Bloom’s Learning Outcomes’ Automatic Classification Using LSTM and Pretrained Word Embeddings

2021

Bloom's taxonomy is a popular model for classifying educational learning objectives into different learning levels across three domains: cognitive, affective, and psychomotor. Each domain is further divided into levels. The cognitive domain includes the knowledge, comprehension, application, analysis, synthesis, and evaluation levels. In educational institutions, designing course learning outcomes (CLOs) according to the different levels of Bloom's taxonomy and mapping assessment items onto the designed CLOs is an important task. Every semester, faculty and administrators read thousands of statements to complete the tedious task of mapping CLOs and assessment items to Bloom's levels for improved student learning. This paper proposes an LSTM-based deep learning model to classify CLOs and assessment items into the different levels of Bloom's taxonomy in the cognitive domain. Although there have been some attempts in the literature to automatically assign Bloom's taxonomy categories using a keyword-based approach, that approach suffers from low accuracy and overlapping keywords. Initially, when we applied the keyword-based approach to our datasets, we achieved an overall accuracy of 55% for the classification of CLOs and assessment items into Bloom's taxonomy. The proposed model predicts Bloom's level for CLOs and for assessment question items. The proposed model has a simple architecture compared to other deep learning models reported in the literature and achieves classification accuracy of 87% and 74% on CLOs and assessment question items, respectively. The proposed model obtained a 3% increase in overall accuracy compared to an existing study on the same task. To the best of our knowledge, this is the first attempt at applying deep learning to classify educational objectives into Bloom's levels.
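The keyword overlap problem mentioned above can be sketched in a few lines. The keyword lists below are hypothetical and deliberately tiny; they are not the lists used in the paper or in any earlier study. They only illustrate how a single verb appearing under two levels makes a keyword lookup ambiguous.

```python
# Hypothetical keyword lists for three cognitive levels (illustration only).
BLOOM_KEYWORDS = {
    "knowledge": {"define", "list", "describe"},
    "comprehension": {"explain", "describe", "summarize"},
    "application": {"apply", "solve", "demonstrate"},
}


def matching_levels(statement):
    """Return every level whose keyword list shares a word with the statement."""
    cleaned = statement.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return sorted(
        level for level, keywords in BLOOM_KEYWORDS.items() if words & keywords
    )


# "describe" appears under two levels, so the lookup cannot decide.
ambiguous = matching_levels("Describe the layers of the OSI model.")
# A statement with a single unambiguous keyword.
clear = matching_levels("Solve linear equations.")
# A statement with no listed keyword gets no level at all.
unmatched = matching_levels("Appreciate teamwork.")
print(ambiguous, clear, unmatched)
```

A statement that matches several levels, or none, has to be resolved by some extra rule, which is one way such an approach can lose accuracy.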

A Robust Framework for Object Detection in a Traffic Surveillance System

et al. 2022

Object recognition is the technique of specifying the location of various objects in images or videos. Numerous algorithms exist for object recognition, such as R-CNN, Fast R-CNN, Faster R-CNN, HOG, R-FCN, SSD, SSP-net, SVM, CNN, and YOLO, based on machine learning and deep learning techniques. Although these models have been employed for various object detection applications, tiny object detection still faces the challenge of low precision. It is essential to develop a lightweight and robust object detection model that can detect tiny objects with high precision. In this study, we propose an enhanced YOLOv2 (You Only Look Once version 2) algorithm for object detection, specifically vehicle detection and recognition in surveillance videos. We modified the base network of YOLOv2 by reducing the number of parameters and replacing it with DenseNet. We employed DenseNet-201 for feature extraction in our improved model, which extracts the most representative features from the images. Moreover, our proposed model is more compact due to the dense architecture of the base network. We used DenseNet-201 as the base network because of the direct connections among all layers, which help to extract valuable information from the very first layer and pass it to the final layer. A dataset gathered from Kaggle and KITTI was used to train the proposed model, and we cross-validated its performance using the MS COCO and Pascal VOC datasets. To assess the efficacy of the proposed model, we conducted extensive experiments, which demonstrate that our algorithm outperforms existing vehicle detection approaches, with an average precision of 97.51%.

Automatic Hate Speech Detection using Machine Learning: A Comparative Study
