The aim of this paper is to review machine learning (ML) algorith ms and techniques for hate speech detection in social media (SM). Hate speech problem is normally model as a text classification task. In this study, we examined the basic baseline components of hate speech classification using ML algorithms. There are five basic baseline componentsdata collection and exploration, feature extraction, dimensionality reduction, classifier selection and training, and model evaluation, were reviewed. There have been improvements in ML algorithms that were employed for hate speech detection over time. New datasets and different performance metrics have been proposed in the literature. To keep the researchers informed regarding these trends in the automatic detection of hate speech, it calls for a comprehensive and an updated state-of-the-art. The contributions of this study are two-fold. First to equip the readers with the necessary information on the critical steps involved in hate speech detection using ML algorithms. Secondly, the weaknesses and strengths of each method is critically evaluated to guide researchers in the algorithm choice dilemma. The different variants of ML techniques were reviewed which include classical ML, ensemble approach and deep learning methods. Researchers and professionals alike will benefit immensely from this study.
Online derogatory comments are ubiquitous on social media and areraising serious concerns across the globe. Social media data is riddenwith high dimensional search space due to noise, redundant features,and non-standardized writing style. These problems lead to high computationalcosts, longer training time, and low predictive accuracy inmachine learning models. The researchers proposed a framework for optimizingstacked generalized ensemble learning to address these problemsand enhance model performance. The main components of the frameworkinclude feature optimizer (FO), ensemble classifiers, and stratifiedK-foldCV (skfCV) through stacked generalization ensemble architecture.The ensemble classifiers, FO, and skfCV components make our methodstable and computationally efficient with the best performance. Theproposed method was validated using three benchmark datasets. The proposed method outperformed the state-of-the-art results in all theevaluation metrics used in those three articles adopted for comparison.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.