Abstract:During the last decades, the reliance on email communication, especially in business, has increased significantly. Companies receive a massive amount of emails daily, that include business inquiries, customers’ feedback, and other types of emails. This inspired many researchers to propose different algorithms to classify and redistribute the numerous emails according to their content. Nowadays, emails containing Arabic text, especially in the Arab world, have raised an increasing concern since they became wide… Show more
This comprehensive review delves into the realm of email spam classification, scrutinizing the efficacy of various machine learning methods employed in the ongoing battle against unwanted email communication. The paper synthesizes a wide array of research findings, methodologies, and performance metrics to provide a holistic perspective on the evolving landscape of spam detection. Emphasizing the pivotal role of machine learning in addressing the dynamic nature of spam, the review explores the strengths and limitations of popular algorithms such as Naive Bayes, Support Vector Machines, and neural networks. Additionally, it examines feature engineering, dataset characteristics, and evolving threats, offering insights into the challenges and opportunities within the field. With a focus on recent advancements and emerging trends, this review aims to guide researchers, practitioners, and developers in the ongoing pursuit of robust and adaptive email spam classification systems.
This comprehensive review delves into the realm of email spam classification, scrutinizing the efficacy of various machine learning methods employed in the ongoing battle against unwanted email communication. The paper synthesizes a wide array of research findings, methodologies, and performance metrics to provide a holistic perspective on the evolving landscape of spam detection. Emphasizing the pivotal role of machine learning in addressing the dynamic nature of spam, the review explores the strengths and limitations of popular algorithms such as Naive Bayes, Support Vector Machines, and neural networks. Additionally, it examines feature engineering, dataset characteristics, and evolving threats, offering insights into the challenges and opportunities within the field. With a focus on recent advancements and emerging trends, this review aims to guide researchers, practitioners, and developers in the ongoing pursuit of robust and adaptive email spam classification systems.
In this paper, we present an innovative approach to enhancing email spam classification using N-gram features, TF-IDF weighting, SMOTE oversampling, and ensemble learning techniques such as Decision Trees, Random Forests, and Ensemble Extra Trees. Our methodology involves preprocessing the dataset to extract N-gram features, applying TF-IDF weighting to highlight important terms, and addressing class imbalance through SMOTE. We then train and evaluate multiple classification models and find that the Ensemble Extra Trees algorithm outperforms others in terms of accuracy, precision, recall, and F1-score. Our experiments on benchmark datasets confirm the efficacy of our approach, showcasing significant improvements in spam detection accuracy and highlighting the potential of ensemble learning for email spam classification. This research contributes to the advancement of spam filtering technologies, providing a robust and efficient solution for accurately identifying and categorizing spam emails.
The extraordinary success of deep learning is made possible due to the availability of crowd-sourced large-scale training datasets. Mostly, these datasets contain personal and confidential information, thus, have great potential of being misused, raising privacy concerns. Consequently, privacy-preserving deep learning has become a primary research interest nowadays. One of the prominent approaches adopted to prevent the leakage of sensitive information about the training data is by implementing differential privacy during training for their differentially private training, which aims to preserve the privacy of deep learning models. Though these models are claimed to be a safeguard against privacy attacks targeting sensitive information, however, least amount of work is found in the literature to practically evaluate their capability by performing a sophisticated attack model on them. Recently, DP-BCD is proposed as an alternative to state-of-the-art DP-SGD, to preserve the privacy of deep-learning models, having low privacy cost and fast convergence speed with highly accurate prediction results. To check its practical capability, in this article, we analytically evaluate the impact of a sophisticated privacy attack called the membership inference attack against it in both black box as well as white box settings. More precisely, we inspect how much information can be inferred from a differentially private deep model’s training data. We evaluate our experiments on benchmark datasets using AUC, attacker advantage, precision, recall, and F1-score performance metrics. The experimental results exhibit that DP-BCD keeps its promise to preserve privacy against strong adversaries while providing acceptable model utility compared to state-of-the-art techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.