2021 IEEE 5th International Conference on Cryptography, Security and Privacy (CSP) 2021
DOI: 10.1109/csp51677.2021.9357595

SpaML: a Bimodal Ensemble Learning Spam Detector based on NLP Techniques

Abstract: In this paper, we put forward a new tool, called SpaML, for spam detection using a set of supervised and unsupervised classifiers and two Natural Language Processing (NLP) techniques, namely Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). We first present the NLP techniques used. We then present our classifiers and their performance with each of these techniques, followed by our overall Ensemble Learning classifier and the strategy we use to combine them. Final…
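
For readers unfamiliar with the two feature-extraction techniques named in the abstract, the following is a minimal sketch of BoW and TF-IDF in plain Python. It is not SpaML's implementation: the function names, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the three toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors), one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf = count / doc length
    and idf = log(N / document frequency), with no smoothing."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Number of documents in which each vocabulary term occurs.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for vec in counts:
        total = sum(vec) or 1  # guard against empty documents
        weights.append([(c / total) * w for c, w in zip(vec, idf)])
    return vocab, weights


# Hypothetical toy messages, not data from the paper.
docs = ["win a free prize now", "are we meeting now", "free free prize"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
```

BoW keeps raw term counts, whereas TF-IDF scales each count down for terms that occur in many messages, so a term unique to one message ("win") outweighs one shared across messages ("free") within the same message. Either set of vectors could then be passed to any classifier.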

Citations: cited by 14 publications (10 citation statements)
References: 20 publications (17 reference statements)

“…Ensemble learning Ensembling different models has previously been found useful for text classification (Nozza et al, 2016;Kanakaraj and Guddeti, 2015;Fattahi and Mejri, 2021). Accordingly, ensembling was one of the most common strategies for improving on baseline PCL detection methods.…”
Section: Results (mentioning)
confidence: 99%
“…Nagwani and Sharaff proposed the use of ML algorithms such as Naïve Bayes (NB), support vector machine (SVM), non-negative matrix factorization, and latent Dirichlet allocation to identify spam [40], while Almeida et al suggested text normalization [41]. Fattahi and Mejri applied natural language processing (NLP) techniques, namely, Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) to identify spam SMSs [42]. Choudhary and Jain applied random forest (RF) classification algorithms [43].…”
Section: Methods (mentioning)
confidence: 99%
“…Fattahi & Mejri (2020) examined the Bag of Words (BoW) and TF-IDF spam detection algorithms using text data containing 747 spam message instances. They used a variety of machine learning approaches to classify spam and were able to achieve an accuracy of 97.99% and precision of 98.97%.…”
Section: Feature-extraction Techniques (mentioning)
confidence: 99%