A Spam Transformer Model for SMS Spam Detection

Liu, Xiaoxu; Lu, Haoye; Nayak, Amiya

doi:10.1109/access.2021.3081479

Cited by 61 publications

(36 citation statements)

References 27 publications


“…Language Transformer models have been playing a central role in text processing and text analysis in the realm of Natural Language Processing in recent times due to their massive potential for robust text embedding. An optimized Transformer based model for detecting SMS spam messages has been proposed by Xiaoxu et al [17] and evaluated the proposed model on benchmarking datasets. Sergio et al [21] look into whether language models that are sensitive to the semantics and context of words, such as Google's BERT, can be used to resist this adversarial attack.…”

Section: Literature Review (mentioning)

confidence: 99%
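The snippet above credits Transformer models with robust text embedding. The building block those models share is scaled dot-product attention, softmax(QKᵀ/√d_k)V, from the original Transformer architecture. The Python sketch below shows only that generic operation. It is not the Spam Transformer of Liu et al.: learned Q/K/V projections and multiple heads are omitted, and the toy two-dimensional token embeddings are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V, one output row per query row."""
    d_k = len(k[0])
    out: Matrix = []
    for q_row in q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # The output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


# Hypothetical embeddings for a three-token message; self-attention uses them as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(scaled_dot_product_attention(tokens, tokens, tokens))
```

Each output row is a context-dependent mixture of all token vectors, which is what lets the model weigh surrounding words when embedding a message.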

Hybrid CNN-GRU Framework with Integrated Pre-trained Language Transformer for SMS Phishing Detection

Ulfath¹, Alqahtani², Hammoudeh³ et al. 2022

Preprint


Smartphones are prone to SMS phishing due to the rapid growth in the availability of smart mobile technologies driven by Internet connections. Also, detecting phishing SMS is a challenging task due to the unstructured nature of SMS text data with non-linear complex correlations. In this concern, considering the recent advancements in the domain of cybersecurity, we have proposed a hybrid deep learning framework that extracts robust features from SMS texts followed by an automatic detection of Phishing SMS. Due to combining the potential capability of individual models into one hybrid framework, it has outperformed various other individual machine learning and deep learning models. The proposed Phishing Detection framework is an effective hybrid combination of pretrained transformer model, MPNet (Masked and Permuted Language Modeling), with supervised ConvNets (CNN) and Bi-directional Gated Recurrent Units (GRU). It is intended to successfully detect unstructured short phishing text messages that contain complex patterns.
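The pipeline this abstract describes (pretrained MPNet token embeddings, then a CNN, then a bidirectional GRU) can be sketched in plain Python as follows. This is a hedged illustration, not the authors' implementation: random vectors stand in for MPNet embeddings, and the layer sizes, the single convolution width, the bias-free GRU (one common gate formulation), the sigmoid output head and the name `phishing_score` are all hypothetical. No training is shown.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[Vec]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rand_mat(rng: random.Random, rows: int, cols: int) -> Mat:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def conv1d_relu(seq: Mat, kernels: List[Mat]) -> Mat:
    """'Valid' 1-D convolution over time; each kernel is a (width x dim) matrix."""
    width = len(kernels[0])
    out = []
    for t in range(len(seq) - width + 1):
        window = seq[t:t + width]
        # One ReLU-activated feature per kernel at this time step.
        out.append([max(0.0, sum(sum(w * x for w, x in zip(k_row, x_row))
                                 for k_row, x_row in zip(kern, window)))
                    for kern in kernels])
    return out


class GRU:
    """Bias-free GRU with update gate z, reset gate r and candidate state n."""

    def __init__(self, rng: random.Random, in_dim: int, hid: int):
        self.hid = hid
        self.W = {g: rand_mat(rng, hid, in_dim) for g in "zrn"}
        self.U = {g: rand_mat(rng, hid, hid) for g in "zrn"}

    def step(self, x: Vec, h: Vec) -> Vec:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # The new state interpolates between the old state and the candidate.
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, seq: Mat) -> Vec:
        h = [0.0] * self.hid
        for x in seq:
            h = self.step(x, h)
        return h


def phishing_score(emb: Mat, kernels: List[Mat], fwd: GRU, bwd: GRU, head: Vec) -> float:
    feats = conv1d_relu(emb, kernels)
    # Bidirectional: concatenate final states of a left-to-right and a right-to-left pass.
    h = fwd.run(feats) + bwd.run(feats[::-1])
    return sigmoid(sum(w * v for w, v in zip(head, h)))


rng = random.Random(0)
dim, n_filters, width, hid = 8, 4, 3, 5
emb = rand_mat(rng, 10, dim)  # stand-in for MPNet embeddings of a 10-token SMS
kernels = [rand_mat(rng, width, dim) for _ in range(n_filters)]
fwd, bwd = GRU(rng, n_filters, hid), GRU(rng, n_filters, hid)
head = [rng.uniform(-0.5, 0.5) for _ in range(2 * hid)]
print(phishing_score(emb, kernels, fwd, bwd, head))
```

The convolution extracts local n-gram-like features from the embeddings, and the two GRU passes summarize them in both reading directions before a single probability is produced.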


“…Where $\sigma$ is a sigmoid function, $W$ and $b$ is the weight and bias of each gate which will continue to be updated during the training process, $c$ is a vector in the cell state section, $h_{t-1}$ is a hidden state in the previous unit and $\odot$ is an operator for element-wise multiplication [1,25].…”

Section: Long Short-term Memory (mentioning)

confidence: 99%
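The symbols listed in this snippet are those of the standard LSTM gate equations. One common formulation, assuming no peephole connections, is written out below. The input $x_t$, the gate names $f_t$, $i_t$, $o_t$ and the candidate state $\tilde{c}_t$ are notation introduced here, and the cited work may use a slightly different variant.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate $f_t$ decides how much of $c_{t-1}$ survives, which is how the cell state carries information across a whole message.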

“…Short message service (SMS) is a communication service in text format that has been used by humans in the last few decades and has become an embedded feature on every cellphone, be it a featured phone or smartphone. Since it is a service that has advantages such as low cost and ease of use, this service is also used by certain parties to send an unwanted text message, namely, spam message [1,2]. Spam is a type of message that is sent arbitrarily with various purposes such as promotions/advertising, borrowing money, announcements of sweepstakes, and such so that they are disturbing to mobile phone users [3], [4].…”

Section: Introduction (mentioning)

confidence: 99%

Android-Based Short Message Service Filtering using Long Short-Term Memory Classification Model

Mustagfirin, Wiriasto, Suksmadana et al. 2022

khif


Short Message Service (SMS) is a technology for sending messages in text format between two mobile phones that support such a facility. Despite the emergence of many mobile text messaging applications, SMS still finds its use in communication among people and broadcasting messages by governments and mobile providers. SMS users often receive messages from parties, particularly for marketing and business purposes, advertisements, or elements of fraud. Many of those messages are irrelevant and fraudulent spam. This research aims at developing android-based applications that enable the filtering of SMS in Bahasa Indonesia. We investigate 1469 SMS text data and classify them into three categories: Normal, Fraudulent, and Advertisement. The classification or filtering method is the long short-term memory (LSTM) model from TensorFlow. The LSTM model is suitable because it has cell states in the architecture that are useful for storing previous information. The feature is applicable for use on sequential data such as SMS texts because every word in the texts constructs a sequential form to complete a sentence. The observation results show that the classification accuracy level is 95%. This model is then integrated into an Android-based mobile application to execute a real-time classification.
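To make the role of the cell state concrete, here is a minimal, untrained forward pass of a word-embedding, LSTM and softmax classifier over the three categories named in the abstract (Normal, Fraudulent, Advertisement). It is a pure-Python sketch, not the authors' TensorFlow model: the vocabulary, layer sizes, random initial weights, whitespace tokenization and the class name `TinyLSTMClassifier` are hypothetical.

```python
import math
import random
from typing import Dict, List

Vec = List[float]
Mat = List[Vec]
LABELS = ["Normal", "Fraudulent", "Advertisement"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(w: Mat, b: Vec, v: Vec) -> Vec:
    return [sum(a * x for a, x in zip(row, v)) + bi for row, bi in zip(w, b)]


class TinyLSTMClassifier:
    """Untrained toy model: word embedding -> single LSTM layer -> softmax over 3 labels."""

    def __init__(self, vocab: Dict[str, int], emb_dim: int = 4, hid: int = 6, seed: int = 0):
        rng = random.Random(seed)

        def rnd(rows: int, cols: int) -> Mat:
            return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.vocab, self.hid = vocab, hid
        self.emb = rnd(len(vocab) + 1, emb_dim)  # row 0 is shared by unknown words
        # Forget (f), input (i), candidate (c) and output (o) weights act on [h_{t-1}, x_t].
        self.W = {g: rnd(hid, hid + emb_dim) for g in "fico"}
        self.b = {g: [0.0] * hid for g in "fico"}
        self.out_w, self.out_b = rnd(len(LABELS), hid), [0.0] * len(LABELS)

    def predict_proba(self, text: str) -> Vec:
        h, c = [0.0] * self.hid, [0.0] * self.hid
        for token in text.lower().split():
            hx = h + self.emb[self.vocab.get(token, 0)]  # concatenation [h_{t-1}, x_t]
            f = [sigmoid(z) for z in affine(self.W["f"], self.b["f"], hx)]
            i = [sigmoid(z) for z in affine(self.W["i"], self.b["i"], hx)]
            g = [math.tanh(z) for z in affine(self.W["c"], self.b["c"], hx)]
            o = [sigmoid(z) for z in affine(self.W["o"], self.b["o"], hx)]
            # The cell state keeps a gated memory of earlier words in the message.
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        logits = affine(self.out_w, self.out_b, h)
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical Bahasa Indonesia vocabulary; ids start at 1 because 0 means "unknown".
vocab = {w: i + 1 for i, w in enumerate("selamat anda menang hadiah promo".split())}
model = TinyLSTMClassifier(vocab)
print(dict(zip(LABELS, model.predict_proba("Selamat anda menang hadiah"))))
```

Because the weights are random, the printed probabilities carry no meaning until the model is trained. With an empty message the hidden state stays at zero, so the output is uniform (1/3 per class).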


“…Quite recently released Twitter dataset distinguished more than five ways of twitter spams, including, but not limited to, profanity, insulting, hate speech, malicious links, fraudulent reviews [1]. Similarly, recent research efforts considered similar spamming approaches against other online social networks and short message service (SMS) [2], [3]. It is not surprising that twitter reviews spam policy periodically [4].…”

Section: Introduction (mentioning)

confidence: 99%

“…Earlier models utilized straightforward classification and categorization algorithms such as Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbor (K-NN), and Decision Trees (DT) [3], [5], [6]. More advanced solutions explore opportunities of improvement as a result of utilizing deep learning (DL) techniques [2], [9].…”

Section: Introduction (mentioning)

confidence: 99%
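Of the classical baselines listed in this snippet, Naive Bayes is the simplest to show end to end. The sketch below is a multinomial Naive Bayes with add-one (Laplace) smoothing, which is one common variant rather than the specific setup of the cited works. The four training messages and the helper names `train_nb` and `predict_nb` are hypothetical, and words never seen in training are simply ignored.

```python
import math
from collections import Counter
from typing import List, Tuple


def train_nb(docs: List[Tuple[str, str]]):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    class_docs = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, len(docs)


def predict_nb(model, text: str) -> str:
    class_docs, word_counts, vocab, n_docs = model
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        # log P(class) plus the sum of log P(word | class), with add-one smoothing.
        score = math.log(n / n_docs)
        for w in text.lower().split():
            if w in vocab:  # out-of-vocabulary words are skipped
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [
    ("win a free prize now", "spam"),
    ("claim your free cash prize", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("see you at home tonight", "ham"),
]
model = train_nb(train)
print(predict_nb(model, "free prize waiting"))  # spam
print(predict_nb(model, "lunch tonight"))       # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.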

Modified Genetic Algorithm for Feature Selection and Hyper Parameter Optimization: Case of XGBoost in Spam Prediction

2022


Recently, spam on online social networks has attracted attention in the research and business world. Twitter has become the preferred medium to spread spam content. Many research efforts attempted to encounter social networks spam. Twitter brought extra challenges represented by the feature space size, and imbalanced data distributions. Usually, the related research works focus on part of these main challenges or produce black-box models. In this paper, we propose a modified genetic algorithm for simultaneous dimensionality reduction and hyper parameter optimization over imbalanced datasets. The algorithm initialized an eXtreme Gradient Boosting classifier and reduced the features space of tweets dataset; to generate a spam prediction model. The model is validated using a 50 times repeated 10-fold stratified cross-validation, and analyzed using nonparametric statistical tests. The resulted prediction model attains on average 82.32% and 92.67% in terms of geometric mean and accuracy respectively, utilizing less than 10% of the total feature space. The empirical results show that the modified genetic algorithm outperforms χ² and PCA feature selection methods. In addition, eXtreme Gradient Boosting outperforms many machine learning algorithms, including BERT-based deep learning model, in spam prediction. Furthermore, the proposed approach is applied to SMS spam modeling and compared to related works.
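This abstract reports a geometric mean alongside accuracy, both averaged over a 50 times repeated 10-fold stratified cross-validation. A small Python sketch of those two evaluation pieces follows. It assumes the usual definition G-mean = √(sensitivity × specificity) with spam as the positive class (the abstract does not spell it out) and a simple round-robin way of building stratified folds; the label set and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import List, Sequence


def geometric_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """sqrt(sensitivity * specificity), with 1 = spam as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def stratified_folds(labels: Sequence[int], k: int, seed: int) -> List[int]:
    """Assign each sample a fold index 0..k-1 so every fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the k folds.
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % k
    return fold_of


labels = [1] * 20 + [0] * 80  # hypothetical imbalanced set: 20% spam
for repeat in range(3):  # the paper repeats 50 times; 3 keeps the demo short
    folds = stratified_folds(labels, k=10, seed=repeat)
    spam_per_fold = [sum(1 for i, f in enumerate(folds) if f == j and labels[i] == 1) for j in range(10)]
    print(repeat, spam_per_fold)  # each fold holds 2 spam messages

print(geometric_mean(labels, [0] * 100))  # 0.0, although accuracy would be 0.8
```

The last line shows why G-mean is reported for imbalanced data: a classifier that labels everything as non-spam scores 80% accuracy here but a geometric mean of zero.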
