A Discrete Hidden Markov Model for SMS Spam Detection

Xia, Tian; Chen, Xuemin

doi:10.3390/app10145011

Cited by 46 publications

(16 citation statements)

References 50 publications

(82 reference statements)


“…Accuracy, precision, recall, and F-measure (F1) were applied as the evaluation indexes of the model in this paper. Accuracy is the score of sentiment correctly predicted in all microblog comments [35], which is the percentage of examples that the classifier obtains from the total number of examples predicted by a given label. The precision is the fraction of relevant instances among all retrieved instances.…”

Section: Evaluation Index (mentioning)

confidence: 99%

Chinese Microblog Sentiment Detection Based on CNN-BiGRU and Multihead Attention Mechanism

Qiu, Fan, Yao, et al. 2020

Scientific Programming

With the rapid development of the Internet, Weibo has gradually become one of the commonly used social tools in society at present. We can express our opinions on Weibo anytime and anywhere. Weibo is widely used and people can express themselves freely on it; thus, the amount of comments on Weibo has become extremely large. In order to count up the attitudes of users towards a certain event, Weibo managers often need to evaluate the position of a certain microblog in an appropriate way. In traditional position detection tasks, researchers mainly mine text semantic features through constructing feature engineering and sentiment dictionary, but it takes a large amount of manpower in feature selection and design. However, it is an effective method to analyze the sentiment state of microblog comments. Deep learning is developing in an increasingly mature direction, and the utilization of deep learning methods for sentiment detection has become increasingly popular. The application of convolutional neural networks (CNN), bidirectional GRU (BiGRU), and multihead attention mechanism- (multihead attention-) combined method CNN-BiGRU-MAttention (CBMA) to conduct Chinese microblog sentiment detection was proposed in this paper. Firstly, CNN were applied to extract local features of text vectors. Afterward, BiGRU networks were applied to extract the global features of the text to solve the problem that the single CNN cannot obtain global semantic information and the disappearance of the traditional recurrent neural network (RNN) gradient. At last, it was concluded that the CBMA algorithm is more accurate for Chinese microblog sentiment detection through a variety of algorithm experiments.
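
The four evaluation indexes named in the citation statement above (accuracy, precision, recall, F1) can be sketched for a binary classifier as follows. This is a minimal illustration using the standard confusion-matrix definitions, not code from either paper; the function name `classification_metrics` and the label encoding (1 = positive class) are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the positive label (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Accuracy: share of all examples that were labeled correctly.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, invented for the example.
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

The zero-denominator guards return 0.0 by convention; other conventions (for example, reporting the metric as undefined) are equally defensible.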

“…Chen et al. [8] designed a model with knowledge powered attention mechanisms for classifying short texts according to their semantics. A further example of short text analysis is spam identification in social media posts, emails, and short messaging services [9,10]. However, typical short texts are structured and segmented character strings such as "Jay and Jolin are born in Taiwan" in [8] while domain names are unstructured and unsegmented character strings such as incometaxindiaefiling.gov.in, teacherspayteachers.com, and adnetworkperformance.com in the top one million sites provided by Alexa [11].…”

Section: Introduction (mentioning)

confidence: 99%

A Word-Level Analytical Approach for Identifying Malicious Domain Names Caused by Dictionary-Based DGA Malware

et al. 2021

Computer networks are facing serious threats from the emergence of malware with sophisticated DGAs (Domain Generation Algorithms). This type of DGA malware dynamically generates domain names by concatenating words from dictionaries for evading detection. In this paper, we propose an approach for identifying the callback communications of such dictionary-based DGA malware by analyzing their domain names at the word level. This approach is based on the following observations: These malware families use their own dictionaries and algorithms to generate domain names, and accordingly, the word usages of malware-generated domains are distinctly different from those of human-generated domains. Our evaluation indicates that the proposed approach is capable of achieving accuracy, recall, and precision as high as 0.9989, 0.9977, and 0.9869, respectively, when used with labeled datasets. We also clarify the functional differences between our approach and other published methods via qualitative comparisons. Taken together, these results suggest that malware-infected machines can be identified and removed from networks using DNS queries for detected malicious domain names as triggers. Our approach contributes to dramatically improving network security by providing a technique to address various types of malware encroachment.

“…Other works employ traditional classifiers to such a task (Fernandes et al., 2015; Fattahi and Mejri, 2021; Xia and Chen, 2020), including diverse models like Support Vector Machines (SVM), Hidden Markov Models (HMM), Optimum-Path Forest (OPF), k-Nearest Neighbors (KNN), decision trees, and ensembling approaches. Gupta et al. (2018) provide a comparative study using CNN and traditional machine learning architectures.…”

Section: Introduction (mentioning)

confidence: 99%

SMS Spam Detection Through Skip-gram Embeddings and Shallow Networks

Sousa, Pedronette, Papa, et al. 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

The drastic decrease in mobile SMS costs turned phone users more prone to spam messages, usually with unwanted marketing or questionable content. As such, researchers have proposed different methods for detecting SMS spam messages. This paper presents a technique for embedding SMS messages into vector spaces that is suitable for spam detection. The proposed approach relies on mining patterns that are relevant for distinguishing spam from legitimate messages. A subset of those patterns is used to construct a function that maps text messages into a multidimensional vector space. The extracted patterns are represented as skip-grams of token attributes, where a skip-gram can be seen as a generalization of the n-gram model that allows a distance greater than one between matched tokens in the text. We evaluate the proposed approach using the generated vectors for spam classification on the UCI Spam Collection dataset. The experiments showed that our method combined with shallow networks reached accuracy that is competitive with state-of-the-art approaches.
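
The skip-gram idea described in the abstract above, an n-gram that tolerates a gap between matched tokens, can be illustrated with a short sketch. This is a simplified reading and not the paper's method: it works on raw tokens rather than token attributes, and it bounds each gap individually by `max_skip`, which is an assumption of this example. The function name `skip_grams` is hypothetical.

```python
from itertools import combinations


def skip_grams(tokens, n=2, max_skip=1):
    """Return n-token subsequences whose neighbors are at most
    `max_skip` tokens apart in the original sequence.

    With max_skip=0 this reduces to ordinary contiguous n-grams.
    """
    grams = []
    for idx in combinations(range(len(tokens)), n):
        # Number of skipped tokens between consecutive chosen positions.
        gaps = (b - a - 1 for a, b in zip(idx, idx[1:]))
        if all(gap <= max_skip for gap in gaps):
            grams.append(tuple(tokens[i] for i in idx))
    return grams


if __name__ == "__main__":
    # Toy message, invented for the example.
    message = ["win", "a", "free", "prize"]
    print(skip_grams(message, n=2, max_skip=0))  # plain bigrams
    print(skip_grams(message, n=2, max_skip=1))  # bigrams with one skip allowed
```

Enumerating all index combinations is quadratic or worse in message length, which is acceptable for short SMS texts but would need a windowed loop for longer documents.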
