2020
DOI: 10.11591/eei.v9i5.2444

Indonesian language email spam detection using N-gram and Naïve Bayes algorithm

Abstract: Indonesia ranks 8th among all countries in the world as a source of global spam. A web-based spam filter service exposed through a REST API can be used to detect Indonesian-language email spam on an email server or in various types of email server applications. Through the REST API, applications exchange data in JSON format using standard HTTP methods. One commonly used type of spam filter is Bayesian filtering, where the Naïve Bayes algorithm is used as a …
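The abstract names Bayesian filtering with Naïve Bayes over N-gram features, served through a REST API that exchanges JSON. Below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes whitespace tokenization, word bigrams (`n=2`), and multinomial Naïve Bayes with add-one (Laplace) smoothing. It also assumes a hypothetical request body of the form `{"text": ...}` answered with `{"label": ...}`. The names `ngrams`, `NaiveBayesSpamFilter`, and `handle_request` are illustrative only.

```python
import json
import math
from collections import Counter


def ngrams(text: str, n: int = 2) -> list[str]:
    """Lowercase, split on whitespace (assumed tokenizer), and return word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word n-grams with add-one (Laplace) smoothing."""

    def __init__(self, n: int = 2):
        self.n = n
        self.class_counts: Counter = Counter()          # number of documents per class
        self.feature_counts: dict[str, Counter] = {}    # n-gram counts per class
        self.vocab: set[str] = set()                    # all n-grams seen in training

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSpamFilter":
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            grams = ngrams(text, self.n)
            self.feature_counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        """Unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior P(class)
            for gram in ngrams(text, self.n):
                # Add-one smoothing keeps unseen n-grams from zeroing the likelihood.
                score += math.log((counts[gram] + 1) / (total + v))
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def handle_request(model: NaiveBayesSpamFilter, body: str) -> str:
    """Hypothetical JSON-in/JSON-out handler; the "text" and "label" fields are assumptions."""
    payload = json.loads(body)
    return json.dumps({"label": model.predict(payload["text"])})
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long messages. In a REST service, `handle_request` would sit behind an HTTP POST route, and `fit` would be called once on a labeled corpus at startup.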

Cited by 7 publications (8 citation statements)
References: 18 publications
“…Moreover, the proposed three-layer Bi-LSTM network could achieve prediction performance similar to other machine learning and deep learning methods that used deeper network architectures, as reported in Aryal et al [32], who used LSTM, convolutional neural networks (CNN), and temporal convolution networks (TCN); Qi et al [33], who used RNN, LSTM, Bi-LSTM, and gated recurrent unit (GRU); and Dautel et al [34], who employed FNN, RNN, LSTM, and GRU. Moreover, we could also compare the prediction results from this study with other machine and deep learning methods commonly used in the literature, such as naïve Bayes [35], GRU [36], and Random Forest Regressor [37], a popular tree-based algorithm.…”
Section: Performance Results (mentioning; confidence 99%)
“…Each message was manually labeled as relevant or irrelevant. In the experiment, the classification of user behavior was evaluated with 10-fold cross-validation using six different types of classifiers [1], namely naïve Bayes [15, 22-24], decision trees [15, 24], random forest [15, 25], k-nearest neighbors [26], support vector machine (SVM) [27, 28], and artificial neural network (ANN) [29]. The classification results were analyzed to find features suitable for identifying spam during live streaming.…”
Section: Classification Methods and Results (mentioning; confidence 99%)
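The 10-fold cross-validation mentioned in this statement can be sketched as follows. This is a plain, unstratified split with a fixed random seed, which is an assumption; the citing study may have stratified its folds. `k_fold_indices` and `cross_val_accuracy` are hypothetical helpers. `fit_predict` stands for any classifier, such as naïve Bayes or SVM, wrapped as a function that trains on one split and predicts the other.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for plain (unstratified) k-fold cross-validation.

    Indices are shuffled once and split into k nearly equal folds; each fold is
    the test set exactly once while the remaining folds form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit_predict, X, y, k: int = 10, seed: int = 0) -> float:
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) returns predictions."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

Because every sample lands in exactly one test fold, the averaged score uses all data for evaluation while never testing a model on samples it was trained on.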
“…A confusion matrix is a table that contains information comparing the model's predictions from the classification trials with the actual classes. The calculated values are accuracy, precision, recall (sensitivity), and F1-score [20]. Then, from the precision, recall, and F1-score values obtained, the average of each metric across all classes is calculated as the 'macro' average, to distinguish it from the 'micro' average of precision, recall, and F1-score for each available class.…”
Section: Performance Metrics (mentioning; confidence 99%)
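Here is a sketch of the confusion-matrix metrics described above, for single-label classification. Per-class precision, recall, and F1-score are computed from the true positives (TP), false positives (FP), and false negatives (FN) read off the matrix. The macro average is the unweighted mean of the per-class scores. The micro average instead pools TP, FP, and FN across all classes before computing each score, so for single-label data micro precision, recall, and F1 all equal accuracy. The function names and the "spam"/"ham" labels are illustrative. Macro F1 is taken here as the mean of per-class F1 values, which is one of two conventions in use.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def _prf(tp, fp, fn):
    """Precision, recall (sensitivity), and F1-score; 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def classification_report(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels)
    per_class = {}
    tp_sum = fp_sum = fn_sum = 0
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(len(labels))) - tp  # predicted `label`, actually another class
        fn = sum(cm[i]) - tp                                  # actually `label`, predicted another class
        per_class[label] = _prf(tp, fp, fn)
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    # Macro: unweighted mean of the per-class (precision, recall, F1) values.
    macro = tuple(sum(scores[j] for scores in per_class.values()) / len(labels) for j in range(3))
    # Micro: pool TP/FP/FN over all classes, then compute the metrics once.
    micro = _prf(tp_sum, fp_sum, fn_sum)
    accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro, "micro": micro}
```

Macro averaging gives each class equal weight, so a poorly handled minority class pulls the macro score down more visibly than the micro score.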