Indonesian language email spam detection using N-gram and Naïve Bayes algorithm

Vernanda, Yustinus; Hansun, Seng; Kristanda, Marcel Bonar

doi:10.11591/eei.v9i5.2444

Cited by 7 publications (8 citation statements)

References 18 publications


“…Moreover, the proposed three layers Bi-LSTM networks could achieve similar prediction performance results compared to other machine learning and deep learning methods that used deeper network's architecture, as reported in Aryal et al [32] who used LSTM, convolutional neural networks (CNN), and temporal convolution networks (TCN); Qi et al [33] who used RNN, LSTM, Bi-LSTM, and gated recurrent unit (GRU); and Dautel et al [34] who employed FNN, RNN, LSTM, and GRU. Moreover, we could also try to compare the prediction results from this study with other Machine and Deep Learning methods commonly used in the literature, such as naïve Bayes [35], GRU [36], and Random Forest Regressor [37], a popular tree-based algorithm.…”

Section: Performance Results (mentioning)

confidence: 99%

On searching the best mode for forex forecasting: bidirectional long short-term memory default mode is not enough

Hansun; Putri; Hugeng (2022). IJ-AI

Self Cite


Presently, the Forex market has become the world’s largest financial market, with more than US$5 trillion in daily volume. It therefore attracts many researchers who study the characteristics of its traded currency pairs and try to predict their future values. Here, we propose simple three-layer bidirectional long short-term memory (Bi-LSTM) networks for Forex forecasting with four different merge modes. The proposed model is also compared with conventional long short-term memory (LSTM) networks of the same architecture. Five major Forex currency pairs, namely AUD/USD, EUR/USD, GBP/USD, USD/CHF, and USD/JPY, each with more than ten years of historical records, are considered in this study. The experimental results reveal that, among the four available merge modes, concatenation, the default merge mode in Bi-LSTM networks, is actually the least preferred mode for Forex forecasting (root mean square error 0.30685517, mean absolute error 0.27442235, mean absolute percentage error 0.827108%). Moreover, the Bi-LSTM average mode achieves the highest R² score, reaching 89.579%. Therefore, the proposed three-layer Bi-LSTM networks could provide a baseline for developing a good trading strategy in Forex forecasting.
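The abstract turns on two ideas: how a Bi-LSTM layer combines its forward and backward outputs (the merge mode), and how forecasts are scored with RMSE, MAE, MAPE, and R². The plain-Python sketch below illustrates both. It assumes the four merge modes are the ones offered by the Keras `Bidirectional` wrapper's `merge_mode` argument (`concat`, `sum`, `mul`, `ave`); the function names `merge_outputs` and `regression_metrics` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Assumption: these mirror the Keras Bidirectional(merge_mode=...) options.
MERGE_MODES = ("concat", "sum", "mul", "ave")


def merge_outputs(forward: Sequence[float], backward: Sequence[float], mode: str) -> List[float]:
    """Combine the forward and backward hidden vectors of a bidirectional layer."""
    if len(forward) != len(backward):
        raise ValueError("forward and backward outputs must have the same length")
    if mode == "concat":  # the default mode; output width doubles
        return list(forward) + list(backward)
    if mode == "sum":
        return [f + b for f, b in zip(forward, backward)]
    if mode == "mul":
        return [f * b for f, b in zip(forward, backward)]
    if mode == "ave":
        return [(f + b) / 2 for f, b in zip(forward, backward)]
    raise ValueError(f"unknown merge mode {mode!r}; expected one of {MERGE_MODES}")


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MAPE (in percent) and R^2 for a set of forecasts."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE assumes no true value is zero (exchange rates are strictly positive).
        "mape_percent": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1 - ss_res / ss_tot,
    }
```

For forward output `[1, 2]` and backward output `[3, 4]`, `concat` yields a vector of width four while `sum`, `mul`, and `ave` keep width two, which is one practical difference between the default mode and the alternatives.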



“…Each message was manually labeled as being relevant or irrelevant. In the experiment, evaluation of the classification of user behavior was analyzed with 10-fold cross-validation using 6 different types of classifiers [1] that is naïve bayes [15,[22][23][24], decision trees [15,24], random forest [15,25], k-nearest neighbors [26], support vector machine (SVM) [27][28], and artificial neural network (ANN) [29]. The result of this classification was analyzed to find suitable features for identifying spam during live streaming.…”
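The 10-fold cross-validation mentioned above splits the data into ten folds, trains on nine, tests on the held-out fold, and repeats until every fold has served once as the test set. A minimal sketch in plain Python follows. The helper names are hypothetical, and the majority-class model is only a placeholder standing in for the six classifiers the study compares. Folds here are contiguous and unshuffled; when the sample count is not divisible by k, the first folds receive one extra sample.

```python
from collections import Counter
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds get one extra sample so every index is tested exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_accuracy(
    X: Sequence[Any],
    y: Sequence[Any],
    fit: Callable[[List[Any], List[Any]], Any],
    predict: Callable[[Any, Any], Any],
    k: int = 10,
) -> List[float]:
    """Train on k-1 folds and score accuracy on the held-out fold, once per fold."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return scores


# Hypothetical placeholder model: always predicts the most common training label.
def fit_majority(X_train: List[Any], y_train: List[Any]) -> Any:
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model: Any, x: Any) -> Any:
    return model
```

In practice the data would usually be shuffled first, possibly stratified by label, so that each fold reflects the overall class balance.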

Section: Classification Methods and Results (mentioning)

confidence: 99%

Analysis of spammers’ behavior on a live streaming chat

Yousukkee; Wisitpongphan (2021). IJ-AI


Live streaming is becoming a popular channel for advertising and marketing. An advertising company can use this feature to broadcast to and reach a large number of customers. YouTube is one of the streaming platforms with an extremely high growth rate and a large number of viewers; thus, it has become a primary target of spammers and attackers. Understanding the behavior of users in live chat may reduce the time moderators spend identifying spammers and preventing them from disturbing other users. In this paper, we analyzed YouTube live streaming comments in order to understand spammers’ behavior. Seven features, covering user behavior and message characteristics, were comprehensively analyzed. According to our findings, the features that performed best in terms of run time and classification efficiency are the relevance score together with the time spent in live chat and the number of messages per user. The accuracy is as high as 66.22 percent. In addition, the most suitable technique for real-time classification is a decision tree.


“…A confusion matrix is a table that containing information about the comparison of the model results from the classification trials carried out to the actual classification results. The calculated values are accuracy, precision, recall or specificity, and F1-score [20]. Then, from the values of precision, recall, and F1-score obtained, the average value of each precision, recall, and F1-score for all classes will be calculated as the 'macro' average value to differentiate them from the 'micro' average value of precision, recall, and F1-score for each available class.…”
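The macro average described above gives every class equal weight. In standard usage (for example, scikit-learn's `average='macro'` and `average='micro'` options), a micro average instead pools true positives, false positives, and false negatives across all classes before computing each metric. The sketch below computes per-class, macro, and micro scores from a confusion matrix whose rows are actual classes and whose columns are predicted classes; the function names are hypothetical, and reporting a zero-denominator metric as 0 is a convention assumed here.

```python
from typing import Dict, List, Sequence

METRICS = ("precision", "recall", "f1")


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: a metric with a zero denominator is reported as 0.
    return num / den if den else 0.0


def _prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def per_class_scores(matrix: List[List[int]]) -> List[Dict[str, float]]:
    k = len(matrix)
    scores = []
    for c in range(k):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(k)) - tp  # predicted c, actually another class
        fn = sum(matrix[c]) - tp  # actually c, predicted as another class
        scores.append(_prf(tp, fp, fn))
    return scores


def macro_average(matrix: List[List[int]]) -> Dict[str, float]:
    """Unweighted mean of the per-class scores: every class counts equally."""
    scores = per_class_scores(matrix)
    return {m: sum(s[m] for s in scores) / len(scores) for m in METRICS}


def micro_average(matrix: List[List[int]]) -> Dict[str, float]:
    """Pool TP, FP and FN over all classes, then compute the metrics once."""
    k = len(matrix)
    tp = sum(matrix[c][c] for c in range(k))
    fp = sum(sum(matrix[r][c] for r in range(k)) - matrix[c][c] for c in range(k))
    fn = sum(sum(matrix[c]) - matrix[c][c] for c in range(k))
    return _prf(tp, fp, fn)


def accuracy(matrix: List[List[int]]) -> float:
    total = sum(sum(row) for row in matrix)
    return _safe_div(sum(matrix[c][c] for c in range(len(matrix))), total)
```

For instance, with `y_true = ["a", "a", "b", "b", "c"]` and `y_pred = ["a", "b", "b", "b", "a"]`, the macro F1 is about 0.433 while the micro F1 is 0.6, because the never-predicted class `c` pulls the macro average down. In single-label multiclass problems, micro-averaged precision, recall, and F1 all equal accuracy, since every misclassification counts once as a false positive and once as a false negative.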

Section: Performance Metrics (mentioning)

confidence: 99%

Candlestick Pattern Classification Using Feedforward Neural Network

Karmelia; Widjaja; Hansun (2022). ijasca

Self Cite


Investment in the capital market can help boost a country’s economic growth. When investing, a technical analysis of a stock's current condition is undoubtedly needed. One form of technical analysis is to examine the stock's historical data. Candlestick charts summarize historical price data, namely the Open, High, Low, and Close (OHLC) values, in chart form. A group of candlesticks forms a pattern that can help investors see whether the stock is trending up or down. Because there are many candlestick patterns, identifying them manually can take considerable time and effort. A feedforward neural network (FNN) is one algorithm that can learn a mapping between the inputs and outputs of a given dataset. This study aims to implement an FNN to classify candlestick patterns found in historical stock data. The test results show that high accuracy in each model scenario does not guarantee that all patterns are properly recognized. This is mainly caused by an imbalanced dataset, which prevents the classification process from working properly. Testing with the original data yields an accuracy above 85% on each stock, but the average F1-score is below 45%. Further experiments using random under-sampling and the Synthetic Minority Oversampling Technique (SMOTE) decrease the accuracy, to as low as 59% for PT Bukit Asam Tbk shares, and increase the average F1-score, although by less than 15%. Keywords: Candlestick patterns, feedforward neural network, investment, historical data, OHLC, SMOTE, stocks.
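The abstract rebalances the candlestick data with random under-sampling and SMOTE. As a rough illustration, the plain-Python sketch below implements both: under-sampling randomly drops rows until every class matches the size of the smallest class, and a simplified SMOTE-style routine creates each synthetic minority point by interpolating between a randomly chosen minority sample and one of its k nearest minority neighbours. The function names, the seed handling, and picking the base sample uniformly at random are assumptions of this sketch, which is not the study's code.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def random_undersample(X: Sequence[Vector], y: Sequence[str], seed: int = 0) -> Tuple[List[Vector], List[str]]:
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Vector]] = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    target = min(len(rows) for rows in by_class.values())
    X_out: List[Vector] = []
    y_out: List[str] = []
    for label, rows in by_class.items():
        for x in rng.sample(rows, target):  # sampling without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Simplified SMOTE: interpolate between a minority sample and a nearby minority neighbour."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours by Euclidean distance.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic
```

Resampling of this kind is normally applied only to the training split, so that the test set keeps the real class distribution.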

