Malicious Domain Detection Based on K-means and SMOTE

Wang, Qing; Li, Linyu; Jiang, Bo; Lü, Zhigang; Liu, Junrong; Jian, Shijie

doi:10.1007/978-3-030-50417-5_35

Cited by 17 publications

(7 citation statements)

References 14 publications


“…The trend of network patterns accessed by internet users, together with network traffic profiles generated under high data traffic volume, yields three clusters based on traffic usage, namely high, medium, and low, according to the different service protocols and IP addresses [10]. For detecting malicious domains, clustering a large volume of DNS traffic data to find malicious domains is also well suited to the K-Mean method [11]. Research that uses DNS logs to cluster internet traffic usage is much needed to keep traffic flowing smoothly [12], [13]; this study therefore aims to cluster internet traffic usage using K-Mean Clustering.…”

Section: Introduction (unclassified)

Klasterisasi Penggunaan Trafik Internet Menggunakan K-Mean Clustering

Yasriady

2022

jsisfotek


Internet traffic usage in a government office needs to be monitored carefully so that it is used efficiently and appropriately. The internet connection provided is an official facility funded from a budget sourced from the public, so it needs careful oversight. The Domain Name System (DNS) provides rich and interesting data that can be extracted to reveal information that can be analyzed for various purposes, such as security measures, measuring traffic usage levels, bandwidth limiting, user profiling, and other policies applied in a network. This study aims to cluster internet traffic usage so as to provide benefits that can be used to improve network service (QoS), make bandwidth usage more efficient, and build user profiles. The study is based on DNS logs operated on a network connected to the internet. It shows how to consolidate traffic on port 53/udp in order to collect DNS logs, so that internet user activity can be recorded on a centralized server and ultimately used as the primary data source. The dataset consists of information extracted from the log files of a DNS server (dnsmasq), collected over 5 working days during working hours. The extracted dataset contains a total of 213 records. The available data were then processed to obtain target clusters by applying data mining concepts with the K-Mean Clustering method. Data analysis and processing were carried out manually in Microsoft Excel using the K-Mean method, and the study succeeded in grouping internet traffic usage into 3 clusters: high, medium, and low. The cluster sizes are Kla1 = 23, Kla2 = 3, and Kla3 = 160.
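As a rough illustration of the clustering step described in the abstract above, the following sketch runs K-means on one numeric feature per client and yields three groups. The input values, the function name `kmeans_1d`, and the evenly spread initialisation are assumptions for illustration; the abstract states only that K-Mean was applied manually in Microsoft Excel, and it does not say which features were used.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels). Requires k >= 2."""
    ordered = sorted(values)
    # Assumed deterministic initialisation: spread the k starting
    # centroids evenly over the sorted data (minimum, middle, maximum for k=3).
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical per-client DNS query counts (not taken from the study).
query_counts = [5, 7, 6, 120, 130, 900, 950, 8]
centroids, labels = kmeans_1d(query_counts, k=3)
# Label 0 is the low-usage cluster, 1 medium, 2 high, because the
# initial centroids are taken in ascending order.
```

With real DNS logs, the input would first have to be aggregated into one number per client (for example, queries per day), which is a preprocessing choice the abstract leaves open.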


“…This model extracts static lexical features and dynamic DNS resolving features to profile every DN from the DNS traffic data. In [28], the authors also address the imbalance problem and present a KMSMOTE method that uses SMOTE and K-Means clustering algorithm. The system uses assumptions such as malicious DNs leave their traces on DNS traffic, malicious DNs have lower DN registration cost, and reuse network resources.…”

Section: Related Work (mentioning)

confidence: 99%
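The statement above names SMOTE as the oversampling component of KMSMOTE. A minimal sketch of plain SMOTE interpolation follows; how the cited paper combines it with K-means is not described in the quote, so that step is left out. The function name `smote`, its parameters, and the sample points are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature vectors for the minority (malicious) class.
malicious = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(malicious, n_new=5)
```

Each synthetic point lies on the line segment between two existing minority samples, so the oversampled class stays inside the region the real samples already occupy.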

“…Focus on groundtruth labeling
[24] - - Heuristic [30], [47] 54K Based on intuition
[25] - - Heuristic [30], [41] 10M Slows down n/w
[26] - - Heuristic [30], [31], [44] Based on intuition
[27] - DI EEA, HAC EEA [30], [39], [46], [48] 10K Focus only on data imbalance
[28] - DI CatBoost, SVM, GBDT, XGBoost [30], [43], [49], [34], [41], [50] 16K Oversampling
[29] APT EI ELM, LR, SVM, CART, BPNN [30], [41], [34], [42] 40K Only for targeted attacks
Ours Eth - - K-Means, 11 ML Algos using TPOT [33], [49], [51], [44], [52], [42], [53] 335M -
• B/C: Blockchain, Eth: Ethereum Blockchain data
• Features: N: DN String based, Q: DNS Query based, G: DNS Graph based, T: Temporal aspect based, O: Other, particular feature not used
• Detection Of: AG: Algorithmic Generated Names, BT: Botnet, FF: Fast-Flux, APT: Advance Persistent Threats, -: no specific mention but targets DNs in general
• Tackles: R: Reputation, L: Ground Truth Labeling, DI: Data Imbalance, EI: Efficiency Improvement, -: no specific mention
• ML Algo: GB: Gradient Boosting, SVM: Support Vector Machine, RF: Random Forest, KNN: K-Nearest Neighbors, NB: Naive Bayes, BC: Bayesian Classifier, LBS: Logit-Boost Strategy, RC: Random Committee, EEA: EasyEnsemble Algorithm, ELM: Extreme Learning Machine, GBDT: Gradient Boosting Decision Tree, XGBoost: eXtreme Gradient Boosting, LR: Logistic Regression, BPNN: Back Propagation Neural Networks
• Dataset Size: no mention
• Issues: DGA: Domain Generation Algorithm, RDNS: Recursive DN System, n/w: network, IP: IP address
ground truth information about the DNs is extracted from [33], [42],…”

Section: A Data Collection and Pre-processing (mentioning)

confidence: 99%

Identifying malicious accounts in Blockchains using Domain Names and associated temporal properties

Sachan, Agarwal, Shukla

2021

Preprint


The rise in the adoption of blockchain technology has led to increased illegal activities by cyber-criminals costing billions of dollars. Many machine learning algorithms are applied to detect such illegal behavior. These algorithms are often trained on the transaction behavior and, in some cases, trained on the vulnerabilities that exist in the system. In our approach, we study the feasibility of using metadata such as Domain Name (DN) associated with the account in the blockchain and identify whether an account should be tagged malicious or not. Here, we leverage the temporal aspects attached to the DNs. Our results identify 144930 DNs that show malicious behavior, and out of these, 54114 DNs show persistent malicious behavior over time. Nonetheless, none of these identified malicious DNs were reported in new officially tagged malicious blockchain DNs.


“…In addition, some researchers identify the characteristics of malicious domain names by analyzing DNS traffic data. Researchers use many detection methods; for example, K-means algorithm and SMOTE method are combined [6], convolutional neural network structure and cyclic neural network are combined to detect malicious domain names involved in botnets [7], and RBF kernel is added to support vector machine algorithm to improve the detection effect of malicious domain names [8].…”

Section: Introduction (mentioning)

confidence: 99%

A Malicious Domain Detection Model Based on Improved Deep Learning

Huang et al.

2022

Computational Intelligence and Neuroscience


With the rapid development of the Internet, malicious domain names pose more and more serious threats to many fields, such as network security and social security, and there have been many research results on malicious domain detection. This article proposes a malicious domain name detection model based on improved deep learning, which can combine the advantages of three different network models, convolutional neural network (CNN), temporal convolutional network (TCN), and long short-term memory network (LSTM) in malicious domain name detection, to obtain a better detection effect than that of the original single or two models. Experiments show that the effect of the improved deep learning model proposed in this article is better than that of the combined model of CNN and LSTM or the combined model of CNN and TCN, and the accuracy and regression rates reached 99.76% and 98.81%, respectively.

