Feature modeling and cluster analysis of malicious Web traffic

Dimitrijevikj, Ana

doi:10.33915/etd.4708

Cited by 2 publications

(6 citation statements)

References 37 publications

“…For our previous work [7], [48], [50] and for the work presented here and in [2] a high-interaction honeypots were developed and deployed. These high-interaction honeypots run real services and real Web applications following the example of GenII honeypots used by the Honeynet Project [90].…”

Section: Honeypot (mentioning)

confidence: 99%

“…Since 2008 when our first honeypot system was deployed, we collected huge amounts of data. The data that we analyzed in our previous work [7], [48], [50], and for the work presented here and in [2] was for time periods where honeypots had minimal or no downtime. In total we managed to create four datasets from the observed malicious HTTP traffic from the advertised honeypots.…”

Section: Datasets (mentioning)

confidence: 99%

“…For this and for our previous work [48] and [50], as well as for the work presented in [2] and [7] we collected Web server logs from the two most commonly used Web servers Microsoft IIS and Apache [56] running on the two most commonly used server operating systems i.e. Windows and Linux.…”

Section: List Of Equations Introduction (mentioning)

confidence: 99%

“…This work together with [2] and [7] is a part of larger effort aimed at Improving Web Quality through an Integrated Approach [12]. Over a period of several years our research group deployed several honeypots with different configurations to collect malicious traffic.…”

Section: List Of Equations Introduction (mentioning)

confidence: 99%

Analysis and Classification of Current Trends in Malicious HTTP Traffic

Pantev

“…The popularity of these applications and their frequent exploitation motivated us to analyze attackers activities on Web systems running Web 2.0 applications. For this purpose, over a period of several years, our research group [18], [55], [48] developed and deployed three high-interaction honeypots, each consisting of a three-tier Web architecture (i.e. Web server, application server, and a database server).…”

Section: Chapter 1 Introduction (mentioning)

confidence: 99%

Classification of Malicious Web Traffic

Anastasovski

Attacks targeting Web system vulnerabilities have shown an increasing trend in the recent past. A contributing factor in this trend is the deployment of Web 2.0 technologies. Due to the ability of users to create their own content, Web 2.0 applications have become increasingly popular and in turn this has made them attractive targets for malicious attacks. Given these trends there is a need to better understand and classify malicious cyber activities. The work presented in this thesis is based on malicious data collected by three high-interaction honeypots, and organized in HTTP sessions, each characterized by 43 different features. The data were divided into multiple vulnerability scan and attack classes. Five batch supervised machine learning algorithms (J48, PART, Support Vector Machine SVM, Multi Layer Perceptron MLP and Naive Bayes Learner NB) and one stream semi-supervised algorithm (CSL-Stream) were used to study whether machine learning algorithms could be used to distinguish between vulnerability scans and attacks and also among eleven vulnerability scan and nine attack classes. The Information Gain feature selection method, and three other feature selection methods, were used to determine whether different attacks and vulnerability scans can be characterized by a small number of features (i.e., session characteristics). The results showed that supervised algorithms can be trained to distinguish among different classes of malicious traffic using only a small number of features. The stream semi-supervised algorithm was able to classify the partially labeled data almost as well as the completely labeled data. The classification of the data was dependent on the number of instances in each class, the distinctive features of each class, and the amount of concept drift. The supervised algorithms, however, were better at classifying the completely labeled data.

First, I would like to thank my committee chair and adviser, Dr. Katerina Goseva-Popstojanova, for her guidance, support and encouragement throughout my graduate studies. Also, I would like to thank Dr. Roy Nutter and Dr. Arun Ross for being my graduate committee members. I am grateful for the support and advice from all my graduate committee members and I am thankful for their collaboration. This work was funded in part by the National Science Foundation under the grants CNS-0447715 and CCF-0916284. I also want to thank and acknowledge Risto Pantev, Ana Dimitrijevik, Brandon S. Miller, Jonathan Lynch, David Krovich, and J. Alex Baker for their collaboration in the research project. In addition, I would like to thank Dr. Hai-Long Nguyen for sharing his CSL-Stream algorithm with me and his help. Finally, I want to express my deepest gratitude to my mother for the support and motivation she has given me throughout the years. I also want to thank my late father, may he rest in peace, for believing in me and always encouraging me to follow my dreams.
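As a rough illustration of the Information Gain feature selection named in the abstract above, the sketch below ranks session features by how much they reduce label entropy. It is not the thesis implementation: the feature names (`method`, `many_requests`), the four toy sessions, and the helper names are all hypothetical, and the features are assumed to be discrete (numeric session features would first need to be discretized).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy sessions; real sessions in the thesis have 43 features.
sessions = [
    {"method": "GET", "many_requests": True},
    {"method": "GET", "many_requests": True},
    {"method": "POST", "many_requests": False},
    {"method": "GET", "many_requests": False},
]
labels = ["scan", "scan", "attack", "attack"]

ranking = rank_features(sessions, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data `many_requests` separates the two classes perfectly, so it ranks first; keeping only the top-ranked features is the sense in which a class can be "characterized by a small number of features".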
