Classification of Malicious Web Traffic

Attacks targeting Web system vulnerabilities have shown an increasing trend in the recent past. A contributing factor in this trend is the deployment of Web 2.0 technologies. Because they allow users to create their own content, Web 2.0 applications have become increasingly popular, which in turn has made them attractive targets for malicious attacks. Given these trends, there is a need to better understand and classify malicious cyber activities. The work presented in this thesis is based on malicious data collected by three high-interaction honeypots and organized into HTTP sessions, each characterized by 43 different features. The data were divided into multiple vulnerability scan and attack classes. Five batch supervised machine learning algorithms (J48, PART, Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Naive Bayes Learner (NB)) and one stream semi-supervised algorithm (CSL-Stream) were used to study whether machine learning algorithms can distinguish between vulnerability scans and attacks, and also among eleven vulnerability scan classes and nine attack classes. The Information Gain feature selection method and three other feature selection methods were used to determine whether different attacks and vulnerability scans can be characterized by a small number of features (i.e., session characteristics). The results showed that supervised algorithms can be trained to distinguish among different classes of malicious traffic using only a small number of features. The stream semi-supervised algorithm classified the partially labeled data almost as well as the completely labeled data. Classification performance depended on the number of instances in each class, the distinctive features of each class, and the amount of concept drift. The supervised algorithms, however, were better at classifying the completely labeled data.

First, I would like to thank my committee chair and adviser, Dr. Katerina Goseva-Popstojanova, for her guidance, support, and encouragement throughout my graduate studies. I would also like to thank Dr. Roy Nutter and Dr. Arun Ross for serving as members of my graduate committee. I am grateful for the support and advice of all my committee members and thankful for their collaboration. This work was funded in part by the National Science Foundation under grants CNS-0447715 and CCF-0916284. I also want to thank and acknowledge Risto Pantev, Ana Dimitrijevik, Brandon S. Miller, Jonathan Lynch, David Krovich, and J. Alex Baker for their collaboration on the research project. In addition, I would like to thank Dr. Hai-Long Nguyen for sharing his CSL-Stream algorithm with me and for his help. Finally, I want to express my deepest gratitude to my mother for the support and motivation she has given me throughout the years. I also want to thank my late father, may he rest in peace, for believing in me and always encouraging me to follow my dreams.