A Machine Learning Approach to Dataset Imputation for Software Vulnerabilities

Rostami, Shahin; Kleszcz, Agnieszka; Dimanov, Daniel; Katos, Vasilios

doi:10.1007/978-3-030-59000-0_3

Cited by 9 publications (3 citation statements)

References 12 publications (8 reference statements)


“…There are very few works to date on the MITRE ATT&CK framework with respect to identifying attacks using machine learning. Reference [20] used feed-forward neural networks with two hidden layers to impute missing values in a 2018/2019 ENISA dataset [21] based on ATT&CK descriptions of the intrusions. They found that this is a valid approach for filling out intrusion datasets with missing values.…”

Section: Related Work (mentioning, confidence: 99%)

Detecting Reconnaissance and Discovery Tactics from the MITRE ATT&CK Framework in Zeek Conn Logs Using Spark’s Machine Learning in the Big Data Framework

Bagui, Mink, Ghosh et al. 2022, Sensors


As computer networks and the massive amount of communication taking place on them grow, the damage that network intrusions can cause grows in tandem. There is a need for an effective and scalable intrusion detection system (IDS) to address the potential damage that comes with the growth of these networks. Much contemporary research on near real-time IDS focuses on applying machine learning classifiers to labeled network intrusion datasets, but these datasets need to be current with respect to the intrusions they represent. This paper focuses on a newly created dataset, UWF-ZeekData22, which contains data from Zeek's Connection Logs collected with the Security Onion 2 network security monitor and labeled using MITRE ATT&CK framework TTPs. Because of the volume of data, Spark, in the big data framework, was used to run many well-known classifiers (naïve Bayes, random forest, decision tree, support vector classifier, gradient boosted trees, and logistic regression) to classify the reconnaissance and discovery tactics in this dataset. In addition to the performance of these classifiers on Spark, scalability and response time were also analyzed.


“…In healthcare, missing values can pose challenging issues, especially in supporting healthcare decisions [16]. The controversy over imputation has been discussed since 1998; however, imputation with machine learning rose only later, especially in the healthcare domain [17]-[20]. As discussed, missing data can be imputed through statistical and machine learning methods, and both have strengths and limitations.…”

Section: Related Work (mentioning, confidence: 99%)

Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare

Ismail, Abidin, Maen 2022, Journal of Robotics and Control

Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. A dataset collected in the real world is prone to be incomplete, inconsistent, noisy and redundant for reasons such as human error, instrument failure, and adverse death. Therefore, to deal accurately with incomplete data, sophisticated algorithms are needed to impute the missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. Among all machine learning imputation algorithms, however, the KNN algorithm has been widely adopted for imputing missing data because of its robustness and simplicity, and it is also a promising candidate for outperforming other machine learning methods. This paper provides a comprehensive review of different imputation techniques used to replace missing data. The goal of the review is to draw specific attention to potential improvements to existing methods and to give readers a better grasp of trends in imputation techniques.
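The review singles out KNN as a widely adopted imputation method. As a rough illustration of the general idea only (not the procedure of any paper reviewed), the pure-Python sketch below assumes numeric features, `None` marking a missing value, a Euclidean distance computed over the features two rows both observe, and the mean of the k nearest donor rows as the fill value. The names `knn_impute` and `_distance` are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, scaled up by
    (total features / shared features) so that sparse overlaps are not
    artificially close. Returns infinity if no feature is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Return a copy of `rows` in which each missing entry is replaced by the
    mean of that feature over the k nearest rows that observe it."""
    if k < 1:
        raise ValueError("k must be at least 1")
    result: List[Row] = []
    for i, row in enumerate(rows):
        filled = list(row)  # copy so the input is not mutated
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row where feature j is present.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            if not donors:
                raise ValueError(f"feature {j} is missing in every row")
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            filled[j] = sum(nearest) / len(nearest)
        result.append(filled)
    return result
```

For example, with `rows = [[1.0, 2.0, None], [1.0, 2.0, 3.0], [1.1, 2.1, 5.0], [10.0, 20.0, 100.0]]`, `knn_impute(rows, k=2)` fills the gap in the first row with the mean of its two closest neighbours, 3.0 and 5.0, giving `[1.0, 2.0, 4.0]`; the distant fourth row is ignored. Rescaling the partial distance by the ratio of total to shared features mirrors the convention of scikit-learn's `nan_euclidean_distances`, which its `KNNImputer` uses.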


“…It should be noted that, as with all exercises of joining data from different sources, the end dataset may have sparse entries. Although there are approaches to increase the sample population of a given feature with a low number of data points (see, for example, ML-based dataset imputation applied to the ENISA dataset [37]), this was not required for this study. A summary of the features of the resulting dataset used is presented in Table 1.…”

Section: Datasets - Limitations of Research (mentioning, confidence: 99%)

Vulnerability Exposure Driven Intelligence in Smart, Circular Cities

et al. 2022


In this paper we study the vulnerability management dimension of smart city initiatives. As many cities across the globe invest considerable effort, resources and budget to modernise their infrastructure by deploying technologies such as 5G, Software Defined Networks and IoT, we conduct an empirical analysis of their current exposure to existing vulnerabilities. We use an updated vulnerability dataset, further enriched with quantitative research data from independent studies evaluating the maturity and accomplishments of cities in their journey to become smart. We focus particularly on cities that aspire to implement a (data-driven) Circular Economy agenda, which we consider potentially to carry the highest risk from a vulnerability exposure perspective. Findings show that although a smarter city is associated with higher vulnerability exposure, investments in technology and human capital moderate this exposure and can reduce it.
