2020
DOI: 10.1007/978-3-030-59000-0_3

A Machine Learning Approach to Dataset Imputation for Software Vulnerabilities

Abstract: This paper proposes a supervised machine learning approach for imputing categorical values that are missing from the majority of samples in a dataset. Twelve models have been designed that are able to predict nine of the twelve ATT&CK tactic categories using only one feature, namely the Common Attack Pattern Enumeration and Classification (CAPEC). The proposed method has been evaluated on an 867-sample unseen test set, with classification accuracy in the range of 99.88%-100%. Using these models, a more complete …
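The abstract describes predicting each ATT&CK tactic from a single categorical feature (the CAPEC ID) and then using those predictions to fill in missing tactic labels. The sketch below illustrates only that imputation workflow, under stated assumptions: it substitutes a per-CAPEC majority-vote lookup for the paper's trained models, and the tactic name, the CAPEC IDs (`CAPEC-A`, `CAPEC-B`) and the record layout are hypothetical. With a single categorical input, any deterministic classifier can only assign one output per CAPEC value, which is why a lookup table is a reasonable stand-in for illustration.

```python
from collections import Counter, defaultdict


def fit_tactic_lookup(rows, tactic):
    """Learn, for one tactic, the majority label per CAPEC ID from rows where it is known.

    Stand-in model (assumption): a lookup table, not the paper's trained classifier.
    """
    per_capec = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        label = row["tactics"].get(tactic)  # True / False, or None if missing
        if label is None:
            continue
        per_capec[row["capec"]][label] += 1
        overall[label] += 1
    table = {capec: counts.most_common(1)[0][0] for capec, counts in per_capec.items()}
    # Unseen CAPEC IDs fall back to the tactic's overall majority label.
    default = overall.most_common(1)[0][0] if overall else False
    return table, default


def impute_tactics(rows, tactics):
    """Return copies of `rows` with every missing tactic label filled in."""
    models = {t: fit_tactic_lookup(rows, t) for t in tactics}  # one model per tactic
    filled_rows = []
    for row in rows:
        labels = dict(row["tactics"])  # copy so the input rows are not mutated
        for t in tactics:
            if labels.get(t) is None:
                table, default = models[t]
                labels[t] = table.get(row["capec"], default)
        filled_rows.append({**row, "tactics": labels})
    return filled_rows


if __name__ == "__main__":
    # Toy records with hypothetical CAPEC IDs; None (or an absent key) marks a missing label.
    rows = [
        {"capec": "CAPEC-A", "tactics": {"execution": True}},
        {"capec": "CAPEC-A", "tactics": {"execution": True}},
        {"capec": "CAPEC-B", "tactics": {"execution": False}},
        {"capec": "CAPEC-A", "tactics": {"execution": None}},
        {"capec": "CAPEC-B", "tactics": {}},
    ]
    for row in impute_tactics(rows, ["execution"]):
        print(row)
```

Training one model per tactic, as the abstract describes, keeps each prediction independent, so a tactic that is missing for a sample can be filled without needing the other tactics to be known.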


Cited by 9 publications (3 citation statements). References: 12 publications (8 reference statements).
“…There are very few works to date on the MITRE ATT&CK framework with respect to identifying attacks using machine learning. Reference [20] used two hidden layer feed-forward neural networks to impute missing values in a 2018/2019 ENISA dataset [21] based on ATT&CK descriptions of the intrusions. They found that this is a valid approach for filling out intrusion datasets with missing values.…”
Section: Related Work (mentioning)
confidence: 99%
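The statement above notes that the imputation used feed-forward networks with two hidden layers. A minimal pure-Python sketch of such a network follows; it is not the authors' configuration. The layer sizes, tanh hidden activations, learning rate, one-hot CAPEC encoding and the masking of missing labels in the binary cross-entropy are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_layer(n_in, n_out, rng):
    # One weight row per output unit, plus one bias per unit.
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def dense(w, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class TwoHiddenLayerNet:
    """Feed-forward net: input -> tanh -> tanh -> sigmoid (one output per tactic)."""

    def __init__(self, n_in, h1, h2, n_out, seed=0):
        rng = random.Random(seed)
        self.layers = [init_layer(n_in, h1, rng),
                       init_layer(h1, h2, rng),
                       init_layer(h2, n_out, rng)]

    def forward(self, x):
        # Activations of every layer, starting with the input itself.
        acts = [x]
        last = len(self.layers) - 1
        for i, (w, b) in enumerate(self.layers):
            f = sigmoid if i == last else math.tanh
            acts.append([f(z) for z in dense(w, b, acts[-1])])
        return acts

    def predict(self, x):
        return self.forward(x)[-1]

    def loss(self, x, y):
        # Binary cross-entropy averaged over known labels only (None = missing).
        p = self.predict(x)
        eps = 1e-12
        terms = [-(t * math.log(pi + eps) + (1 - t) * math.log(1 - pi + eps))
                 for pi, t in zip(p, y) if t is not None]
        return sum(terms) / max(len(terms), 1)

    def train_step(self, x, y, lr=0.1):
        acts = self.forward(x)
        # For sigmoid + BCE the output error is (p - t); missing labels get zero error.
        delta = [(p - t) if t is not None else 0.0 for p, t in zip(acts[-1], y)]
        for li in range(len(self.layers) - 1, -1, -1):
            w, b = self.layers[li]
            a_in = acts[li]
            if li > 0:
                # Back-propagate through the not-yet-updated weights and tanh derivative.
                prev = [sum(w[j][k] * delta[j] for j in range(len(delta))) * (1 - a_in[k] ** 2)
                        for k in range(len(a_in))]
            for j, d in enumerate(delta):
                for k, a in enumerate(a_in):
                    w[j][k] -= lr * d * a
                b[j] -= lr * d
            if li > 0:
                delta = prev


if __name__ == "__main__":
    # Toy data: one-hot CAPEC IDs, two tactic labels, one label missing (None).
    X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    Y = [[1, 0], [0, 1], [1, None]]
    net = TwoHiddenLayerNet(n_in=3, h1=8, h2=8, n_out=2)
    for _ in range(1000):
        for x, y in zip(X, Y):
            net.train_step(x, y, lr=0.1)
    # Impute the missing second label of the third sample by thresholding at 0.5.
    print(net.predict(X[2])[1] >= 0.5)
```

Masking the loss lets partially labelled samples still contribute to training through the labels they do have, which matters when most samples are missing some values.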
“…In healthcare, the presence of missing values can be challenging issues especially in supporting healthcare decision [16]. The controversy of imputation has been discussed since 1998, however, the evolution of imputation with machine learning rise after a while especially in healthcare industry domain [17]- [20]. As discussed, imputation of missing data can be discovered through statistical and machine learning, and both carry strength and limitations to deal with it.…”
Section: Related Work (mentioning)
confidence: 99%
“…It should be noted that, as with all exercises of joining data from different sources, the end dataset may have sparse entries. Although there are approaches to increase the sample population of a given feature with a low number of data points (see for example ML based dataset imputation applied on the ENISA dataset [37]), this was not required for this study. A summary of the features of the resulting dataset used is presented in Table 1.…”
Section: Datasets - Limitations of Research (mentioning)
confidence: 99%
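The sparsity this statement describes typically appears after a left join: rows with no match on the other side receive empty values. A minimal sketch for measuring it per column follows, assuming records are lists of dicts joined on a hypothetical `capec` key; the column names and toy rows are illustrative only. In pandas the same check is `left.merge(right, on="capec", how="left").notna().mean()`.

```python
def left_join(left, right, key):
    """Left-join two lists of dicts on `key`; right-side columns are None when unmatched.

    Assumes `key` is unique in `right` (a later duplicate would overwrite an earlier one).
    """
    index = {r[key]: r for r in right}
    right_cols = sorted({c for r in right for c in r if c != key})
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        for c in right_cols:
            merged[c] = match.get(c)
        joined.append(merged)
    return joined


def fill_rates(rows):
    """Fraction of rows with a non-missing value, per column."""
    cols = sorted({c for r in rows for c in r})
    return {c: sum(r.get(c) is not None for r in rows) / len(rows) for c in cols}


if __name__ == "__main__":
    # Hypothetical vulnerability records and a partial CAPEC-to-tactic mapping.
    vulns = [{"id": 1, "capec": "A"}, {"id": 2, "capec": "B"},
             {"id": 3, "capec": "C"}, {"id": 4, "capec": "A"}]
    mapping = [{"capec": "A", "tactic": "execution"}]
    print(fill_rates(left_join(vulns, mapping, "capec")))
```

Columns with a low fill rate are the candidates for the kind of ML-based imputation the statement refers to.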