2022
DOI: 10.48550/arxiv.2206.03265
Preprint

Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection

Abstract: Data augmentation has been rare in the cyber security domain due to technical difficulties in altering data in a manner that is semantically consistent with the original data. This shortfall is particularly onerous given the unique difficulty of acquiring benign and malicious training data that runs into copyright restrictions, and that institutions like banks and governments receive targeted malware that will never exist in large quantities. We present MARVOLO, a binary mutator that programmatically grows mal…
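The abstract describes growing a labeled malware dataset programmatically by mutating binaries. The sketch below is not Marvolo's implementation. It is a minimal illustration, assuming a single hypothetical transformation (`append_overlay`, which appends random bytes past the end of a file) as a stand-in for a semantics-preserving mutation. The point it shows is that each mutant inherits its parent's label, so the dataset grows without new labeling effort. The function names and the `max_pad`, `copies`, and `seed` parameters are illustrative assumptions.

```python
import random
from typing import List, Tuple

# (raw binary bytes, label) where label 1 = malicious, 0 = benign
Sample = Tuple[bytes, int]


def append_overlay(binary: bytes, rng: random.Random, max_pad: int = 4096) -> bytes:
    """Hypothetical mutation: append 1..max_pad random bytes to the file.

    For many PE files, bytes past the last section (the "overlay") are not
    mapped by the loader, so behavior is usually unchanged. This is NOT
    guaranteed: programs that read their own trailing data, or signed files,
    can be affected.
    """
    pad_len = rng.randint(1, max_pad)
    padding = bytes(rng.getrandbits(8) for _ in range(pad_len))
    return binary + padding


def augment(dataset: List[Sample], copies: int, seed: int = 0) -> List[Sample]:
    """Return the originals followed by `copies` mutants per sample.

    Each mutant keeps the label of the binary it was derived from.
    """
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    augmented = list(dataset)
    for binary, label in dataset:
        for _ in range(copies):
            augmented.append((append_overlay(binary, rng), label))
    return augmented


if __name__ == "__main__":
    # Toy byte strings standing in for real executables.
    toy = [(b"MZ" + b"\x00" * 62, 1), (b"MZ" + b"\x90" * 62, 0)]
    grown = augment(toy, copies=3)
    print(len(grown))  # 2 originals + 2 * 3 mutants = 8
```

Appending an overlay is about the simplest mutation one could choose; any real semantics-preserving transformation has to be validated against the executable format and, ideally, by checking that mutated samples still behave like their parents.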

Cited by 1 publication (3 citation statements)
References 21 publications

“…These approaches can be used to increase the accuracy of the classification model by about 2% to 20% with newly generated data samples. Also, it has been verified through several experiments that data augmentation has a positive effect on malware analysis and detection research [11,13,21,26,36]. On the other hand, there is a line of research work that uses the excellent performance of computer vision research to visualize malware as an image to augment data and detect malware [11,13,21].…”
Section: Malware Data Augmentation (mentioning; confidence: 93%)

“…Also, since the feature used in their study is an image, Burks et al concluded that GAN is more efficient for data augmentation than VAE. Wong et al [36] proposed Marvolo using semantics-preserving transformations to augment labeled datasets. However, due to Marvolo's efficiency-oriented optimization, the accuracy is improved only for a limited number of binaries.…”
Section: Related Work (mentioning; confidence: 99%)