2019
DOI: 10.1155/2019/8485365

Malware Detection on Byte Streams of PDF Files Using Convolutional Neural Networks

Abstract: With the increasing amount of data, the threat of malware has kept growing in recent years. Malicious actions embedded in non-executable documents (e.g., PDF files) can be especially dangerous, because they are difficult to detect and most users are not aware of this type of attack. In this paper, we design a convolutional neural network to tackle malware detection in PDF files. We collect malicious and benign PDF files and manually label the byte sequences within the files. We intensively examine th…

Cited by 28 publications (18 citation statements)
References 23 publications (27 reference statements)
“…There have been few studies on malware detection that used byte streams for training deep learning models. Jeong et al. [6] designed a CNN model for malware detection of PDF files, wherein the input length is assumed to be 1000 bytes. They extracted byte streams from the PDF files and directly fed them to the CNN model.…”
Section: Related Work (mentioning)
confidence: 99%
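To make the fixed-length input concrete, the sketch below truncates or zero-pads a raw byte stream to exactly 1000 integers in the range 0 to 255, the form in which bytes can be fed to a CNN. How the cited work handled files longer or shorter than 1000 bytes is not stated here, so the zero padding and the splitting of long files into consecutive windows are assumptions; `to_fixed_length`, `split_into_windows`, and `PAD_VALUE` are hypothetical names.

```python
from __future__ import annotations

INPUT_LEN = 1000  # input length attributed to Jeong et al. in the quoted text
PAD_VALUE = 0     # assumption: short streams are zero-padded


def to_fixed_length(data: bytes, length: int = INPUT_LEN) -> list[int]:
    """Truncate or zero-pad a byte stream to exactly `length` ints in 0..255."""
    window = list(data[:length])  # iterating bytes yields ints
    window.extend([PAD_VALUE] * (length - len(window)))
    return window


def split_into_windows(data: bytes, length: int = INPUT_LEN) -> list[list[int]]:
    """Hypothetical handling of long files: consecutive windows, last one padded."""
    if not data:
        return [to_fixed_length(b"", length)]
    return [to_fixed_length(data[i:i + length], length)
            for i in range(0, len(data), length)]


# Usage: a short PDF-like prefix becomes one 1000-element input vector.
x = to_fixed_length(b"%PDF-1.7\n")
```

Splitting into windows keeps every byte visible to the model at the cost of more forward passes per file; simple truncation is cheaper but ignores everything after the first 1000 bytes.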
“…Some studies used deep learning models to extract meaningful features from byte streams to accurately detect malicious actions. These studies have a common limitation in that their models are not efficient [3, 5, 6, 7]. That is, the models are often too complex, so they take a long time to analyze numerous suspicious files.…”
Section: Introduction (mentioning)
confidence: 99%
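The efficiency concern can be quantified with the usual multiply-accumulate (MAC) count of a 1D convolution: MACs = L_out × K × C_in × C_out, where L_out = ⌊(L_in − K) / s⌋ + 1 for an unpadded ("valid") convolution with kernel width K and stride s. The sketch below is a rough estimate under stated assumptions: it counts convolution layers only (no pooling or dense layers), and `Conv1DLayer`, `total_macs`, and the layer sizes in the usage lines are hypothetical, not taken from any of the cited models.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Conv1DLayer:
    """Hypothetical description of one unpadded 1D convolution layer."""
    in_channels: int
    out_channels: int
    kernel_size: int
    stride: int = 1

    def output_length(self, input_length: int) -> int:
        # floor((L_in - K) / stride) + 1, or 0 if the kernel does not fit
        if input_length < self.kernel_size:
            return 0
        return (input_length - self.kernel_size) // self.stride + 1

    def macs(self, input_length: int) -> int:
        # One MAC per output position, kernel tap, input channel, output channel
        return (self.output_length(input_length) * self.kernel_size
                * self.in_channels * self.out_channels)


def total_macs(layers: Iterable[Conv1DLayer], input_length: int) -> int:
    """Sum MACs over stacked layers, passing each output length to the next."""
    total, length = 0, input_length
    for layer in layers:
        total += layer.macs(length)
        length = layer.output_length(length)
    return total


# Usage: hypothetical 8-dim byte embeddings and 16 filters of width 3.
shallow = [Conv1DLayer(8, 16, 3)]
deeper = [Conv1DLayer(8, 16, 3), Conv1DLayer(16, 16, 3)]
```

For a 1000-byte input, the single layer costs 998 × 3 × 8 × 16 = 383,232 MACs, and the second layer adds 996 × 3 × 16 × 16 = 764,928 more. The cost grows with every added layer and filter, which is what makes deep models slow when many suspicious files must be scanned.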
“…Inspired by the study, we design a CNN to capture arbitrary patterns of malicious actions by analyzing the byte sequences of HWP files. The biggest difference between [22] and this study is the characteristics of the target files (HWP files versus PDF files). It is important to understand the structure of the target files even when adopting deep learning models, because a better understanding of the target files helps in designing a better model structure for the problem.…”
Section: Neural Network for Malware Detection (mentioning)
confidence: 82%
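One concrete structural difference between the two target formats is visible in the first bytes of a file. A PDF normally begins with the header `%PDF-`, whereas HWP 5.0 documents are stored in the Compound File Binary (OLE) container, whose signature is `D0 CF 11 E0 A1 B1 1A E1`. The sketch below illustrates only that difference; `guess_container` is a hypothetical helper, and because the CFB signature is shared with other formats (for example legacy Microsoft Office files), a match does not by itself identify a file as HWP.

```python
PDF_MAGIC = b"%PDF-"                          # usual PDF file header
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature


def guess_container(header: bytes) -> str:
    """Classify a file's leading bytes as 'pdf', 'cfb', or 'unknown'."""
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(CFB_MAGIC):
        return "cfb"  # e.g. HWP 5.0, but also legacy .doc/.xls files
    return "unknown"
```

A PDF is a mostly textual sequence of objects and streams, while a CFB file is a sector-based container holding named internal streams, so byte windows drawn from each format expose quite different local patterns to a CNN.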
“…There have been only a few studies of static analysis to detect malicious actions of non-executables. Jeong et al. [22] recently designed a shallow convolutional neural network (CNN) to analyze the byte sequences of PDF files. They assumed that there must be scattered patterns representing malicious actions in the byte sequences, and chose to employ the CNN because CNNs are known to be effective at capturing promising local patterns.…”
Section: Neural Network for Malware Detection (mentioning)
confidence: 99%
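The argument that a CNN can catch scattered local patterns rests on two operations: a 1D convolution that scores every short window of bytes, and global max pooling that keeps each filter's strongest response regardless of where in the stream it occurred. The pure-Python forward pass below illustrates that idea under stated assumptions: the class name `ShallowByteCNN`, the embedding size, filter count, kernel width, and the random (untrained) weights are all hypothetical, and this is not the architecture of [22].

```python
import math
import random


class ShallowByteCNN:
    """Hypothetical shallow 1D CNN over raw bytes; weights are random, not trained."""

    def __init__(self, embed_dim: int = 4, num_filters: int = 8,
                 kernel_size: int = 5, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.kernel_size = kernel_size
        # One embedding vector per possible byte value (0..255).
        self.embedding = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                          for _ in range(256)]
        # filters[f][t][d]: weight of filter f at kernel tap t, embedding dim d.
        self.filters = [[[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                         for _ in range(kernel_size)]
                        for _ in range(num_filters)]
        self.biases = [0.0] * num_filters
        # Linear output layer on top of the pooled filter responses.
        self.out_weights = [rng.gauss(0.0, 0.1) for _ in range(num_filters)]
        self.out_bias = 0.0

    def score(self, data: bytes) -> float:
        """Return a maliciousness score in (0, 1) for a byte stream."""
        k = self.kernel_size
        if len(data) < k:
            raise ValueError("input must be at least kernel_size bytes long")
        x = [self.embedding[b] for b in data]  # bytes iterate as ints 0..255
        pooled = []
        for w, bias in zip(self.filters, self.biases):
            best = 0.0  # running max of ReLU outputs, which are never negative
            for start in range(len(x) - k + 1):  # 'valid' convolution
                act = bias
                for t in range(k):
                    act += sum(wi * xi for wi, xi in zip(w[t], x[start + t]))
                best = max(best, act)  # global max pooling of ReLU(act)
            pooled.append(best)
        logit = self.out_bias + sum(v * p for v, p in zip(self.out_weights, pooled))
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


# Usage: score a 1000-byte stream containing a PDF name token.
model = ShallowByteCNN(seed=1)
stream = bytearray(1000)
stream[100:111] = b"/JavaScript"
print(model.score(bytes(stream)))
```

Because of the global max pooling, moving the same byte pattern to another interior offset of an otherwise uniform stream leaves the score unchanged. Training (not shown) would adjust the weights so that windows resembling malicious constructs produce high filter activations.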