Malware Detection on Byte Streams of PDF Files Using Convolutional Neural Networks

Jeong, Young-Seob; Woo, Jiyoung; Kang, Ah Reum

doi:10.1155/2019/8485365

Cited by 28 publications

(18 citation statements)

References 23 publications

(27 reference statements)

“…There have been few studies on malware detection that used byte streams for training deep learning models. Jeong et al. [6] designed a CNN model for malware detection of PDF files, wherein the input length is assumed to be 1000 bytes. They extracted byte streams from the PDF files and directly fed them to the CNN model.…”

Section: Related Work (mentioning)

confidence: 99%
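
As a rough illustration of the fixed-length input mentioned in the statement above, the sketch below turns a raw byte stream into a sequence of exactly 1000 integer values. The function name `to_fixed_length`, the zero padding of short streams, and the truncation of long streams are assumptions made for illustration; the statement only says that the input length is assumed to be 1000 bytes.

```python
from typing import List


def to_fixed_length(stream: bytes, length: int = 1000, pad_value: int = 0) -> List[int]:
    """Return exactly `length` integer byte values taken from `stream`.

    Assumption: longer streams are truncated and shorter ones are
    right-padded with `pad_value`; the cited work may handle this differently.
    """
    values = list(stream[:length])  # each byte becomes an int in 0..255
    values.extend([pad_value] * (length - len(values)))
    return values


# A short stream is padded up to the fixed input length.
example = to_fixed_length(b"%PDF-1.4")
```

A sequence of this shape could then be passed to an embedding or convolution layer; that step is left out here because the statement does not describe it.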

Malware Detection of Hangul Word Processor Files Using Spatial Pyramid Average Pooling

Jeong, Woo, Lee, et al. (2020). Sensors. Self-citation.

Malware detection of non-executables has recently been drawing much attention because ordinary users are vulnerable to such malware. Hangul Word Processor (HWP) is software for editing non-executable text files and is widely used in South Korea. New malware for HWP files continues to appear because of the circumstances between South Korea and North Korea. There have been various studies to solve this problem, but most of them are limited because they require a large amount of effort to define features based on expert knowledge. In this study, we designed a convolutional neural network to detect malware within HWP files. Our proposed model takes a raw byte stream as input and predicts whether it contains malicious actions or not. To incorporate highly variable lengths of HWP byte streams, we propose a new padding method and a spatial pyramid average pooling layer. We experimentally demonstrate that our model is not only effective, but also efficient.
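
The abstract above names a spatial pyramid average pooling layer as the way to handle byte streams of highly variable length. The sketch below shows the general idea for a single one-dimensional feature sequence: the sequence is split into 1, 2, and 4 roughly equal bins, each bin is averaged, and the averages are concatenated into a fixed-size vector. The function name, the pyramid levels `(1, 2, 4)`, and the bin-boundary rule are assumptions for illustration, not the authors' exact layer.

```python
from typing import List, Sequence


def spatial_pyramid_average_pool(
    features: Sequence[float], levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Pool a variable-length sequence into sum(levels) averages."""
    n = len(features)
    if n == 0:
        raise ValueError("features must not be empty")
    pooled: List[float] = []
    for bins in levels:
        for b in range(bins):
            start = (b * n) // bins
            # Guarantee at least one element per bin when n < bins.
            end = max(((b + 1) * n) // bins, start + 1)
            window = features[start:end]
            pooled.append(sum(window) / len(window))
    return pooled


# Inputs of different lengths map to vectors of the same size (1 + 2 + 4 = 7).
short_vec = spatial_pyramid_average_pool([0.5] * 10)
long_vec = spatial_pyramid_average_pool([0.5] * 1000)
```

In a convolutional network this pooling would typically be applied to each feature channel separately, so the fully connected layers see a fixed-size input regardless of the original stream length.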

Section: Related Work (mentioning)

confidence: 99%

“…Some studies used deep learning models to extract meaningful features from byte streams to accurately detect malicious actions. These studies have a common limitation in that their models are not efficient [3, 5, 6, 7]. That is, the models are often too complex, so they take a long time to analyze numerous suspicious files.…”

Section: Introduction (mentioning)

confidence: 99%

Malware Detection of Hangul Word Processor Files Using Spatial Pyramid Average Pooling

Jeong, Woo, Lee, et al. (2020). Sensors. Self-citation.

“…Inspired by the study, we design a CNN to capture arbitrary patterns of malicious actions by analyzing the byte sequences of HWP files. The biggest difference between [22] and this study is the different characteristics of target files (HWP files versus PDF files). It is important to understand the structure of target files even if we adopt deep learning models because better understanding of the target files will help to design a better structure for solving the problem.…”

Section: Neural Network For Malware Detection (mentioning)

confidence: 82%

“…There have been only a few studies of static analysis to detect malicious actions of non-executables. Jeong et al. [22] recently designed a shallow convolutional neural network (CNN) to analyze the byte sequences of PDF files. They assumed that there must be scattered patterns representing malicious actions of byte sequences, and chose to employ the CNN because the CNN is known to be effective in capturing local promising patterns.…”

Section: Neural Network For Malware Detection (mentioning)

confidence: 99%

“…It is important to understand the structure of target files even if we adopt deep learning models because better understanding of the target files will help to design a better structure for solving the problem. For example, the proposed model in this paper has two FC layers whereas the model in [22] has one FC layer. This can be interpreted as indicating that the HWP files might contain more complicated relationships among promising local values. Indeed, the malicious actions of HWP files take various types (e.g., JavaScript, Visual Basic for Applications (VBA), and Encapsulated PostScript (EPS)), whereas most of the malicious actions in PDF files use JavaScript.…”

Section: Neural Network For Malware Detection (mentioning)

confidence: 99%

Malware Detection on Byte Streams of Hangul Word Processor Files

2019. Self-citation.

While the exchange of data files or programs on the Internet grows exponentially, most users are vulnerable to infected files, especially to malicious non-executables. Due to the circumstances between South and North Korea, many malicious actions have recently been found in Hangul Word Processor (HWP) non-executable files because the HWP is widely used in schools, military facilities, and government institutions of South Korea. The HWP file usually has one or more byte streams that are often used for the malicious actions. Based on an assumption that infected byte streams have particular patterns, we design a convolutional neural network (CNN) to grasp such patterns. We conduct experiments on our prepared 534 HWP files, and demonstrate that the proposed CNN achieves the best performance compared to other machine learning models. As new malicious attacks keep emerging, we will keep collecting such HWP files and investigate better model structures.

On the Possibility of Evasion Attacks with Macro Malware

Yamamoto, Mimura (2021). Advances in Intelligent Systems and Computing.
