2013
DOI: 10.1016/j.diin.2013.08.004

Approaches to the classification of high entropy file fragments

Abstract: In this paper we propose novel approaches to the problem of classifying high entropy file fragments. Although classification of file fragments is central to the science of Digital Forensics, high entropy types have been regarded as a problem. Roussev and Garfinkel (2009) argue that existing methods will not work on high entropy fragments because they have no discernible patterns to exploit. We propose two methods that do not rely on such patterns. The NIST statistical test suite is used to detect randomness in…
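To make the terms in the abstract concrete, the following is a minimal sketch of what "high entropy" and a randomness test mean for a single fragment. It is not the paper's method: the function names `byte_entropy` and `monobit_p_value` are hypothetical, and the frequency (monobit) test shown is only the simplest test from the NIST SP 800-22 suite, chosen here for illustration.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    n = len(fragment)
    counts = Counter(fragment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def monobit_p_value(fragment: bytes) -> float:
    """p-value of the frequency (monobit) test described in NIST SP 800-22.

    A small p-value (conventionally below 0.01) suggests the bit stream
    is not random: it has too many ones or too many zeros.
    """
    n_bits = len(fragment) * 8
    if n_bits == 0:
        raise ValueError("fragment must not be empty")
    ones = sum(bin(b).count("1") for b in fragment)
    s_n = 2 * ones - n_bits              # sum of bits mapped to +1 / -1
    s_obs = abs(s_n) / math.sqrt(n_bits)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    # Illustrative 4 KiB fragments (assumed fragment size, not from the paper).
    text = (b"The quick brown fox jumps over the lazy dog. " * 200)[:4096]
    packed = zlib.compress(os.urandom(8192))[:4096]
    for name, frag in (("text", text), ("compressed", packed)):
        print(name, round(byte_entropy(frag), 3), monobit_p_value(frag))
```

Plain text should score well below 8 bits per byte, while compressed or encrypted fragments typically sit close to 8, which is why byte-frequency features alone struggle to tell such types apart.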

Citations: cited by 29 publications (21 citation statements)
References: 25 publications (37 reference statements)
“…This task is a specialization of the problem of file type classification, well-known in computer forensics. In this context, many statistical techniques have been proposed, usually leveraging differences in the distribution of byte frequency among different file types [32,33,34,35]. Forensics techniques usually aim to classify all executable files in the same class, thus are not applicable as-is to our problem.…”
Section: Related Work (mentioning)
confidence: 99%
“…One part of the data set contains approximately one million files collected using random searches of the .gov domain. It has been used by many ransomware researchers [14,27,37,44,46,47,48,50,51,52,55], supporting the claim that this data set is a well-known and respected source of test data. In 2017 Grajeda et al [37] reported that this was the most popular data set currently in use.…”
Section: Govdocs1 (mentioning)
confidence: 90%
“…Actions performed to ensure that the data contained within the data set is of the highest quality are also described. The breadth and depth of the NapierOne data set should also remove the need for other researchers to complement the data set with additional files as has previously been the case [47,48,51].…”
Section: Discussion (mentioning)
confidence: 99%
“…Some data fragments are quite easy to identify their file types, such as fragments that belong to files having only unified data such as ASCII text. Conversely, there is no powerful method to identify fragments that are data portions of high entropy files [3,4]. High entropy files normally contain compressed data, such as zip archives.…”
Section: Introduction (mentioning)
confidence: 99%
“…File fragment identification is one of the most serious problems in the process of file carving applied for digital forensics [1][2][3]. Some data fragments are quite easy to identify their file types, such as fragments that belong to files having only unified data such as ASCII text.…”
Section: Introduction (mentioning)
confidence: 99%