2012
DOI: 10.1016/j.diin.2012.05.008

Using NLP techniques for file fragment classification

citations
Cited by 79 publications
(59 citation statements)
references
References 4 publications
“…Due to compression algorithms, statistical properties of data cannot be used to classify the deflate-encoded data from different file formats. This fact is the reason why previous approaches that exploit statistical properties of compressed data as feature vectors brought low accurate rate [9,18]. Even from the empirical approach of [1], the authors took the advantage of compression properties such as Huffman table size, the detection rate is still low.…”
Section: Proposed Methods (mentioning)
confidence: 95%
“…Therefore, they have been included in the data set for many research works. Most of current approaches provide low identification rates to file fragments of compound files which are less than 30 % as reported in [8,9]. Other works bring better identification rates but with much smaller size or small number of file types as considered in [10,11].…”
Section: Introduction (mentioning)
confidence: 91%
“…The statistical patterns denote quantitative features such as mean, variance and frequency, whereas the structural patterns denote morphological features such as syntactic grammar and interrelationship [21]. Recently, many researches of format classification involve statistical approaches as in [15][16][17][18]. Because of the rapid growth of the capacity of multimedia, many formats utilize compression methods to reduce the cost and lead to generate high entropy data.…”
Section: Format Feature Extraction (mentioning)
confidence: 99%
“…SVM is a powerful machine learning method because it is not limited by number of samples and dimensionality [11][12][13][14]. In [15][16][17][18], researchers used statistical features, such as mean, standard deviation, byte frequency distribution, Shannon entropy, N-gram and Hamming weight to classify the formats. Because of strong compression and entropy coding of audio file, it is hard to achieve high accuracy of classification only with statistical features.…”
Section: Introduction (mentioning)
confidence: 99%
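
The statistical features named in the citation statement above (mean, standard deviation, byte frequency distribution, Shannon entropy, N-gram and Hamming weight) can be illustrated with a minimal sketch. It is not the method of any cited paper: the function name `fragment_features`, the choice of bigrams as the N-gram, and the normalisation of the Hamming weight to a fraction of set bits are all assumptions made here for illustration.

```python
import math
from collections import Counter


def fragment_features(fragment: bytes) -> dict:
    """Compute simple statistical features of a file fragment.

    Hypothetical helper for illustration only; the feature definitions
    are common textbook forms, not taken from the cited papers.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")

    n = len(fragment)
    counts = Counter(fragment)  # byte value -> number of occurrences

    # Mean and (population) standard deviation of the byte values.
    mean = sum(fragment) / n
    std_dev = math.sqrt(sum((b - mean) ** 2 for b in fragment) / n)

    # Byte frequency distribution: relative frequency of each value 0..255.
    byte_frequency = [counts.get(value, 0) / n for value in range(256)]

    # Shannon entropy in bits per byte (0.0 = constant data, 8.0 = uniform).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Hamming weight, normalised here to the fraction of bits set to 1.
    hamming_weight = sum(bin(b).count("1") for b in fragment) / (8 * n)

    # N-gram counts with N = 2 (pairs of adjacent bytes).
    bigram_counts = Counter(zip(fragment, fragment[1:]))

    return {
        "mean": mean,
        "std_dev": std_dev,
        "byte_frequency": byte_frequency,
        "entropy": entropy,
        "hamming_weight": hamming_weight,
        "bigram_counts": bigram_counts,
    }
```

On compressed or entropy-coded data, such as the deflate-encoded and audio fragments the statements describe, values like the entropy tend to sit close to the 8 bits per byte maximum whatever the file type. This is consistent with the statements' point that statistical features alone discriminate poorly between such formats.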