2016 IEEE Security and Privacy Workshops (SPW) 2016
DOI: 10.1109/spw.2016.39

Caradoc: A Pragmatic Approach to PDF Parsing and Validation

Abstract: PDF has become a de facto standard for exchanging electronic documents, for visualization as well as for printing. However, it has also become a common delivery channel for malware, and previous work has highlighted features that lead to security issues. In our work, we focus on the structure of the format, independently from specific features. By methodically testing PDF readers against hand-crafted files, we show that the interpretation of PDF files at the structural level may cause some form of denial of se…

Cited by 17 publications
(7 citation statements)
References 30 publications

“…Caradoc [22] is an exception to the above. Endignoux et al focus on weaknesses in the PDF standard related to document structure.…”
Section: Static Pdf Malware Detection (mentioning)
confidence: 94%
“…In the following paragraph, we briefly introduce the PDF format, focusing only on elements that are relevant to code extraction. We refer the interested reader to [22] for an extensive description of the Portable Document Format (PDF). For code extraction purposes, the four most important elements of the PDF syntax are: (1) direct objects, which are the basic building blocks of a PDF; (2) indirect objects, which are uniquely identified, and can be referenced from elsewhere in the document; (3) cross-reference tables, which contain the positions of objects in the file; and (4) content streams, which store various parts of the document content.…”
Section: Pre-processing Step: Extraction (mentioning)
confidence: 99%
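
The four syntax elements named in the statement above can be illustrated with a minimal sketch. It assumes nothing about Caradoc or the citing paper's extractor: `build_pdf` and the object bodies are hypothetical names chosen for illustration, and the output is a toy file, not a validated PDF. Dictionaries such as `<< /Type /Catalog ... >>` are direct objects, each `N 0 obj ... endobj` wrapper makes an indirect object that can be referenced as `N 0 R`, the `xref` section records the byte position of every indirect object, and object 4 is a content stream.

```python
# Illustrative sketch only: build_pdf and the object bodies are hypothetical.

def build_pdf(objects):
    """Serialize object bodies as indirect objects 1..n and append an xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where this indirect object starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


# Content stream: page-drawing operators stored as raw bytes.
stream_data = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

objects = [
    # Direct objects (dictionaries) that refer to indirect objects via "N 0 R".
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R"
    b" /Resources << /Font << /F1 << /Type /Font /Subtype /Type1"
    b" /BaseFont /Helvetica >> >> >> >>",
    b"<< /Length %d >>\nstream\n" % len(stream_data) + stream_data + b"\nendstream",
]

pdf, offsets = build_pdf(objects)

# Each recorded offset should point at the start of the matching indirect object.
for num, off in enumerate(offsets, start=1):
    print(num, off, pdf[off:off + 7])
```

A reader that trusts the cross-reference table jumps straight to these offsets, while one that ignores it must scan the file for `obj` keywords; the two strategies can disagree on a malformed file, which is the kind of structural ambiguity the abstract refers to.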
“…Prior work has developed tools to address these difficulties. For example, Nail [Bangert and Zeldovich 2014] is a tool to generate secure data parsers based on a specification, and Caradoc [Endignoux et al 2016] is a secure PDF parser and validator. BIEBER, in contrast to Nail, does not require the user to write a specification to build a secure parser, but rather infers a parser from the original application's executions and generates safe-by-construction code.…”
Section: Secure Parsers (mentioning)
confidence: 99%
“…This will also involve defining a safe PDF subset. Preliminary work has demonstrated that it is feasible to develop provably safe and correct parsers for relatively simple and well-understood subsets of practical formats, including PDF [9]. A subsetting tool will be essential in ensuring that such tools can be applied to parse PDF's as they exist in the wild.…”
Section: B Transforming Documents To Subsets (mentioning)
confidence: 99%