Proceedings of the 2016 ACM Symposium on Document Engineering
DOI: 10.1145/2960811.2967157

Enhancing the Searchability of Page-Image PDF Documents Using an Aligned Hidden Layer from a Truth Text

Abstract: The search accuracy of a PDF image-plus-hidden-text (PDF-IT) document depends on the accuracy of the optical character recognition (OCR) process that produced its searchable hidden text layer. In many cases, recognising words in a blurred area of a PDF page image may exceed the capabilities of an OCR engine. This paper describes a project to replace an inadequate hidden textual layer of a PDF-IT file with a more accurate hidden layer produced from a 'truth text'. The alignment of the truth text with th…
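The abstract is truncated before it describes how the truth text is aligned, so the following is only a minimal sketch of one plausible approach, not the paper's method. It assumes the existing OCR layer can be read as a list of words with bounding boxes (the `BBox` tuple layout and the `align_truth_to_ocr` name are hypothetical) and uses Python's standard `difflib.SequenceMatcher` to carry each OCR word's position over to the corresponding truth-text word, so a replacement hidden layer could be drawn at the same page coordinates.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in page coordinates.
BBox = Tuple[float, float, float, float]


def align_truth_to_ocr(
    ocr_words: List[Tuple[str, BBox]],
    truth_words: List[str],
) -> List[Tuple[str, Optional[BBox]]]:
    """Attach an OCR bounding box to each truth-text word where alignment allows.

    Words the alignment cannot place are returned with a box of None.
    """
    # Compare case-insensitively so capitalisation differences still match.
    ocr_tokens = [w.lower() for w, _ in ocr_words]
    truth_tokens = [w.lower() for w in truth_words]
    matcher = SequenceMatcher(a=ocr_tokens, b=truth_tokens, autojunk=False)

    result: List[Tuple[str, Optional[BBox]]] = [(w, None) for w in truth_words]
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            # Identical words, or an equal-length run of substitutions
            # (assumed to be misrecognised words): reuse OCR positions 1:1.
            for k in range(i2 - i1):
                result[j1 + k] = (truth_words[j1 + k], ocr_words[i1 + k][1])
        # 'insert', 'delete' and unequal 'replace' spans stay unplaced; their
        # positions would have to be estimated some other way.
    return result


if __name__ == "__main__":
    ocr = [("Tbe", (10, 10, 30, 20)), ("quick", (35, 10, 70, 20)),
           ("brwn", (75, 10, 110, 20)), ("fox", (115, 10, 140, 20))]
    truth = ["The", "quick", "brown", "fox"]
    for word, box in align_truth_to_ocr(ocr, truth):
        print(word, box)
```

Here the misread words "Tbe" and "brwn" sit in equal-length substitution spans, so "The" and "brown" inherit their boxes. A truth word with no OCR counterpart (for example, one lost entirely in a blurred area) comes back with `None`, which is where a real system would need a fallback positioning strategy.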

Cited by 1 publication; references 8 publications.