Proceedings of the 2005 ACM Symposium on Document Engineering
DOI: 10.1145/1096601.1096650

Document digitization lifecycle for complex magazine collection

Abstract: The conversion of large collections of documents from paper to digital formats that are suitable for electronic archival is a complex multi-phase process. The creation of good quality images from paper documents is just one phase. To extract relevant information that they contain, with an accuracy that fits the purpose of target applications, an automated document analysis system and a manual verification/review process are needed. The automated system needs to perform a variety of analysis and recognition tas…

Cited by 7 publications (6 citation statements)
References 18 publications (11 reference statements)
“…We have considered it from a practical and intuitive perspective as the case of a collections edited by one organization with a stable formatting policy. Should the latter evolve over time, as reported in [2], where the considered magazine went through several different presentational eras, we can expect the phenomenon to be detected as a QA issue and to be rapidly dealt with thank to the running window.…”
Section: Results (mentioning)
confidence: 95%
“…In practice, another aspect of the problem involves putting in place some automated Quality Assessment (QA) method in order to ensure a certain quality of the production [1] [2]. Automation is generally required due to the quantity of documents to process, unless manual labor is economically acceptable.…”
Section: Introduction (mentioning)
confidence: 99%
“…This method seems to be efficient, but as the previous one, only the segmentation in text blocks is provided, but no logical reading order, what is important to determine the logical structure of articles. In [5] a full document digitization lifecycle for complex magazine collection is presented. The proposed workflow which provides according to the authors, all the tools and systems needed for the conversion of a large collection of complex documents, gives promising results on a database of the Time magazine covering 80 years.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this paper we propose a complete solution similar to the one proposed in [5] in some aspects, but adapted to this type of documents, and able to face the variety of layouts over the ages.…”
Section: Related Work (mentioning)
confidence: 99%
“…As raised by [4], the key issue is to select pieces of knowledge which are generic but also accurate enough to guarantee a robust and accurate system. But even for specific collections, formal variations occur over time due to new or modified document models as illustrated by [17], [26]. Robustness appears to be a real issue and publications are starting to focus on this problem [1], [2], [13].…”
Section: Introduction (mentioning)
confidence: 99%