2020
DOI: 10.1108/ajim-11-2019-0326

Optimisation of archival processes involving digitisation of typewritten documents

Abstract: Purpose: The authors investigate optical character recognition (OCR) technology and discuss its implementation in the context of digitisation of archival materials. Design/methodology/approach: The typewritten transcripts of the Croatian Writers' Society from the mid-60s of the 20th century are used as the test data. The optimal digitisation setup is investigated in order to obtain the best OCR results. This was done by using the sample of 123 pages digitised at different resolution settings and binarisation levels…

Cited by 3 publications (3 citation statements)
References 9 publications (11 reference statements)
“…Omitting layout contexts from the processing of paper-based collections results in flat, unstructured text. Even if the OCR output text is high quality, it still requires substantial manual processing to delineate separate fields for record linkage and analysis, which are necessary precursors for research use (Stančić and Trbušić, 2020). Manual processes are often the main bottlenecks in digitization workflows (Blanke et al., 2012).…”
Section: Discussion (mentioning)
confidence: 99%
“…Much of the historical documentation OCR literature focuses on the digitization of prose documents or the conversion of hard copy tabular records into digital tabular data (Nagy, 1992; Stančić and Trbušić, 2020). However, bridging these use cases is the digitization of semi-structured historical documents, which hold data that could be converted into a tabular format but are not currently formatted into forms or tables.…”
Section: Introduction (mentioning)
confidence: 99%
“…Digitization can be defined as the action of moving physical elements and analogous methods to the digital plane, which involves the consideration of the use of data warehouses or the scanning of paper files to record all relevant documents, discarding outdated filing cabinets. It may be implemented as a system at the service of the informative, investigative and managerial legacy of any organization since it facilitates the supervision of a volume of information, as well as its editing and management [1][2][3].…”
Section: Introduction (mentioning)
confidence: 99%