Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage 2014
DOI: 10.1145/2595188.2595197

PoCoTo - an open source system for efficient interactive postcorrection of OCRed historical texts


Cited by 24 publications (21 citation statements)
References: 11 publications
“…In the context of historical OCR the interactive postcorrection tool PoCoTo 27 represents the state-of-the-art. The original PoCoTo introduced by Vobl et al. [39] is a system developed to support the efficient interactive postcorrection of historical texts by offering several advanced features: Suspicious tokens of the OCR text are identified by a special language technology which is aware of historical language variations represented by rewrite rules like t → th (modern spelling vs. historical spelling) and can be corrected by choosing a word from a list of generated plausible correction candidates. The user does not have to perform this for every single word but can batch correct entire error series which for example can consist of identically misrecognized words or words that suffer from the same OCR error, for example the confusion of "e" and "c".…”
Section: PoCoTo (mentioning)
confidence: 99%
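The citation above describes two mechanisms: historical spelling variation modeled by rewrite rules such as t → th, and a list of plausible correction candidates for each suspicious token. The sketch below illustrates that idea under stated assumptions. The rule list `RULES`, the helpers `historical_variants` and `candidates`, and the use of plain Levenshtein distance for ranking are hypothetical choices made for illustration. They are not PoCoTo's actual profiler or API.

```python
# Hypothetical rewrite rules as (modern, historical) pairs, e.g. t -> th.
RULES = [("t", "th"), ("u", "v"), ("s", "ſ")]


def historical_variants(word, rules=RULES):
    """Return every spelling obtained by applying rewrite rules at zero or more
    non-overlapping positions of a modern word (illustrative only)."""
    results = set()

    def expand(i, prefix):
        if i == len(word):
            results.add(prefix)
            return
        expand(i + 1, prefix + word[i])  # keep the modern character
        for modern, hist in rules:
            if modern and word.startswith(modern, i):
                expand(i + len(modern), prefix + hist)  # apply a rule here

    expand(0, "")
    return results


def levenshtein(a, b):
    """Standard edit distance (insertions, deletions, substitutions cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates(ocr_token, lexicon, max_dist=2):
    """Rank (distance, historical variant, modern word) triples for a suspicious
    OCR token; a stand-in for a real profiler's scoring."""
    scored = []
    for modern in lexicon:
        for variant in historical_variants(modern):
            d = levenshtein(ocr_token, variant)
            if d <= max_dist:
                scored.append((d, variant, modern))
    return sorted(scored)


if __name__ == "__main__":
    print(candidates("thai", ["tal", "teil"]))
    # [(1, 'thal', 'tal'), (2, 'tal', 'tal'), (2, 'theil', 'teil')]
```

For the OCR token `thai` and a toy lexicon of `tal` and `teil`, the historical variant `thal` (from t → th) ranks first at distance 1. A real profiler would also weight OCR error patterns and variant frequencies, which this sketch ignores.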
“…Visual support of the post-correction process has been emphasized by e.g. Vobl et al. (2014) who describe a system of iterative post-correction of OCRed historical text which is evaluated in an application-oriented way. They present the human corrector with an alignment of image and OCRed text and make batch correction of the same error in the entire document possible.…”
Section: Related Work (mentioning)
confidence: 99%
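Batch correction, as both citations describe it, can be pictured as collecting every occurrence of an error series and correcting all of them in one step. A minimal sketch follows. It assumes a plain token list, a set-based lexicon, and a single known confusion such as "c" read where the print has "e". The function names `find_error_series` and `apply_series` are hypothetical. In an interactive tool the human corrector would review a series before it is applied.

```python
def find_error_series(tokens, lexicon, ocr_char="c", true_char="e"):
    """Hypothetical helper: map each token position to a correction when the
    token is not a lexicon word but becomes one after replacing a single
    `ocr_char` with `true_char` (e.g. OCR read 'c' where the print had 'e')."""
    series = {}
    for pos, token in enumerate(tokens):
        if token in lexicon:
            continue
        for i, ch in enumerate(token):
            if ch == ocr_char:
                fixed = token[:i] + true_char + token[i + 1:]
                if fixed in lexicon:
                    series[pos] = fixed
                    break  # the first matching position wins
    return series


def apply_series(tokens, series):
    """Apply a confirmed series in one step; returns a new token list."""
    corrected = list(tokens)
    for pos, fixed in series.items():
        corrected[pos] = fixed
    return corrected


if __name__ == "__main__":
    tokens = ["dcr", "Mann", "und", "dcr", "Hund", "lcbt"]
    lexicon = {"der", "Mann", "und", "Hund", "lebt"}
    series = find_error_series(tokens, lexicon)
    print(series)                        # {0: 'der', 3: 'der', 5: 'lebt'}
    print(apply_series(tokens, series))  # ['der', 'Mann', 'und', 'der', 'Hund', 'lebt']
```

The single series covers both kinds of batches mentioned above: the identically misrecognized `dcr` at positions 0 and 3, and the different word `lcbt` that suffers from the same c/e confusion.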
“…The basic list, which goes back to the IMPACT project, contains the most frequent patterns such as s:ſ, u:v, consonant doublings such as n:nn etc. The extended list was built by looking at previous profiler output in the context of our postcorrection tool PoCoTo⁵ [6], when apparent prominent OCR error patterns turned out to actually represent an additional historical pattern. In this way we found historical spelling patterns such as ß: (see Fig.…”
² http://reader.digitale-sammlungen.de/de/fs1/object/display/bsb11106588_00064.html
³ http://reader.digitale-sammlungen.de/de/fs1/object/display/bsb10727266_00071.html
⁴ Lüdeling, Anke; Odebrecht, Carolin; Zeldes, Amir; RIDGES-Herbology (Version 5.0), Humboldt-Universität zu Berlin, https://www.linguistik.hu-berlin.de/en/instituten/professuren-en/korpuslinguistik/research/ridges-projekt?set_language=en
⁵ https://github.com/cisocrgroup/PoCoTo
Section: Evaluation Data and Principles (mentioning)
confidence: 99%
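The extended pattern list was derived by inspecting apparently prominent OCR error patterns in profiler output. One way to surface such candidates is to align each lexicon form with its observed spelling, extract the differing substrings in the modern:observed notation used above (s:ſ, n:nn), and count them. The sketch below is hypothetical and is built on Python's standard `difflib.SequenceMatcher`. The function names and the input pairs are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter
from difflib import SequenceMatcher


def diff_patterns(modern, observed):
    """Return the differences between `modern` and `observed` as
    'modern:observed' patterns. Pure insertions/deletions are widened by one
    preceding (identical) character, so a doubling shows up as 'n:nn'."""
    patterns = []
    matcher = SequenceMatcher(None, modern, observed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag in ("insert", "delete") and i1 > 0 and j1 > 0:
            i1, j1 = i1 - 1, j1 - 1  # borrow the shared left-context character
        patterns.append(f"{modern[i1:i2]}:{observed[j1:j2]}")
    return patterns


def count_patterns(pairs):
    """Count patterns over (modern, observed) pairs; frequent ones become
    candidates for manual review (OCR error or historical spelling?)."""
    counts = Counter()
    for modern, observed in pairs:
        counts.update(diff_patterns(modern, observed))
    return counts


if __name__ == "__main__":
    # Hypothetical (lexicon form, observed spelling) pairs.
    pairs = [("sein", "ſein"), ("sich", "ſich"), ("und", "vnd"), ("darin", "darinn")]
    print(count_patterns(pairs).most_common())
    # [('s:ſ', 2), ('u:v', 1), ('n:nn', 1)]
```

Counting only ranks the patterns. Deciding whether a frequent pattern is an OCR error or a genuine historical spelling still requires the kind of manual inspection described in the citation.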