2019 15th International Conference on eScience (eScience) 2019
DOI: 10.1109/escience.2019.00060

Transkribus. A Platform for Automated Text Recognition and Searching of Historical Documents

Cited by 18 publications (8 citation statements)
References 2 publications
“…60,61 In this way, OCR software can help to improve quality regarding the use of machine learning-based neural networks, as well as the adoption of post-correction tools. 62,63 In some cases, there is no option to retrieve the datasets by means of an API, hindering the reuse of the digital collections locked inside siloed repositories. In addition, institutions publish the information as PDF files instead of plain text files amenable to computational use.…”
Section: Discussion (mentioning)
confidence: 99%
“…Different tools are available for carrying out manuscript transcription, as for example Aletheia [34], a ground truthing tool, and Transkribus [35], a platform for the digitization, transcription, recognition and searching of historical documents. Usually, most of the tools adopt an architecture as the one shown in Figure 1: a collection of documents, the data set DS, is manually transcribed and the annotated word images are included in the training set.…”
Section: Methods (mentioning)
confidence: 99%
“…There are a few existing commercial products with functions similar to that of the proposed system. Some of them adapt a pre-trained system for Ottoman documents ([23]) while others does not provide a transcription but only Optical Character Recognition (OCR) service ([1, 2]). Furthermore it is impractical to evaluate their performance because of the usage restrictions applied in the free versions.…”
Section: Introduction (mentioning)
confidence: 99%