2014 11th IAPR International Workshop on Document Analysis Systems 2014
DOI: 10.1109/das.2014.23

Ground-Truth Production in the Transcriptorium Project

Abstract: TRANSCRIPTORIUM is a 3-year project that aims to develop innovative, cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using Handwritten Text Recognition (HTR) technology. The production of ground truth (GT) for a dataset of handwritten document images is among the first tasks. We address novel approaches for the faster production of this GT based on crowdsourcing and on prior-knowledge methods. We also address here a novel low-cost sem…

Cited by 36 publications (30 citation statements)
References 8 publications (13 reference statements)

“…These pages entailed several line detection and transcription difficulties and the corresponding ground truth (GT) was produced semi-automatically and manually reviewed [12] (see examples of extracted lines in Fig. 2).…”
Section: Dataset Description (mentioning)
confidence: 99%
“…It is also difficult from the HTR point of view because it is written by several hands, it has crossed out words, hyphenated words, etc. Preliminary HTR results on a small set of 53 pages [of] the Bentham collection were reported in [8] by using the HTR techniques mentioned in Section 2. The Word Error Rate (WER) obtained in that set was about 34% (see [8] for additional details).…”
Section: Current Collections Selected In Transcriptorium (mentioning)
confidence: 99%
“…Although these technologies are already providing useful results in some cases, much remains to be developed, especially for historical documents, which suffer from typical degradations [10,13,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…The Bentham Dataset [10] consists of a series of documents from the Bentham collection. It has been prepared in the tranScriptorium project.…”
Section: A Dataset (mentioning)
confidence: 99%