2017
DOI: 10.1093/llc/fqw064

Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription

Abstract: We present a process for cost-effective transcription of cursive handwritten text images that has been tested on a 1,000-page 17th-century book about botanical species. The process comprised two main tasks, namely: (1) preprocessing: page layout analysis, text line detection, and extraction; and (2) transcription of the extracted text line images. Both tasks were carried out with semiautomatic procedures, aimed at incrementally minimizing user correction effort, by means of computer-assisted line detection an…

citations
Cited by 9 publications
(7 citation statements)
references
References 18 publications
“…The HTR system proposed a full transcript of a given text line image and every time the user amended a wrong word the following ones were updated by the system. In [46] the user was an expert palaeographer while in [47] students in History were involved. In both the papers, the authors noticed a significant typing effort reduction that did not result in a net user time effort savings.…”
Section: Results (mentioning)
confidence: 99%
“…Therefore, Table 6 provides a rough indication of the transcription time spent by the palaeographers that adopted one of the listed systems. The experimental studies presented in [46,47] were performed with the same HTR, which is based on Hidden Markov models and N-gram models, on two different document collections. The HTR system proposed a full transcript of a given text line image and every time the user amended a wrong word the following ones were updated by the system.…”
Section: Comparison With the State Of The Art (mentioning)
confidence: 99%
“…An alternative to fully automatic processing is to rely on computer-assisted transcription. This was successfully explored empirically by Toselli et al (2017), Romero et al (2012) and Alabau et al (2014), following new, powerful concepts of pattern recognition-based human-machine interaction introduced by Vidal et al (2007) and Toselli et al (2011). Following the positive results of these laboratory studies, preliminary evaluation by real users was carried out by Toselli et al (2016).…”
Section: Introduction (mentioning)
confidence: 99%
“…We use two large data sets, from the Bentham and Plantas collections. The Bentham partition that we use is the one used in the ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets [Sánchez et al, 2014], while the Plantas partition was introduced in [Toselli et al, 2018a]. More details about these databases can be found in appendices A.1 and A.5, respectively.…”
Section: Measure Iam Gw Par (mentioning)
confidence: 99%
“…We excluded empty page images or those containing only drawings. Full details about this database are found in [Toselli et al, 2018a]. It is important to mention that reference transcripts of Plantas are provided in two different versions: diplomatic and modernized.…”
Section: A5 Plantas (mentioning)
confidence: 99%