2021
DOI: 10.5334/johd.46

General Models for Handwritten Text Recognition: Feasibility and State-of-the Art. German Kurrent as an Example

Abstract: Existing text recognition engines make it possible to train general models that recognize not just one specific hand but a multitude of historical hands within a particular script and across a rather long time period (more than 100 years). This paper compares different text recognition engines and their performance on a test set independent of the training and validation sets. We argue that both the test set and the ground truth should be made available by researchers as part of a shared task to allow for the comparison of eng…

Cited by 10 publications (7 citation statements)
References 5 publications
“…This is probably what motivated Ströbel et al (2022) in using the perplexity of language models to detect the erroneous output in an unsupervised manner. On the other hand, although we consider that recognition errors are overall more for handwritten text compared to printed material, the quality of recognition can vary significantly for the former (Hodel et al, 2021), as is also shown in Fig. 1, and does not always come with a high error rate.…”
Section: Related Work (mentioning)
confidence: 92%
“…However, recent developments in technology combined with new infrastructures and software have made these methods more and more accessible. Methods such as CRNN aim to reduce training data requirements and modern models can now achieve character error rates (CER) below 2% for manuscripts, indicating the effectiveness of these technologies (Hodel et al [2021]).…”
Section: A Brief History of ATR (mentioning)
confidence: 99%
“…However, given the substantial variation in writing styles and hands across the medieval period and the scarcity of domain-specific ground truth, a more comprehensive approach to handwriting classification is necessary. With sufficient training data the merging of distinct hands into a single family-script model is achievable (Hodel et al [2021]). In our case, we adopt the classification based on Latin script families, as proposed by the CLAMM corpus, which encompasses 12 book-script families spanning the period from the 9th to the 15th centuries (Kestemont et al [2017]).…”
Section: Related Work (mentioning)
confidence: 99%