Reading in the mist: high-quality optical character recognition based on freely available early modern digitized books

Sangiacomo, Andrea; Hogenbirk, Hugo Dirk; Tanasescu, Raluca; Karaisl, Antonia; White, Nicholas J.

doi:10.1093/llc/fqac014

Cited by 2 publications

(2 citation statements)

References 15 publications

“…For example, van Strien et al (2020) and Hill and Hengchen (2019) show that for 85-90% correctly transcribed texts, good results can be arrived at more or less irrespective of the method applied. Our own research of the impact of the OCR inaccuracies on collocate extraction shows that, compared with a fully accurate transcription, an 80% and more highly accurate transcription provides close to exactly the same results (Sangiacomo et al 2022a). Indeed, for collocate extraction, it seems that a truly random distribution of errors would lead to significant problems only from 70% downwards.…”

Section: 53: Methods Of Digitization

mentioning

confidence: 77%

Each book its own Babel

Hogenbirk

“…Concerning the differences between the final corpus and what was found in the dictionaries, see Sangiacomo et al 2021. 11 For further information on the digitization of the corpus, see Sangiacomo et al 2022a. …”

mentioning

confidence: 99%