2023
DOI: 10.1007/s00799-023-00345-6

Evaluating and mitigating the impact of OCR errors on information retrieval

citations
Cited by 5 publications
(4 citation statements)
references
References 45 publications
“…Our work, despite the complexity of the topic, is presented in a language that is scientifically correct and does not lose the reader's interest. It is therefore suitable for all the following: (1) advanced researchers in OCR and deep learning who want to know the specific direction of the field, the challenges, and the problems that need to be researched further; (2) deep-learning and OCR practitioners who wish to know exactly which models to use and those that other practitioners have developed, in addition to possible applications for which solutions may be implemented; and (3) interested readers who are curious about the uses of deep learning in Arabic OCR and want to learn about the topic in a friendly way.…”
Section: Significance: Why and How Is This Work Different? (mentioning)
confidence: 99%
“…Optical character recognition (OCR) is a technique that is used to read and recognize the text present in an image and then convert it into a textual format. Once the text is extracted and digitized, it can utilize the applications of storing, retrieving, searching, and editing [1][2][3][4][5]. Interchangeably, OCR is referred to as "text recognition".…”
Section: Introduction (mentioning)
confidence: 99%
“…A more recent work showed that information retrieval can be damaged with only 5% of error rate at the character-level [25]. Interestingly, de Oliveira et al [26] showed that, with comparable error rates in the documents, longer ones are more impacted by OCR errors than shorter ones.…”
Section: Related Work (mentioning)
confidence: 99%
“…Apart from the longer shelf-life of mathematics information, researchers preferred print because the medium was better suited to communicating equations and formulas. With the advancements of OCR technologies (i.e., accurate recognition and encoding of mathematical formulas and other special characters), mathematics information is more easily disseminated digitally today [6]. As a result, the Math Library observed a slower transition from print materials to a digital-preferred strategy than most STEM subject libraries, a phenomenon that will be speculated upon further in the Discussion section of this paper.…”
Section: Introduction (mentioning)
confidence: 89%