2000
DOI: 10.1023/a:1026564708926

Untitled

citations
Cited by 18 publications
(6 citation statements)
references
References 18 publications
“…Mittendorf et al investigated how robust IR systems are toward OCR errors in digitized documents [9]. They found that longer documents describing a single topic redundantly have a better chance of retrieval than documents that are either short or discuss different topics.…”
Section: Related Work 2.1 OCR Quality and Retrieval
Citation type: mentioning
confidence: 99%
“…Tanner et al (2009) suggest that word accuracy rates less than 80% are harmful for search, but when the word accuracy is over 80%, fuzzy search capabilities of search engines should manage the problems caused by word errors. Mittendorf and Schäuble's (2000) probabilistic model for data corruption seems to support this. Information retrieval is robust even with corrupted data, but IR works best with longer documents and long queries.…”
Section: Discussion
Citation type: mentioning
confidence: 84%
“…Besides retrieval performance effects poor OCR quality has an effect on ranking of the documents (Taghva et al 1996; Mittendorf and Schäuble 2000). In practice these kinds of drops in retrieval and ranking performance mean that the user will lose relevant documents: either they are not found at all by the search engine or the documents are so low in the ranking list that the user may skip them.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…We could of course try to correct these errors in post-processing with a dictionary, however existing work [14] indicates that this can introduce more problems for information retrieval due to false substitutions than it solves. We thus do not explore dictionary-based substitution methods.…”
Section: Baseline and Standard PRF Results
Citation type: mentioning
confidence: 99%