2007
DOI: 10.1145/1292591.1292596

Error correction vs. query garbling for Arabic OCR document retrieval

Abstract: Due to the existence of large numbers of legacy documents (such as old books and newspapers), improving retrieval effectiveness for OCR'ed documents continues to be an important problem. This article compares the effect of OCR error correction with and without language modeling and the effect of query garbling with weighted structured queries on the retrieval of OCR degraded Arabic documents. The results suggest that moderate error correction does not yield statistically significant improvement in retrieval ef…
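
Query garbling, as named in the abstract, can be illustrated with a small sketch: expand a query word into the OCR-style misspellings it is likely to appear as, weight each variant by how probable that degradation is, and combine them into one weighted structured query. Everything here is an assumption for illustration, not the paper's model: the `CONFUSION` table and its probabilities are invented (they pair Arabic letters that differ only in their dots), only single substitutions are generated, and the output string assumes the weight-then-term syntax of Indri's `#wsyn` (weighted synonym) operator. The helper names `garble` and `wsyn_query` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical confusion model P(observed char | true char) for a few Arabic
# letters that differ only in their dots. The numbers are invented.
CONFUSION: Dict[str, List[Tuple[str, float]]] = {
    "ب": [("ب", 0.90), ("ت", 0.06), ("ن", 0.04)],
    "ت": [("ت", 0.92), ("ث", 0.05), ("ب", 0.03)],
    "ن": [("ن", 0.95), ("ب", 0.05)],
}


def keep_prob(ch: str) -> float:
    """Probability that `ch` survives OCR unchanged (1.0 if not modeled)."""
    for out, p in CONFUSION.get(ch, []):
        if out == ch:
            return p
    return 1.0


def garble(word: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to `top_k` variants of `word` (the word itself plus
    single-substitution degradations), with weights normalized to sum to 1."""
    base = 1.0
    for ch in word:
        base *= keep_prob(ch)
    variants: Dict[str, float] = {word: base}
    for i, ch in enumerate(word):
        for out, p in CONFUSION.get(ch, []):
            if out == ch:
                continue
            # Swap the survival probability at position i for the error probability.
            weight = base / keep_prob(ch) * p
            variant = word[:i] + out + word[i + 1:]
            variants[variant] = variants.get(variant, 0.0) + weight
    ranked = sorted(variants.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(w for _, w in ranked)
    return [(v, w / total) for v, w in ranked]


def wsyn_query(word: str, top_k: int = 3) -> str:
    """Format the weighted variants in an Indri-style #wsyn( w1 t1 w2 t2 ... ) operator."""
    parts = " ".join(f"{w:.3f} {v}" for v, w in garble(word, top_k))
    return f"#wsyn( {parts} )"


if __name__ == "__main__":
    print(wsyn_query("بنت"))
```

Keeping only the `top_k` most probable variants is a design choice in this sketch: it bounds query length, at the cost of dropping rare degradations that a fuller model might keep.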

Cited by 7 publications (6 citation statements); references 18 publications (27 reference statements).

“…Our formulation is similar to approaches taken in OCR document retrieval, using degradations of character sequences (Darwish and Magdy, 2007;Darwish, 2003). For vocabulary-independent spoken term detection, perhaps the most closely related formulation is provided by (Mamou and Ramabhadran, 2008).…”
Section: Incorporating Query Degradations (mentioning)

“…The main concept in [2] was creating a character level alignment from random words and then using a garbler to select a single edit operation, and accordingly a new character is inserted, deleted or substituted. In [2,3] the language models were used to obtain a better ranking of candidate words that corrects the OCR output. Our suggested improvements are based on: (a) adding more edit operations, (b) modeling correction rules and (c) improving the language models.…”
Section: Related Work (mentioning)

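The statement that language models were used "to obtain a better ranking of candidate words that corrects the OCR output" can be sketched as a simple noisy-channel ranker: propose lexicon words at most one insertion, deletion, or substitution away from the OCR word, then score each by a channel probability times a unigram language-model probability. This is a simplified stand-in, not the cited systems: the constant edit probability `p_edit`, the toy `LEXICON_COUNTS`, and the function names are all hypothetical, and real systems would learn per-character edit probabilities and may use higher-order language models.

```python
from collections import Counter
from typing import List, Set, Tuple


def single_edits(word: str, alphabet: str) -> Set[str]:
    """All strings exactly one insertion, deletion, or substitution away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in alphabet if c != b[0]}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | substitutes | inserts


def rank_corrections(ocr_word: str, counts: Counter, alphabet: str,
                     p_edit: float = 0.1, top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank lexicon words as corrections of `ocr_word` by channel * unigram LM."""
    total = sum(counts.values())
    scored = []
    for cand in {ocr_word} | single_edits(ocr_word, alphabet):
        if cand not in counts:
            continue  # only words known to the language model are candidates
        # Assumed channel model: one constant probability for any single edit.
        channel = 1.0 - p_edit if cand == ocr_word else p_edit
        scored.append((cand, channel * counts[cand] / total))
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return scored[:top_k]


# Hypothetical toy lexicon with corpus frequencies ("book", "books", "writer").
LEXICON_COUNTS = Counter({"كتاب": 50, "كتب": 30, "كاتب": 20})
ALPHABET = "".join(sorted(set("".join(LEXICON_COUNTS))))

if __name__ == "__main__":
    # "كناب" simulates an OCR confusion of ت with ن.
    print(rank_corrections("كناب", LEXICON_COUNTS, ALPHABET))
```

Because every candidate one edit away receives the same channel score here, the unigram counts alone decide their order; that is the role the cited work assigns to the language model.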
“…The criterion considered in [2,3] for alignment was the position of the erroneous characters in the word. From our point of view, this method needs to be improved so that an edit operation depends on other factors (e.g.…”
Section: Related Work (mentioning)

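A character-level alignment that records where each error falls in the word, the criterion the snippet above attributes to [2,3], can be sketched with a standard minimum-edit-distance (Levenshtein) alignment and a backtrace. This is a generic illustration, not the cited authors' code: the function names, the three position buckets (`initial`, `medial`, `final`), the convention that an insertion's position is the index of the clean character it precedes, and the tie-breaking order (match or substitution first, then deletion, then insertion) are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edit = Tuple[str, str, str, int]  # (operation, clean char, OCR char, position in clean word)


def align(clean: str, ocr: str) -> List[Edit]:
    """Edit operations turning `clean` into `ocr` along one minimum-cost alignment."""
    n, m = len(clean), len(ocr)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if clean[i - 1] == ocr[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    ops: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right cell
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (clean[i - 1] != ocr[j - 1]):
            if clean[i - 1] != ocr[j - 1]:
                ops.append(("sub", clean[i - 1], ocr[j - 1], i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", clean[i - 1], "", i - 1))
            i -= 1
        else:
            ops.append(("ins", "", ocr[j - 1], i))
            j -= 1
    return list(reversed(ops))


def position_bucket(pos: int, length: int) -> str:
    """Coarse position of an edit within a word of `length` characters."""
    if pos == 0:
        return "initial"
    if pos >= length - 1:
        return "final"
    return "medial"


def count_edits(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count (op, clean char, OCR char, position bucket) over clean/OCR word pairs."""
    counts: Counter = Counter()
    for clean, ocr in pairs:
        for op, a, b, pos in align(clean, ocr):
            counts[(op, a, b, position_bucket(pos, len(clean)))] += 1
    return counts


if __name__ == "__main__":
    print(count_edits([("كتاب", "كناب"), ("كتاب", "كتب")]))
```

Normalizing these counts per clean character would give position-conditioned edit probabilities; conditioning on further factors, as the snippet suggests, would mean adding fields to the count key.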
“…The work of Darwish and Magdy (2007), for example, although distantly-related to ours, differs significantly since it is focused on monolingual retrieval of scanned documents containing OCR errors, instead of multilingual retrieval with misspelling errors present in the queries, as is our case.…”
Section: Introduction (mentioning)