2006
DOI: 10.1016/j.ipm.2005.06.006

Examining and improving the effectiveness of relevance feedback for retrieval of scanned text documents

Abstract: Important legacy paper documents are digitized and collected in online accessible archives. This enables the preservation, sharing, and significantly the searching of these documents. The text contents of these document images can be transcribed automatically using OCR systems and then stored in an information retrieval system. However, OCR systems make errors in character recognition which have previously been shown to impact on document retrieval behaviour. In particular relevance feedback query-expansion me…

Citations: cited by 16 publications (11 citation statements)
References: 13 publications
“…Practical results suggest that while baseline IR can remain relatively unaffected by misspellings, relevance feedback via query expansion becomes highly unstable under these conditions (Lam-Adesina and Jones, 2006). This constitutes a major drawback in the design of IR systems, since query expansion is a major issue in the production of improved query formulations (Guo and Ramakrishnan, 2009;Lu et al, 2009a,b;Stokes et al, 2009).…”
Section: Introduction (mentioning)
confidence: 99%
“…In previous work [1] we demonstrated that the reduction in PRF performance for DIR is due to selection of some expansion terms with very low n(i) values which are misrecognized versions of more common terms corrupted at the character level. We could of course try to correct these errors in post-processing with a dictionary; however, existing work [14] indicates that this can introduce more problems for information retrieval due to false substitutions than it solves.…”
Section: Baseline and Standard PRF Results (mentioning)
confidence: 97%
“…In previous work [1], we demonstrated that a simple filtering of terms with low n(i) values partially addresses the problems with PRF for DIR associated with spelling mistakes illustrated in Table 1. However, the optimal value of n(i) for filtering may be sensitive to the statistics of individual collections.…”
Section: Improving PRF by String-Based Compensation for Transcription (mentioning)
confidence: 87%
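
The filtering idea described in the two statements above (dropping candidate expansion terms whose collection frequency n(i) is very low, since these are often misrecognized variants of common words) can be sketched roughly as follows. This is only an illustrative sketch, not the cited authors' implementation: the function name `select_expansion_terms`, the threshold `min_df`, the ranking by number of feedback documents containing the term, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def select_expansion_terms(feedback_docs, doc_freq, min_df=3, top_k=5):
    """Pick expansion terms from feedback documents, skipping rare terms.

    feedback_docs: list of token lists (the assumed-relevant top-ranked documents)
    doc_freq:      mapping term -> n(i), the number of collection documents containing it
    min_df:        hypothetical cut-off; terms with n(i) below it are discarded
    """
    # r[t] = number of feedback documents that contain term t
    r = Counter()
    for doc in feedback_docs:
        r.update(set(doc))

    # Drop terms with very low n(i): likely OCR-corrupted variants
    candidates = [t for t in r if doc_freq.get(t, 0) >= min_df]

    # Simplified ranking (an assumption): most feedback documents first,
    # ties broken alphabetically so the result is deterministic
    candidates.sort(key=lambda t: (-r[t], t))
    return candidates[:top_k]


# Toy data: "retrieva1" stands in for an OCR error of "retrieval"
feedback_docs = [
    ["relevance", "feedback", "retrieval", "retrieva1"],
    ["relevance", "retrieval", "scanned"],
]
doc_freq = {"relevance": 120, "feedback": 80, "retrieval": 200,
            "retrieva1": 1, "scanned": 40}

expansion = select_expansion_terms(feedback_docs, doc_freq)
print(expansion)
```

As the second statement notes, a suitable cut-off may depend on the statistics of the individual collection, so `min_df` here should be read as a tunable placeholder rather than a recommended value.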