1997
DOI: 10.1016/s0031-3203(96)00155-0

Document retrieval tolerating character recognition errors—evaluation and application

Cited by 22 publications (10 citation statements)
References 2 publications
“…This improved average precision retrieval effectiveness in all but one case. However, a further study reported in (Marukawa et al, 1997) again showed the ineffectiveness of query expansion for retrieval from corrupted text. In this research 1083 Japanese news articles were searched using 50 test queries.…”
Section: University Of Nevada Las Vegas (mentioning, confidence: 97%)
“…Although there has been some work in trying to compensate for optical character recognition (OCR) errors introduced into automatically scanned text documents (Marukawa et al 1997; Zhai et al 1996), the area of robust methods for dealing with speech recognition errors in the context of spoken document retrieval is still relatively new. There has been some recent work in this area performed independently and in parallel to the work presented in this thesis.…”
Section: Motivation (mentioning, confidence: 99%)
“…For text documents, there has been work in trying to compensate for optical character recognition (OCR) errors introduced into automatically scanned text documents (Marukawa et al 1997; Zhai et al 1996). In (Marukawa et al 1997), two methods are proposed to deal with character recognition errors for Japanese text documents. One method uses a character error confusion matrix to generate "equivalent" query strings to try to match erroneously recognized text.…”
Section: Related Work (mentioning, confidence: 99%)
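The confusion-matrix approach mentioned in this statement can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Marukawa et al.: the confusion table and its probabilities are invented, the example uses Latin characters rather than Japanese, only one-to-one character substitutions are modeled, and the names expand_query and search are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical confusion table: true character -> {recognized character: probability}.
# The probabilities are illustrative only, not measured from any real OCR engine.
CONFUSIONS: Dict[str, Dict[str, float]] = {
    "l": {"l": 0.90, "1": 0.06, "I": 0.04},
    "o": {"o": 0.95, "0": 0.05},
    "e": {"e": 0.92, "c": 0.08},
}


def expand_query(term: str, confusions: Dict[str, Dict[str, float]],
                 min_prob: float = 0.01, max_variants: int = 50) -> List[Tuple[str, float]]:
    """Generate "equivalent" strings an OCR engine might have produced for term."""
    # Candidate (char, prob) pairs per position; characters missing from the
    # table are assumed to be always recognized correctly.
    options = [list(confusions.get(ch, {ch: 1.0}).items()) for ch in term]
    variants = []
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        if prob >= min_prob:
            variants.append(("".join(c for c, _ in combo), prob))
    # Most likely variants first, then keep only the top max_variants.
    variants.sort(key=lambda v: -v[1])
    return variants[:max_variants]


def search(docs: Dict[str, str], term: str,
           confusions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return ids of documents containing the term or any expanded variant."""
    variants = [v for v, _ in expand_query(term, confusions)]
    return [doc_id for doc_id, text in docs.items() if any(v in text for v in variants)]


if __name__ == "__main__":
    docs = {"d1": "the hel1o world", "d2": "hello again", "d3": "goodbye"}
    print(search(docs, "hello", CONFUSIONS))  # ['d1', 'd2']
```

Here "d1" is found only because the variant "hel1o" (probability 0.92 × 0.9 × 0.06 × 0.95 ≈ 0.047) survives the threshold. Note that this sketch enumerates every combination before pruning, which grows exponentially with query length; a practical system would prune during generation, for example with a beam over the most probable partial strings.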
“…Specifically there have been a number of systems for problems similar to the one we discuss here although using different approaches [35,36]. One that deals with mathematical expressions in a scientific document has recently been described in an overall document processing system [35].…”
Section: Semi-structured Documents and Error Correction (mentioning, confidence: 99%)
“…One characterization uses a confusion matrix (as in speech recognition) to generate “equivalent” query strings that should match erroneously recognized text. The other one searches “non-deterministic text” that contains multiple candidates for ambiguous recognition results [36]. Another approach uses an approximate tree matching method to identify similarities between the documents' structured parts and samples to perform information extraction.…”
Section: Semi-structured Documents and Error Correction (mentioning, confidence: 99%)
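The second approach, searching "non-deterministic text" that keeps multiple candidates for ambiguous recognition results, can be illustrated with a minimal sketch. It assumes a simplified representation in which each recognized position holds a set of candidate characters and character segmentation is already fixed; the names from_candidates and find_all are hypothetical, and this is not the cited system's implementation.

```python
from typing import List, Set


def from_candidates(candidates: List[str]) -> List[Set[str]]:
    """Build non-deterministic text: each string lists the candidate characters for one position."""
    return [set(c) for c in candidates]


def find_all(nd_text: List[Set[str]], query: str) -> List[int]:
    """Return start offsets where every query character is among that position's candidates."""
    n, m = len(nd_text), len(query)
    if m == 0:
        return []
    return [i for i in range(n - m + 1)
            if all(query[j] in nd_text[i + j] for j in range(m))]


if __name__ == "__main__":
    # Hypothetical OCR output for "cat dog" where some characters were ambiguous.
    nd = from_candidates(["ce", "ao", "t", " ", "d", "oa", "gq"])
    print(find_all(nd, "cat"))  # [0]
    print(find_all(nd, "dog"))  # [4]
```

Because the ambiguity is stored in the text rather than expanded in the query, a single query such as "dog" matches whichever candidate was correct, without generating query variants. A fuller system would also need to represent segmentation ambiguity (one image region read as one or two characters), which a plain list of sets cannot express.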