Proceedings of the 1st ACM Workshop on Hardcopy Document Processing 2004
DOI: 10.1145/1031442.1031446

A filter based post-OCR accuracy boost system

Abstract: Our current research effort aims at building a filter based post-OCR accuracy boost system that will combine different post-OCR correction filters to improve the OCR accuracy better than each individual filter can. In this paper we focus on a Hidden Markov Model (HMM) based accuracy booster modeling OCR engine noise generation as a two-layer stochastic process. We employ a commercial spellchecker both as another error correction filter and as a base line for accuracy boost comparison. We demonstrate the versat…
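
To make the HMM idea in the abstract concrete, the following is a minimal sketch of character-level correction by Viterbi decoding. It is a generic single-layer HMM, not the paper's two-layer model, whose details are not given here. The function name `viterbi`, the four-character alphabet, and every probability below are hypothetical values chosen only for illustration: hidden states are the intended characters, and observations are the characters an OCR engine emitted.

```python
import math


def viterbi(observed, states, start_p, trans_p, emit_p):
    """Return the most likely intended string for a non-empty OCR output string."""
    if not observed:
        return ""
    floor = 1e-12  # stands in for zero probabilities so that log() is defined

    def lg(p):
        return math.log(max(p, floor))

    # best[i][s]: log-probability of the best path that ends in state s at position i
    best = [{s: lg(start_p.get(s, 0)) + lg(emit_p[s].get(observed[0], 0))
             for s in states}]
    back = [{}]  # back[i][s]: predecessor of s on that best path
    for ch in observed[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + lg(trans_p[p].get(s, 0))) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + lg(emit_p[s].get(ch, 0))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final state to recover the path.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Hypothetical toy model: the only word the "language" knows is "lock",
# and the OCR engine sometimes confuses l/1, o/0 and c/e.
STATES = ["l", "o", "c", "k"]
START_P = {"l": 0.7, "o": 0.1, "c": 0.1, "k": 0.1}
TRANS_P = {
    "l": {"o": 0.9, "l": 0.1},
    "o": {"c": 0.9, "o": 0.1},
    "c": {"k": 0.9, "c": 0.1},
    "k": {"k": 1.0},
}
EMIT_P = {
    "l": {"l": 0.9, "1": 0.1},
    "o": {"o": 0.9, "0": 0.1},
    "c": {"c": 0.9, "e": 0.1},
    "k": {"k": 1.0},
}

corrected = viterbi("1ock", STATES, START_P, TRANS_P, EMIT_P)
```

With these toy tables, the noisy inputs `"1ock"` and `"l0ck"` both decode to `"lock"`. In a real system the transition and emission tables would be estimated from aligned ground-truth and OCR text rather than written by hand.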

Cited by 12 publications (12 citation statements)
References 4 publications

“…They propose an efficient sampling algorithm that accounts for a noisy data and get more representative sample. In [8], they propose a technique that is based on hidden Markov model that aims at minimizing the noise output of the OCR. They measure their solution against a base-line solution that is based solely on spelling checker to remove noise.…”
Section: Related Work (mentioning)
confidence: 99%
“…They propose an efficient sampling algorithm that accounts for a noisy data and get more representative sample. In [7], they propose a technique that is based on hidden markov model that aims at minimizing the noise output of the OCR. They measure their solution against a base-line solution that is based solely on spelling checker to remove noise.…”
Section: Related Work (mentioning)
confidence: 99%
“…On the contrary, for lightweight methods, systems use probabilistic techniques and n-gram analysis, classically solved through Hidden Markov Models (HMM) or dynamic programming, first used by Neuhoff [14] in text correction. Borovikov et al [3] have built a HMM-based correction using several post-OCR filters. OCR errors were modeled in terms of a two-layer stochastic process to deal with known and observed characters.…”
Section: State-of-the-art of Natural Scene OCR Correction (mentioning)
confidence: 99%