2015 13th International Conference on Document Analysis and Recognition (ICDAR) 2015
DOI: 10.1109/icdar.2015.7333720

Combination of multiple aligned recognition outputs using WFST and LSTM

Abstract: The contribution of this paper is a new strategy of integrating multiple recognition outputs of diverse recognizers. Such an integration can give higher performance and more accurate outputs than a single recognition system. The problem of aligning various Optical Character Recognition (OCR) results lies in the difficulties to find the correspondence on character, word, line, and page level. These difficulties arise from segmentation and recognition errors which are produced by the OCRs. Therefore, alignment t…

Cited by 10 publications (8 citation statements)
References 10 publications
“…Instead of aligning OCR versions of the same scan, an approach of Wemhoener et al [163] enables to create a sequence alignment of OCR outputs with the scans of different copies of the same book, or its different editions. Al Azawi et al [4,8] apply Line-to-Page alignment that aligns each line of the 1st OCR with the whole page of the second OCR using Weighted Finite-State Transducers (WFST).…”
Section: Isolated-word Approaches (mentioning)
confidence: 99%
“…In the last step, several techniques are applied to choose the best sequence. Lopresti et al [91], Lin [87], Wemhoener et al [163], and Reul et al [129] utilize voting policy, Al Azawi et al [4,8] use Long Short-Term Memory (LSTM) [64] to decide the most relevant output. Different kinds of features (voting, number, dictionary, gazetteer, and lexical feature) are used in learning decision list, maximum entropy classification or conditional random fields (CRF) methods to choose the best possible correction by Lund et al [92, 94, 95, 96].…”
Section: Isolated-word Approaches (mentioning)
confidence: 99%
“…Ensemble methods have been shown to be effective in OCR postcorrection by combining OCR output from multiple scans of the same document (Lopresti and Zhou, 1997; Klein and Kopel, 2002; Cecotti and Belaïd, 2005; Lund et al, 2013). Existing methods aim at generating consensus results by aligning multiple inputs, followed by supervised methods such as classification (Boschetti et al, 2009; Lund et al, 2011; Al Azawi et al, 2015), or unsupervised methods such as dictionary-based selection (Lund and Ringger, 2009) and voting (Wemhoener et al, 2013; Xu and Smith, 2017). While supervised ensemble methods require human annotation for training, unsupervised selection methods work only when the correct word or character exists in one of the inputs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most of these ensemble methods, however, require aligning multiple OCR outputs (Lund and Ringger, 2009; Lund et al, 2011), which is intractable in general and might introduce noise into the later correction stage. Furthermore, voting-based ensemble methods (Lund and Ringger, 2009; Wemhoener et al, 2013; Xu and Smith, 2017) only work where the correct output exists in one of the inputs, while classification methods (Boschetti et al, 2009; Lund et al, 2011; Al Azawi et al, 2015) are also trained on human annotations.…”
Section: Introduction (mentioning)
confidence: 99%
“…Azawi et al [13] used weighted finite-state transducers based on edit rules to align the output of two different OCR engines. Neural LSTM networks trained on the aligned outputs are used to return a best voting.…”
Section: Related Work (mentioning)
confidence: 99%