Dewarping Document Image by Displacement Flow Estimation with Fully Convolutional Network

Xie, Guo-Wang; Yin, Fei; Zhang, Xu-Yao; Liu, Cheng‐Lin

doi:10.1007/978-3-030-57058-3_10

Cited by 26 publications (43 citation statements)

References 25 publications


“…Then, we use image similarity and OCR accuracy to evaluate the performance of illumination correction. To be specific, for pixel alignment, we use Local Distortion (LD) [43] as recommended in [7,22,41] to evaluate the geometric distortion of rectified images. For image similarity, we use Multi-Scale Structural SIMilarity (MS-SSIM) [39] as previous works [7,22,41] suggest.…”

Section: Experiments 5.1 Evaluation Metrics (mentioning)

confidence: 99%

“…To be specific, for pixel alignment, we use Local Distortion (LD) [43] as recommended in [7,22,41] to evaluate the geometric distortion of rectified images. For image similarity, we use Multi-Scale Structural SIMilarity (MS-SSIM) [39] as previous works [7,22,41] suggest. For OCR, following [7,22], we choose Edit Distance (ED) [17] and Character Error Rate (CER) to evaluate the capacity on text recognition.…”

Section: Experiments 5.1 Evaluation Metrics (mentioning)

confidence: 99%
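
The OCR metrics named in the statement above can be illustrated with a minimal sketch. It assumes Edit Distance means Levenshtein distance and that CER is that distance divided by the reference length, which is the common definition; the function names `edit_distance` and `character_error_rate` are illustrative and are not taken from the citing paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (assumed definition of ED)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete from ref
                            curr[j - 1] + 1,      # insert into ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the number of reference characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under these assumptions, a recognized string that differs from an eight-character reference by one substitution has ED 1 and CER 1/8.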


DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

Feng, Wang, Zhou et al. 2021

Proceedings of the 29th ACM International Conference on Multimedia



“…However, the estimation and subsequent stitching of the warping flow patches heavily increase the computational cost. More recently, based on Fully Convolutional Network [49], Xie et al [16] perform a foreground/background classification as a post-processing to refine the predicted forward warping flow on boundary regions of the document.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recently, deep learning has been introduced to document image rectification with promising performance as well as a significant reduction in computational cost. In deep learning based methods [13], [14], [15], [16], [17], [18], [19], document image rectification is approached by directly regressing a dense 2D vector field (warping flow) that samples the pixels from the distorted images to the rectified ones. However, these methods still suffer from two non-trivial issues.…”

Section: Introduction (mentioning)

confidence: 99%
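
To make the idea of a dense 2D warping flow concrete, here is a small sketch of how such a field could be applied to an image. It is not the method of any cited paper: it assumes a backward convention in which each rectified pixel stores a displacement `(dx, dy)` pointing to its source pixel in the distorted image, and it uses nearest-neighbor lookup instead of the bilinear interpolation a real pipeline would likely use. The name `warp_with_flow` and the `fill` parameter are hypothetical.

```python
def warp_with_flow(distorted, flow, fill=0):
    """Build a rectified image by sampling `distorted` through a dense flow.

    distorted: 2D list of pixel values (rows of the distorted image).
    flow: 2D list with one (dx, dy) displacement per rectified pixel
          (assumed backward convention: source = target + displacement).
    fill: value for rectified pixels whose source lies outside the image.
    """
    height, width = len(flow), len(flow[0])
    src_h, src_w = len(distorted), len(distorted[0])
    rectified = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Nearest-neighbor source location in the distorted image.
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < src_w and 0 <= sy < src_h:
                rectified[y][x] = distorted[sy][sx]
    return rectified
```

With an all-zero flow the output equals the input; a constant `(1, 0)` flow shifts the content one pixel to the left and fills the last column with `fill`.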

“…First, a common problem in the current learning-based approaches [13], [14], [15], [16], [19] is that the boundary regions cannot be well rectified. In fact, one of the reasons is that many of these methods [13], [14], [16] directly take the whole distorted image as input to rectification networks (pipeline (I) in Figure 1), which involves extra implicit learning to identify the foreground document for predicting the rectified document. As a result, the rectified documents often struggle with incomplete or redundant boundaries, which cause further geometric distortion to the nearby contents.…”

Section: Introduction (mentioning)

confidence: 99%


DocScanner: Robust Document Image Rectification with Progressive Learning

Feng, Zhou, Deng et al. 2021

Preprint


Compared to flatbed scanners, portable smartphones are much more convenient for physical documents digitizing. However, such digitized documents are often distorted due to uncontrolled physical deformations, camera positions, and illumination variations. To this end, this work presents DocScanner, a new deep network architecture for document image rectification. Different from existing methods, DocScanner addresses this issue by introducing a progressive learning mechanism. Specifically, DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture. The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency. In addition, before the above rectification process, observing the corrupted rectified boundaries existing in prior works, DocScanner exploits a document localization module to explicitly segment the foreground document from the cluttered background environments. To further improve the rectification quality, based on the geometric priori between the distorted and the rectified images, a geometric regularization is introduced during training to further facilitate the performance. Extensive experiments are conducted on the Doc3D dataset and the DocUNet benchmark dataset, and the quantitative and qualitative evaluation results verify the effectiveness of DocScanner, which outperforms previous methods on OCR accuracy, image similarity, and our proposed distortion metric by a considerable margin. Furthermore, our DocScanner shows the highest efficiency in inference time and parameter count.
