2022
DOI: 10.48550/arxiv.2201.10252
DocEnTr: An End-to-End Document Image Enhancement Transformer

Cited by 5 publications (7 citation statements)
References 0 publications
“…To obtain a CF, we first convert the ruler into a binary image where unit markers are white and everything else is black. We binarize each ruler in three different ways: threshold sweep, segmentation (DocEnTr; Souibgui et al., 2022), and skeletonization. Finally, we use another machine learning network, an image classifier, to determine whether the binarization was successful.…”
Section: Plant Component Detector
confidence: 99%
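The threshold-sweep binarization mentioned in the excerpt above could be sketched as follows. This is a minimal illustration under assumed details, not the cited authors' implementation: the image is a grayscale NumPy array, and the quality score used to pick a threshold (distance of the foreground fraction from an assumed target ratio) is hypothetical — the cited pipeline judges success with a separate image classifier instead.

```python
import numpy as np

def binarize_threshold_sweep(gray, candidates=range(0, 256, 8)):
    """Try several global thresholds and keep the one whose foreground
    fraction is closest to an assumed target ratio (a stand-in score)."""
    target = 0.2  # assumed fraction of white unit-marker pixels
    best_t, best_gap = None, float("inf")
    for t in candidates:
        binary = gray > t           # unit markers white, rest black
        gap = abs(binary.mean() - target)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return (gray > best_t).astype(np.uint8) * 255

# Toy grayscale "ruler": dark background with bright marker columns.
img = np.zeros((8, 8), dtype=np.uint8)
img[:, ::4] = 200
out = binarize_threshold_sweep(img)
```

In practice the sweep would be one of three parallel binarizations (alongside segmentation and skeletonization), with the downstream classifier selecting among their outputs.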
“…Transformers have also been utilized for tasks like image restoration [24] and image de-warping [25]. [26] proposed a fully transformer-based approach for document image enhancement, without the need for any CNN. However, since their approach is entirely based on the conventional ViT without any design change, it fails to capture the local information from the patches.…”
Section: Transformers For Document Image Binarization
confidence: 99%
“…The disadvantage is that training is slow due to the need to generate images of different channels. In the same year, Souibgui et al. proposed an encoder-decoder architecture based on the Vision Transformer [41], as shown in Figure 7. The degraded image is first divided into several patches, which are then fed into the encoder, where each patch is mapped to a latent representation, with a one-to-one correspondence between patches and tokens.…”
Section: Handwriting Fading Problem
confidence: 99%
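The patch-to-token encoding described in the excerpt above can be sketched as follows. This is a minimal NumPy illustration of splitting an image into non-overlapping patches and linearly projecting each flattened patch into one token — not the actual DocEnTr implementation, and the random projection matrix stands in for learned embedding weights.

```python
import numpy as np

def image_to_tokens(img, patch=4, dim=16, rng=np.random.default_rng(0)):
    """Split an HxW grayscale image into non-overlapping patch x patch
    tiles and linearly project each flattened tile to a dim-d token."""
    h, w = img.shape
    proj = rng.standard_normal((patch * patch, dim))  # stand-in for learned weights
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tile = img[i:i + patch, j:j + patch].reshape(-1)
            tokens.append(tile @ proj)  # one token per patch
    return np.stack(tokens)            # shape: (num_patches, dim)

img = np.arange(64, dtype=float).reshape(8, 8)
tok = image_to_tokens(img)  # 8x8 image, 4x4 patches -> 4 tokens
```

The one-to-one correspondence noted in the excerpt is visible here: each spatial patch yields exactly one token, so the token sequence length equals the number of patches.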