Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

Chen, Jingye; Yu, Haiyang; Ma, Jianqi; Li, Bin; Xue, Xiangyang

doi:10.1609/aaai.v36i1.19904

Cited by 34 publications

(21 citation statements)

References 32 publications

“…TPGSR [28] employs a text prior generator to extract categorical probability distribution as guidance for the text image reconstruction process. Text Gestalt [29] pre-trains a text recognizer to highlight the stroke-level details. All of the previous works concentrate on recovering SR text images in a fully supervised manner, that is, with all the LR-HR pairs being used.…”

Section: Scene Text Image Super Resolution (STISR) | mentioning

confidence: 99%

“…Inspired by the success of TSRN, many researchers have started to investigate real-world STISR to improve the quality of LR text images, thus improving recognition accuracy. However, all of the current works concentrate on recovering LR scene text images in a fully supervised manner, that is, with all the LR-HR pairs being used [25][26][27][28][29].…”

Section: Introduction | mentioning

confidence: 99%

TLWSR: Weakly supervised real‐world scene text image super‐resolution using text label

Shi, Zhu, Fang, et al. 2023

IET Image Processing

Scene text image super-resolution (STISR) has recently received considerable attention. Existing STISR methods are applicable to the situation that all the LR-HR pairs are available. However, in real-world scenarios, it is difficult and expensive to collect ground-truth HR labels and align them with LR images, and thus it is essential to find a way to implement weakly supervised learning. We investigate the STISR problem in the situation that only a subset of HR labels is available and design a weak supervision framework using coarse-grained text labels named TLWSR, which combines incomplete supervision and inexact supervision. Specifically, a lightweight text recognition network and connectionist temporal classification loss are used to guide the super-resolution of text images during training. Extensive experiments on the benchmark TextZoom demonstrate that TLWSR generates distinguishable text images and exceeds the fully supervised baseline TSRN in boosting text recognition accuracy with only 50% HR labels available. Meanwhile, TLWSR can be applied to different super-resolution backbones and significantly improves their performance. Furthermore, TLWSR shows good generalization capability to low-quality images on scene text recognition benchmarks, which verifies the effectiveness of this framework. To the authors' knowledge, this is the first work exploring the problem of STISR in weakly supervised scenarios.

“…C3-STISR [14] is proposed to learn triple clues, including recognition clue from a STR, linguistical clue from a language model, and a visual clue from a skeleton painter to rich the representation of the text-specific clue. TG [9] and [11] exploit stroke-level information from HR images via stroke-focused module and skeleton loss for more fine-grained super-resolution. Compared with generic image super-resolution approaches, these methods greatly advance the recognition accuracy through various text-specific information extraction techniques.…”

Section: B Scene Text Image Super-resolution | mentioning

confidence: 99%

“…STT [8] exploits character-level attention maps from HR images to assist the recovery. [11] and TG [9] extract stroke-level information from HR images through specific networks to provide more fine-grained supervision information. [12], [13], [14] additionally introduce external modules to extract various text-specific clues to facilitate the recovery and use the supervision from HR images to finetune their modules.…”

mentioning

confidence: 99%

Positional Information is a Strong Supervision for Volumetric Medical Image Segmentation

Yu-hong, Hou, Zeng, et al. 2023

J. Shanghai Jiaotong Univ. (Sci.)

Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images. Nowadays, various methods have been proposed to extract text-specific information from high-resolution (HR) images to supervise STISR model training. However, due to uncontrollable factors (e.g. shooting equipment, focus, and environment) in manually photographing HR images, the quality of HR images cannot be guaranteed, which unavoidably impacts STISR performance. Observing the quality issue of HR images, in this paper we propose a novel idea to boost STISR by first enhancing the quality of HR images and then using the enhanced HR images as supervision to do STISR. Concretely, we develop a new STISR framework, called High-Resolution ENhancement (HiREN) that consists of two branches and a quality estimation module. The first branch is developed to recover the low-resolution (LR) images, and the other is an HR quality enhancement branch aiming at generating high-quality (HQ) text images based on the HR images to provide more accurate supervision to the LR images. As the degradation from HQ to HR may be diverse, and there is no pixel-level supervision for HQ image generation, we design a kernel-guided enhancement network to handle various degradation, and exploit the feedback from a recognizer and text-level annotations as weak supervision signal to train the HR enhancement branch. Then, a quality estimation module is employed to evaluate the qualities of HQ images, which are used to suppress the erroneous supervision information by weighting the loss of each image. Extensive experiments on TextZoom show that HiREN can work well with most existing STISR methods and significantly boost their performances.
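The abstract above says that estimated quality scores are used to weight the loss of each image, but it does not give the formula. The following is a minimal sketch of one way such weighting could look, not the HiREN implementation: the function name `quality_weighted_loss`, the normalization by the sum of the scores, and the zero-weight fallback are all assumptions made for illustration.

```python
def quality_weighted_loss(per_image_losses, quality_scores):
    """Average per-image losses, weighting each by a quality score.

    Hypothetical helper: the normalization by the total score is an
    assumption, not something stated in the abstract.
    """
    if len(per_image_losses) != len(quality_scores):
        raise ValueError("need exactly one quality score per image loss")
    total_weight = sum(quality_scores)
    if total_weight <= 0:
        # No trustworthy supervision in this batch: contribute nothing.
        return 0.0
    weighted_sum = sum(
        loss * score for loss, score in zip(per_image_losses, quality_scores)
    )
    return weighted_sum / total_weight
```

With equal scores this reduces to a plain mean, and an image with a score of zero has no influence on the result, which matches the stated goal of suppressing erroneous supervision.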

“…The same authors also focused on the internal stroke-level structures of characters in text images. Thus, they designed rules for decomposing English characters and digits at the stroke level and proposed using a pretrained text recognizer to provide stroke-level attention maps as positional cues [19]. Ma et al [9] provided guidance to recover HR text images by introducing an explicit text prior to the character probability sequence obtained from a text recognition model.…”

Section: Deep Learning-based Text Image Super-resolution | mentioning

confidence: 99%

Semantic Super-Resolution of Text Images via Self-Distillation

Park 2022

Electronics

This research develops an effective single-image super-resolution (SR) method that increases the resolution of scanned text or document images and improves their readability. To this end, we introduce a new semantic loss and propose a semantic SR method that guides an SR network to learn implicit text-specific semantic priors through self-distillation. Experiments on the enhanced deep SR (EDSR) model, one of the most popular SR networks, confirmed that semantic loss can contribute to further improving the quality of text SR images. Although the improvement varied depending on image resolution and dataset, the peak signal-to-noise ratio (PSNR) value was increased by up to 0.3 dB by introducing the semantic loss. The proposed method outperformed an existing semantic SR method.
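For readers unfamiliar with the metric quoted above, here is a minimal sketch of the standard PSNR definition, 10 · log10(MAX² / MSE). It assumes images are given as flat sequences of pixel values with a peak of 255; the function name `psnr` and its parameters are illustrative and not taken from the cited paper.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.3 dB corresponds to an MSE lower by a factor of 10^0.03, or roughly 6.7%.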
