Memory-Efficient Models for Scene Text Recognition via Neural Architecture Search

Hong, SeulGi; Kim, Yejin; Choi, Min-Kook

doi:10.1109/wacvw50321.2020.9096928

Cited by 4 publications

(6 citation statements)

References 26 publications

“…• As in NAS algorithms [15], [26], [52], [53], O_c contains inverted bottleneck convolution (MBConv) layers [61] with kernel size k ∈ {3, 5} and expansion factor e ∈ {1, 6}. • As in [3], [8], [16], we use an output feature map of size 1×W/4.…”

Section: Search Space For the Spatial Model (mentioning)

confidence: 99%
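
As a rough illustration of the search space described in the citation statement above (MBConv layers with kernel size k ∈ {3, 5} and expansion factor e ∈ {1, 6}, and an output feature map of size 1×W/4), the sketch below enumerates the four (k, e) combinations. The names `candidate_ops` and `output_width` are hypothetical, and using integer division for W/4 is an assumption; this is not the citing authors' implementation.

```python
from itertools import product

# Values taken from the quoted statement: k in {3, 5}, e in {1, 6}.
KERNEL_SIZES = (3, 5)
EXPANSION_FACTORS = (1, 6)


def candidate_ops():
    """Enumerate every (kernel size, expansion factor) MBConv candidate."""
    return [
        {"op": "MBConv", "kernel": k, "expansion": e}
        for k, e in product(KERNEL_SIZES, EXPANSION_FACTORS)
    ]


def output_width(input_width: int) -> int:
    """Width of a 1 x W/4 output feature map (integer division is assumed)."""
    return input_width // 4


if __name__ == "__main__":
    for op in candidate_ops():
        print(op)
    print("output width for W=100:", output_width(100))
```

With two kernel sizes and two expansion factors, each searchable layer has four MBConv candidates under these assumptions.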

“…In this experiment, we compare TREFE with the following state-of-the-arts: (i) Bluche et al [80], which uses a deep architecture with multidimensional LSTM to extract features for text recognition; (ii) Sueiras et al [81], which extracts image patches and then decodes characters via a sequence-to-sequence architecture with the addition of a convolutional network; (iii) Chowdhury et al [82], which proposes an attention-based sequence-to-sequence network; (iv) Bhunia et al [83], which uses an adversarial feature deformation module that learns to elastically warp the extracted features; (v) Zhang et al [84], which uses a sequence-to-sequence domain adaptation network to handle various handwriting styles; (vi) Fogel et al [85], which generates handwritten text images using a generative adversarial network (GAN); (vii) Wang et al [38], which alleviates the alignment problem in the attention mechanism of sequence-to-sequence text recognition models; (viii) Coquenet et al [13], which replaces the sequential model with lightweight, parallel convolutional networks; and (ix) Yousef et al [33], which does not use a sequential model but instead applies convolutional blocks with a gating mechanism; (x) Shi et al [8], which uses VGG as the spatial model and BiLSTM as the sequential model, and (xi) AutoSTR [37]. We do not compare with STR-NAS [15] (which is concurrent with an earlier conference version [16] of TREFE) as its reported performance is significantly worse. Tables 4 and 5 show results on the IAM and RIMES datasets, respectively.…”

Section: Handwritten Text Recognition (mentioning)

confidence: 99%

“…Recently, it has been shown that neural architecture search (NAS) [21] can produce good network architectures in tasks such as computer vision (e.g., image classification [22], [23], semantic segmentation [24] and object detection [25]). Inspired by this, rather than relying on experts to design architectures, we propose the use of one-shot NAS [22], [23], [26] to search for a high-performance TR feature … [table residue fused into the excerpt, one row per method, column headers not recoverable: [8] | fixed | vgg [11] | BiLSTM | - | ×; ASTER [3] | fixed | residual [12] | BiLSTM | - | ×; GFCN [13] | fixed | gated-block [13] | - | - | ×; SCRN [14] | fixed | residual [12] | BiLSTM | - | ×; STR-NAS [15] | fixed | searched | BiLSTM | grad. | ×; AutoSTR [16] | two-dim | searched | BiLSTM | grid+grad.]…”

Section: Introduction (mentioning)

confidence: 99%

“…Concurrently, Hong et al [15] also considered the use of NAS in scene text recognition. However, they only search for the convolution operator, while we search for both the spatial feature and sequential feature extractors with deployment constraints (see Table 1).…”

Section: Introduction (mentioning)

confidence: 99%

Searching a High Performance Feature Extractor for Text Recognition Network

Zhang, Yao, Kwok et al. 2022

IEEE Trans. Pattern Anal. Mach. Intell.

The feature extractor plays a critical role in text recognition (TR), but customizing its architecture is relatively less explored due to expensive manual tweaking. In this work, inspired by the success of neural architecture search (NAS), we propose to search for suitable feature extractors. We design a domain-specific search space by exploring principles for having good feature extractors. The space includes a 3D-structured space for the spatial model and a transformer-based space for the sequential model. As the space is huge and complexly structured, no existing NAS algorithms can be applied. We propose a two-stage algorithm to effectively search in the space. In the first stage, we cut the space into several blocks and progressively train each block with the help of an auxiliary head. We introduce the latency constraint into the second stage and search for a sub-network from the trained supernet via natural gradient descent. In experiments, a series of ablation studies are performed to better understand the designed space, search algorithm, and searched architectures. We also compare the proposed method with various state-of-the-art ones on both hand-written and scene TR tasks. Extensive results show that our approach can achieve better recognition performance with less latency.

“…Despite these developments, these NAS-based models are mainly limited to a few tasks such as image classification and general object detection, leading to the weak generalization ability in other tasks. To compensate for these drawbacks, many researchers explore applying NAS methods to their specific domains, including semantic segmentation (Liu et al 2019a), pose estimation (Xu et al 2021), and scene text recognition (Zhang et al 2020a; Hong, Kim, and Choi 2020). However, there is still rare to extend NAS approaches to text detection.…”

Section: Related Work (mentioning)

confidence: 99%

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

Chen, Wang, Xie et al. 2021

Preprint

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector). Different from recent advanced text detectors that used hand-crafted network architectures and complicated post-processing, resulting in low inference speed, FAST has two new designs. (1) We search the network architecture by designing a network search space and reward function carefully tailored for text detection, leading to more powerful features than most networks that are searched for image classification. (2) We design a minimalist representation (only has 1-channel output) to model text with arbitrary shape, as well as a GPU-parallel post-processing to efficiently assemble text lines with negligible time overhead. Benefiting from these two designs, FAST achieves an excellent trade-off between accuracy and efficiency on several challenging datasets. For example, FAST-A0 yields 81.4% F-measure at 152 FPS on Total-Text, outperforming the previous fastest method by 1.5 points and 70 FPS in terms of accuracy and speed. With TensorRT optimization, the inference speed can be further accelerated to over 600 FPS.

License plate recognition using neural architecture search for edge devices

Shashirangana, Padmasiri, Meedeniya et al. 2021

Int J of Intelligent Sys

The mutually beneficial blend of artificial intelligence with the internet of things has been enabling many industries to develop smart information processing solutions. The implementation of technology-enhanced industrial intelligence systems is challenging given the environmental conditions, resource constraints and safety concerns. With the era of smart homes and cities, domains like automated license plate recognition (ALPR) are exploring the automation of tasks such as traffic management and fraud detection. This paper proposes an optimized decision support solution for ALPR that works purely on edge devices at night‐time. Although ALPR is a frequently addressed research problem in the domain of intelligent systems, existing solutions are generally computationally intensive and unable to run on edge devices with limited resources. Therefore, as a novel approach, we consider the complex aspects related to deploying lightweight yet efficient and fast ALPR models on embedded devices. The usability of the proposed models is assessed in the real world with a proof‐of‐concept hardware design, and they achieve results competitive with state‐of‐the‐art ALPR solutions that run on server‐grade hardware with intensive resources.
