2020 IEEE Winter Applications of Computer Vision Workshops (WACVW)
DOI: 10.1109/wacvw50321.2020.9096928
Memory-Efficient Models for Scene Text Recognition via Neural Architecture Search

Cited by 4 publications (6 citation statements)
References 26 publications
“…• As in NAS algorithms [15], [26], [52], [53], O_c contains inverted bottleneck convolution (MBConv) layers [61] with kernel size k ∈ {3, 5} and expansion factor e ∈ {1, 6}.
• As in [3], [8], [16], we use an output feature map of size 1×W/4.…”
Section: Search Space for the Spatial Model
confidence: 99%
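The quoted search space maps directly onto a small candidate-operation set: the four MBConv variants obtained from k ∈ {3, 5} and e ∈ {1, 6}. The sketch below is a hypothetical PyTorch illustration of such a set, not the authors' code; the class name, channel widths, and input size are assumptions.

```python
# Hypothetical sketch of a candidate set O_c as described above:
# MBConv (inverted bottleneck) blocks with kernel size k in {3, 5}
# and expansion factor e in {1, 6}. Not the authors' implementation.
import itertools
import torch
import torch.nn as nn


class MBConv(nn.Module):
    """MobileNetV2-style inverted-bottleneck block with configurable k and e."""

    def __init__(self, in_ch, out_ch, kernel_size, expansion, stride=1):
        super().__init__()
        hidden = in_ch * expansion
        self.use_skip = stride == 1 and in_ch == out_ch
        layers = []
        if expansion != 1:  # 1x1 expansion conv (skipped when e = 1)
            layers += [nn.Conv2d(in_ch, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True)]
        layers += [
            # depthwise k x k conv
            nn.Conv2d(hidden, hidden, kernel_size, stride,
                      padding=kernel_size // 2, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 projection conv
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out


# Candidate set: one constructor per (kernel size, expansion factor) pair.
candidate_ops = [
    lambda c_in, c_out, s, k=k, e=e: MBConv(c_in, c_out, k, e, s)
    for k, e in itertools.product((3, 5), (1, 6))
]

x = torch.randn(1, 32, 32, 128)      # (batch, channels, height, width) -- assumed sizes
for op in candidate_ops:
    print(op(32, 32, 1)(x).shape)    # each candidate preserves the spatial size at stride 1
```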
“…In this experiment, we compare TREFE with the following state-of-the-art methods: (i) Bluche et al. [80], which uses a deep architecture with multidimensional LSTM to extract features for text recognition; (ii) Sueiras et al. [81], which extracts image patches and then decodes characters via a sequence-to-sequence architecture with the addition of a convolutional network; (iii) Chowdhury et al. [82], which proposes an attention-based sequence-to-sequence network; (iv) Bhunia et al. [83], which uses an adversarial feature deformation module that learns to elastically warp the extracted features; (v) Zhang et al. [84], which uses a sequence-to-sequence domain adaptation network to handle various handwriting styles; (vi) Fogel et al. [85], which generates handwritten text images using a generative adversarial network (GAN); (vii) Wang et al. [38], which alleviates the alignment problem in the attention mechanism of sequence-to-sequence text recognition models; (viii) Coquenet et al. [13], which replaces the sequential model with lightweight, parallel convolutional networks; (ix) Yousef et al. [33], which does not use a sequential model but instead applies convolutional blocks with a gating mechanism; (x) Shi et al. [8], which uses VGG as the spatial model and BiLSTM as the sequential model; and (xi) AutoSTR [37]. We do not compare with STR-NAS [15] (which is concurrent with an earlier conference version [16] of TREFE), as its reported performance is significantly worse. Tables 4 and 5 show results on the IAM and RIMES datasets, respectively.…”
Section: Handwritten Text Recognition
confidence: 99%
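For context on the spatial-model / sequential-model split the quote refers to (a convolutional spatial model producing a 1×W/4 feature sequence, followed by a BiLSTM sequential model, as in Shi et al. [8]), here is a minimal hedged sketch; the module names, channel widths, and input size are illustrative assumptions, not the cited implementation.

```python
# Hedged sketch of the generic spatial + sequential decomposition referenced
# above; names and sizes are assumptions, not the cited authors' code.
import torch
import torch.nn as nn


class TinyCRNN(nn.Module):
    def __init__(self, num_classes, channels=64, hidden=128):
        super().__init__()
        self.spatial = nn.Sequential(            # spatial model: conv backbone
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),                  # -> H/2 x W/2
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),                  # -> H/4 x W/4
            nn.AdaptiveAvgPool2d((1, None)),     # collapse height: 1 x W/4 feature map
        )
        self.sequential = nn.LSTM(channels, hidden,       # sequential model (BiLSTM)
                                  bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, images):                   # images: (B, 1, H, W)
        feats = self.spatial(images)             # (B, C, 1, W/4)
        seq = feats.squeeze(2).permute(0, 2, 1)  # (B, W/4, C) sequence over width
        out, _ = self.sequential(seq)
        return self.classifier(out)              # per-step logits, e.g. for CTC decoding


logits = TinyCRNN(num_classes=80)(torch.randn(2, 1, 32, 128))
print(logits.shape)                              # torch.Size([2, 32, 80]); 32 = W/4 time steps
```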
“…Despite these developments, NAS-based models have mainly been limited to a few tasks such as image classification and general object detection, leading to weak generalization ability in other tasks. To compensate for these drawbacks, many researchers have explored applying NAS methods to their specific domains, including semantic segmentation (Liu et al. 2019a), pose estimation (Xu et al. 2021), and scene text recognition (Zhang et al. 2020a; Hong, Kim, and Choi 2020). However, it is still rare to extend NAS approaches to text detection.…”
Section: Related Work
confidence: 99%