Yeon-gyu Kim scite author profile

Various methods for scene text recognition (STR) are proposed every year. These methods have dramatically improved STR performance; however, they have not kept up with the progress of general-purpose research in image recognition, detection, speech recognition, and text analysis. In this paper, we evaluate the performance of several deep learning schemes for the encoder part of the Transformer in STR. First, we replace the baseline feed-forward network (FFN) module of the encoder with a squeeze-and-excitation (SE)-FFN or a cross stage partial (CSP)-FFN. Second, we replace the overall encoder architecture with local dense synthesizer attention (LDSA) or a Conformer structure. The Conformer encoder achieves the best test accuracy in various experiments, and the SE-FFN and CSP-FFN also show competitive performance when the number of parameters is considered. Visualizing the attention maps from different encoder combinations allows for a qualitative comparison of their performance.

Yeon-gyu Kim

Laser-directed synthesis of strain-induced crumpled MoS2 structure for enhanced triboelectrification toward haptic sensors

Kinetic motion sensors based on flexible and lead-free hybrid piezoelectric composite energy harvesters with nanowires-embedded electrodes for detecting articular movements

Piezoelectric energy conversion by lead-free perovskite BaTiO3 nanotube arrays fabricated using electrochemical anodization

Enhanced poling efficiency via a maximized organic-inorganic interfacial effect for water droplet-driven energy harvesting

Analysis of the Novel Transformer Module Combination for Scene Text Recognition
