Zobeir Raisi scite author profile

Recent state-of-the-art scene text recognition methods are primarily based on Recurrent Neural Networks (RNNs). However, these methods require one-dimensional (1D) features and are not designed for recognizing irregular-text instances, because the spatial information present in the original two-dimensional (2D) images is lost. In this paper, we leverage a Transformer-based architecture for recognizing both regular and irregular text-in-the-wild images. The proposed method uses a 2D positional encoder with the Transformer architecture to preserve the spatial information of 2D image features better than previous methods. Experiments on popular benchmarks, including the challenging COCO-Text dataset, demonstrate that the proposed scene text recognition method outperforms the state of the art in most cases, especially on irregular-text recognition.
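The idea of a 2D positional encoder can be illustrated with a minimal sketch. The abstract does not specify the encoder's formulation, so the code below assumes one common variant: the standard fixed sinusoidal encoding is applied separately to the row and column index of each feature-map cell, and the two halves are concatenated. The function names `sinusoid_1d` and `positional_encoding_2d` and the sizes used are hypothetical, chosen only for illustration, and this is not presented as the authors' implementation.

```python
import math


def sinusoid_1d(position, dim):
    """Encode one integer position into `dim` values (dim must be even)
    using interleaved sine/cosine pairs at geometrically spaced frequencies."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc


def positional_encoding_2d(height, width, d_model):
    """Return a height x width grid of d_model-length vectors.

    Assumption: the first half of each vector encodes the row index (y)
    and the second half encodes the column index (x).
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    return [
        [sinusoid_1d(y, half) + sinusoid_1d(x, half) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical feature-map size: 8 rows, 32 columns, 64 channels.
pe = positional_encoding_2d(height=8, width=32, d_model=64)
```

In a sketch like this, each vector in `pe` would be added to the image feature at the same grid cell before the features are flattened into a sequence for the Transformer, so that both vertical and horizontal position remain distinguishable after flattening.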

Content-Based Image Retrieval for Tourism Application

Raisi

Mohanna

Rezaei

2011

2LSPE: 2D Learnable Sinusoidal Positional Encoding using Transformer for Scene Text Recognition

Raisi

Naiel

Younes

et al. 2021

Zobeir Raisi

Transformer-based Text Detection in the Wild

Arbitrary Shape Text Detection using Transformers

2D Positional Embedding-based Transformer for Scene Text Recognition
