SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Tran, Bao Hieu; Le-Cong, Thanh; Nguyen, Huu Manh; Le, Duc Dung; Nguyen, Thanh Hung; Nguyen, Phi Le

doi:10.1109/icmla51294.2020.00223

Cited by 3 publications

(2 citation statements)

References 32 publications

(30 reference statements)


“…This method identifies characters one-by-one resulting in low speed. Tran [4] proposed SAFL, a self-attention-based neural network model with focal loss for scene text recognition. SAFL utilized focal loss, which allows the model to focus more on training low-frequency samples.…”

Section: Related Work (mentioning)

confidence: 99%
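
The citation statement above says focal loss lets the model focus on low-frequency samples. A minimal sketch of the standard focal loss, -(1 - p_t)^gamma * log(p_t), is given below to illustrate that down-weighting effect. The function names, the example probabilities, and gamma = 2.0 are illustrative assumptions; this page does not state the exact formulation or hyperparameters used in SAFL.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one prediction: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0):
    """Standard focal loss for one prediction: -(1 - p_t)**gamma * log(p_t).

    gamma=2.0 is an assumed, illustrative value, not one taken from SAFL.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical class-probability vectors; the true class is index 1 in both.
easy = [0.05, 0.90, 0.05]  # confident and correct (typical of frequent samples)
hard = [0.60, 0.10, 0.30]  # true class gets low probability (typical of rare samples)

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 1)
    fl = focal_loss(probs, 1)
    # The ratio fl / ce equals the modulating factor (1 - p_t)**gamma.
    print(f"{name}: CE={ce:.4f} focal={fl:.4f} ratio={fl / ce:.2f}")
```

The modulating factor shrinks the loss of the well-classified example far more than that of the poorly classified one, so gradient updates are dominated by the hard cases. With gamma = 0 the expression reduces to ordinary cross-entropy.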

“…In STR, Recurrent Neural Networks (RNNs) are proper approaches to capture context and dependencies in sequential data, while Convolutional Neural Networks (CNNs) excel at finding hidden patterns using local spatial information in the input [4]. RNNs, with their recurrent connections, are well-suited for handling sequential data, such as text, because they can retain information from previous time steps and utilize it to make predictions.…”

Section: Introduction (mentioning)

confidence: 99%


An Attention-Based Convolutional Recurrent Neural Networks for Scene Text Recognition

Alshawi, Tanha, Balafar. 2024. IEEE Access.

Text recognition is critical in various domains, including driving assistance, handwriting recognition, and aiding the visually impaired. In recent years, deep learning-based methods have demonstrated outstanding performance in Scene Text Recognition (STR). However, STR poses significant challenges, and the scarcity of non-Latin language datasets further compounds these challenges. To address this, we collected a dataset of Persian digits, including 20,000 images with different challenges, making the dataset appropriate for text recognition tasks. Furthermore, we propose a convolution-based model for Persian digit recognition that incorporates a squeeze-and-excitation gate, forcing the model to focus on latent features, and connectionist temporal classification, enabling end-to-end sequence learning. We conduct extensive comparisons with different architectures and models to evaluate the performance of our proposed model. As a result, our approach achieves an accuracy of 94.26 on our datasets. The results demonstrate that our model outperforms the other methods, highlighting its effectiveness in Persian digit recognition.
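
The abstract above relies on connectionist temporal classification (CTC) for end-to-end sequence learning. As a rough illustration of how a CTC-trained model's per-frame outputs are turned into a digit sequence, here is a minimal sketch of greedy CTC decoding: merge consecutive repeated labels, then drop the blank symbol. The function name, the blank index 0, and the example frame labels are assumptions made for illustration; the abstract does not describe the decoding procedure the authors used.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse per-frame argmax labels into an output sequence.

    Consecutive repeats are merged and the blank label is removed.
    The blank index (0) is an assumed convention, not taken from the paper.
    """
    decoded = []
    previous = blank
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical per-frame predictions: a blank between the two 3s keeps
# them as separate characters, while the adjacent 5s merge into one.
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_greedy_collapse(frames))
```

The blank symbol is what allows repeated characters in the output: without a blank between them, adjacent identical frame labels are treated as one character.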


Orthogonality-constrained multihead self-attention for scene text recognition

Xu, Zhu. 2023. Journal of Image and Graphics.

Objective: Scene text recognition (STR) is a hot research field in computer vision that aims to recognize text information from natural scenes. STR is important in many tasks and applications, such as image search, robot navigation, license plate recognition, and automatic driving. Most of the early STR models usually comprise a rectification network and a recognition network, while recent STR models usually comprise a convolutional neural network (CNN)-based feature encoder and a Transformer-based decoder or a customized CNN module and Transformer encoder-decoder. These STR models usually have a complex model architecture, large computational load, and large memory consumption. A vision Transformer (ViT)-based STR model called ViTSTR maintains balance among accuracy, speed, and computational load. How…


Improving OCR Accuracy for Kazakh Handwriting Recognition Using GAN Models

2023


This paper aims to increase the accuracy of Kazakh handwriting text recognition (KHTR) using a generative adversarial network (GAN), in which a handwriting word image generator and an image quality discriminator are constructed. To obtain high-quality images of handwritten text, multiple losses are used to encourage the generator to learn the structural properties of the texts. The quality discriminator is trained with a relativistic loss function. With the proposed structure, the resulting document images not only preserve texture details but also reproduce different writer styles, which provides better OCR performance on public databases. Using a self-created dataset, images of different handwriting styles were obtained for use in training the network. The proposed approach achieves a character error rate (CER) of 11.15% and a word error rate (WER) of 25.65%.
