Temporally-aware Convolutional Block Attention Module for Video Text Detection

Fujitake, Masato; Ge, Hongpeng

doi:10.1109/smc52423.2021.9658799

Cited by 6 publications (1 citation statement)

References 54 publications (79 reference statements)


“…The aim of text recognition, also known as optical character recognition (OCR), is to convert the text in images into digital text sequences. Many studies have been conducted on this technology owing to its wide range of real-world applications, including reading license plates and handwritten text, analyzing documents such as receipts and invoices [23,58], and analyzing road signs in automated driving and natural scenes [14,16]. However, the various fonts, lighting variations, complex backgrounds, low-quality images, occlusion, and text deformation make text recognition challenging.…”

Section: Introduction (mentioning)

confidence: 99%

Video Sparse Transformer With Attention-Guided Memory for Video Object Detection

Fujitake, Sugimoto (2022). IEEE Access


Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
