Object detection and recognition are among the most important and challenging problems in computer vision. The remarkable advancements in deep learning techniques have significantly accelerated progress in object detection and recognition in recent years. Meanwhile, scene text detection and recognition is also a critical task in computer vision and has attracted increasing attention from researchers due to its wide range of applications. This work focuses on detecting and recognizing multiple retail products stacked on and off the shelves in grocery stores by identifying their label texts. In this paper, we propose a new framework composed of three modules: (a) retail product detection, (b) product-text detection, and (c) product-text recognition. In the first module, on-the-shelf and off-the-shelf retail products are detected using the YOLOv5 object detection algorithm. In the second module, we improve the performance of the state-of-the-art text detection algorithm "TextSnake" by replacing its backbone network (ResNet50 + FPN), and we propose a post-processing technique, WHBBR (Width-Height-based Bounding Box Reconstruction), to detect both regular and irregular text. In the final module, we use a text recognition network named "SCATTER" to recognize the retail product's text information. The YOLOv5 algorithm accurately detects both on-the-shelf and off-the-shelf grocery products in video frames and static images. The experimental results show that the proposed text reconstruction approach, WHBBR, improves the performance of state-of-the-art techniques on both regular and irregular text. The enhanced text detection and incorporated text recognition methods enable our framework to recognize on-the-shelf retail products by extracting product information such as product name, brand name, price, and expiry date. The recognized text around a retail product can then serve as an identifier to distinguish the product.
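The three-module composition described above can be sketched as a simple sequential pipeline. The function names below (`detect_products`, `detect_text`, `recognize_text`) are hypothetical placeholders standing in for YOLOv5, the TextSnake-based detector with WHBBR, and SCATTER respectively; they are not the authors' actual APIs, and the bodies only illustrate the data flow, not the models themselves.

```python
# Hypothetical sketch of the three-stage pipeline: product detection ->
# product-text detection -> product-text recognition. All three functions
# are placeholders for the real models named in the abstract.

def detect_products(frame):
    # Module (a): YOLOv5 would return product bounding boxes here.
    return [(0, 0, frame["w"], frame["h"])]  # placeholder: whole frame

def detect_text(product_box):
    # Module (b): TextSnake + WHBBR would return text-region boxes.
    return [product_box]  # placeholder

def recognize_text(text_region):
    # Module (c): SCATTER would decode the character sequence.
    return "BRAND-NAME 1.99"  # placeholder label string

def read_shelf(frame):
    """Run the three stages in sequence and collect label strings."""
    labels = []
    for box in detect_products(frame):
        for region in detect_text(box):
            labels.append(recognize_text(region))
    return labels

print(read_shelf({"w": 640, "h": 480}))  # -> ['BRAND-NAME 1.99']
```

In the real system each stage would crop the image to the boxes returned by the previous stage; the recognized strings then act as the product identifiers mentioned in the abstract.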
Scene Text Recognition (STR) is a popular and long-standing research problem in the computer vision community. Most existing approaches adopt the connectionist temporal classification (CTC) technique; however, these approaches are not very effective for irregular STR. In this article, we introduce a new encoder-decoder framework, built on the transformer architecture, to recognize both regular and irregular natural scene text. The proposed framework is divided into four main modules: Image Transformation, Visual Feature Extraction (VFE), Encoder, and Decoder. First, we employ a Thin Plate Spline (TPS) transformation in the image transformation module to normalize the original input image, reducing the burden on subsequent feature extraction. Second, in the VFE module, we use ResNet as the Convolutional Neural Network (CNN) backbone to retrieve feature maps from the rectified word image. However, the VFE module generates one-dimensional feature maps that are not suitable for locating multi-oriented text on two-dimensional word images, so we propose 2D Positional Encoding (2DPE) to preserve the sequential information. Third, feature aggregation and feature transformation are carried out simultaneously in the encoder module; we replace the scaled dot-product attention of the standard transformer framework with an Optimal Adaptive Threshold-based Self-Attention (OATSA) model to filter noisy information effectively and focus on the most contributive text regions. Finally, we introduce a new architecture-level bi-directional decoding approach in the decoder module to generate a more accurate character sequence.
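As a concrete illustration of 2D positional encoding, the sketch below builds a standard sinusoidal 2D encoding in which half of the channels encode the row index and half the column index. This is one common formulation from the transformer literature; the paper's exact 2DPE variant may differ, and the shape parameters are arbitrary examples.

```python
import numpy as np

def positional_encoding_2d(h, w, d):
    """Sinusoidal 2D positional encoding over an (h, w) feature grid.

    The first d/2 channels encode the row (y) position, the last d/2
    channels the column (x) position, each with interleaved sin/cos
    at geometrically spaced frequencies (a common formulation; the
    paper's 2DPE may differ in details).
    """
    assert d % 4 == 0, "channel dim must be divisible by 4"
    pe = np.zeros((h, w, d))
    d_half = d // 2
    # Frequencies 10000^(-2i/d_half), as in the original transformer PE.
    div = np.exp(np.arange(0, d_half, 2) * (-np.log(10000.0) / d_half))
    ys = np.arange(h)[:, None] * div[None, :]   # (h, d_half/2)
    xs = np.arange(w)[:, None] * div[None, :]   # (w, d_half/2)
    pe[:, :, 0:d_half:2] = np.sin(ys)[:, None, :]        # row sin
    pe[:, :, 1:d_half:2] = np.cos(ys)[:, None, :]        # row cos
    pe[:, :, d_half::2] = np.sin(xs)[None, :, :]         # col sin
    pe[:, :, d_half + 1::2] = np.cos(xs)[None, :, :]     # col cos
    return pe

enc = positional_encoding_2d(8, 32, 64)
print(enc.shape)  # (8, 32, 64)
```

Adding such an encoding to the CNN feature maps gives each spatial location a distinct signature, which is what lets the attention layers localize multi-oriented text on the two-dimensional grid.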
We evaluate the effectiveness and robustness of the proposed framework on both horizontal and arbitrarily oriented text recognition through extensive experiments on seven public benchmarks, including IIIT5K-Words, SVT, ICDAR 2003, ICDAR 2013, and ICDAR 2015. We also demonstrate that our proposed framework outperforms most existing approaches by a substantial margin.