We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape. We formulate arbitrary shape text detection as an instance segmentation problem; an attention model is then used to decode the textual content of each irregularly shaped text region without rectification. To extract useful irregularly shaped text instance features from image scale features, we propose a simple yet effective RoI masking step. Additionally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which leads to significant improvements in both the detection and recognition accuracy of our model. Our method surpasses the state-of-the-art for end-to-end recognition tasks on the ICDAR15 (straight) benchmark by 4.6%, and on the Total-Text (curved) benchmark by more than 16%.
Selfies have become commonplace. More and more people take pictures of themselves, and enjoy enhancing these pictures using a variety of image processing techniques. One specific functionality of interest is automatic skin and hair segmentation, as this allows for processing one's skin and hair separately. Traditional approaches require user input in the form of fully specified trimaps, or at least of "scribbles" indicating foreground and background areas, with high-quality masks then generated via matting. Manual input, however, can be difficult or tedious, especially on a smartphone's small screen. In this paper, we propose the use of fully convolutional networks (FCN) and fully-connected CRF to perform pixel-level semantic segmentation into skin, hair and background. The trimap thus generated is given as input to a standard matting algorithm, resulting in accurate skin and hair alpha masks. Our method achieves state-of-the-art performance on the LFW Parts dataset [1]. The effectiveness of our method is also demonstrated with a specific application case.
We introduce an algorithm for word-level text spotting that is able to accurately and reliably determine the bounding regions of individual words of text “in the wild”. Our system is formed by the cascade of two convolutional neural networks. The first network is fully convolutional and is in charge of detecting areas containing text. This results in a very reliable but possibly inaccurate segmentation of the input image. The second network (inspired by the popular YOLO architecture) analyzes each segment produced in the first stage, and predicts oriented rectangular regions containing individual words. No post-processing (e.g. text line grouping) is necessary. With execution time of 450 ms for a 1000 × 560 image on a Titan X GPU, our system achieves good performance on the ICDAR 2013, 2015 benchmarks [2], [1].
Considering the rapid development of the tourist leisure industry and the surge of tourist quantity, insufficient information regarding tourists has placed tremendous pressure on traffic in scenic areas. In this paper, the author uses the Big Data technology and Call Detail Record (CDR) data with the mobile phone real-time location information to monitor the tourist flow and analyse the travel behaviour of tourists in scenic areas. By collecting CDR data and implementing a modelling analysis of the data to simultaneously reflect the distribution of tourist hot spots in Beijing, tourist locations, tourist origins, tourist movements, resident information, and other data, the results provide big data support for alleviating traffic pressure at tourist attractions and tourist routes in the city and rationally allocating traffic resources. The analysis shows that the big data analysis method based on the CDR data of mobile phones can provide real-time information about tourist behaviours in a timely and effective manner. This information can be applied for the operation management of scenic areas and can provide real-time big data support for "smart tourism".Hindawi
We organize a competition on hierarchical text detection and recognition. The competition is aimed to promote research into deep learning models and systems that can jointly perform text detection and recognition and geometric layout analysis. We present details of the proposed competition organization, including tasks, datasets, evaluations, and schedule. During the competition period (from January 2nd 2023 to April 1st 2023), at least 50 submissions from more than 20 teams were made in the 2 proposed tasks. Considering the number of teams and submissions, we conclude that the HierText competition has been successfully held. In this report, we will also present the competition results and insights from them.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.