Siyang Qin scite author profile

Towards Unconstrained End-to-End Text Spotting
We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading irregularly shaped scene text. We formulate arbitrary-shape text detection as an instance segmentation problem; an attention model then decodes the textual content of each irregularly shaped text region without rectification. To extract features for irregularly shaped text instances from image-scale features, we propose a simple yet effective RoI masking step. Additionally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which significantly improves both the detection and recognition accuracy of our model. Our method surpasses the previous state of the art for end-to-end recognition on the ICDAR15 (straight text) benchmark by 4.6% and on the Total-Text (curved text) benchmark by more than 16%.
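
A minimal sketch of the RoI masking idea, assuming a (C, H, W) feature map, an axis-aligned box in feature-map coordinates, and a binary instance mask already sampled at the RoI's resolution. The function name `roi_mask_features` and the box convention are illustrative assumptions, not the paper's implementation, which may use soft mask probabilities and resampled (for example, bilinearly interpolated) RoI features rather than a plain crop.

```python
from typing import Tuple

import numpy as np


def roi_mask_features(features: np.ndarray,
                      box: Tuple[int, int, int, int],
                      instance_mask: np.ndarray) -> np.ndarray:
    """Crop a RoI from a (C, H, W) feature map and zero out positions
    outside the text instance mask (illustrative sketch only).

    box: (y0, x0, y1, x1) in feature-map coordinates, end-exclusive (assumed).
    instance_mask: (y1 - y0, x1 - x0) array of 0/1 (or soft) mask values.
    """
    y0, x0, y1, x1 = box
    crop = features[:, y0:y1, x0:x1]
    if crop.shape[1:] != instance_mask.shape:
        raise ValueError("instance_mask must match the RoI crop size")
    # Broadcast the 2-D mask over the channel axis; background positions become 0.
    return crop * instance_mask[np.newaxis, :, :].astype(features.dtype)


# Usage: a 2-channel 4x6 feature map and an L-shaped mask inside a 2x3 RoI.
feats = np.arange(2 * 4 * 6, dtype=np.float32).reshape(2, 4, 6)
mask = np.array([[1, 0, 0],
                 [1, 1, 0]])
masked = roi_mask_features(feats, (1, 2, 3, 5), mask)
print(masked.shape)  # (2, 2, 3)
```

Multiplying by the mask keeps only the features that lie inside the irregular text region, so a downstream decoder sees the curved instance without background clutter from the rest of the bounding box.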

Automatic skin and hair masking using fully convolutional networks

Qin, Kim, Manduchi (2017)

Selfies have become commonplace. More and more people take pictures of themselves and enjoy enhancing these pictures with a variety of image processing techniques. One functionality of particular interest is automatic skin and hair segmentation, as it allows one's skin and hair to be processed separately. Traditional approaches require user input in the form of fully specified trimaps, or at least "scribbles" indicating foreground and background areas, with high-quality masks then generated via matting. Manual input, however, can be difficult or tedious, especially on a smartphone's small screen. In this paper, we propose using fully convolutional networks (FCN) and a fully connected conditional random field (CRF) to perform pixel-level semantic segmentation into skin, hair, and background. The trimap generated from this segmentation is given as input to a standard matting algorithm, resulting in accurate skin and hair alpha masks. Our method achieves state-of-the-art performance on the LFW Parts dataset [1]. Its effectiveness is also demonstrated in a specific application.
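
One common way to turn a hard segmentation into a trimap for matting is to erode each class region to obtain confident pixels and leave a band of unknown pixels around its boundary. The sketch below assumes that approach; the label ids, the `band` width, and the 0/128/255 encoding are hypothetical choices, and the abstract does not state how the paper derives its trimap.

```python
import numpy as np

# Hypothetical label ids for the per-pixel segmentation output.
BACKGROUND, SKIN, HAIR = 0, 1, 2


def _dilate(mask: np.ndarray, r: int) -> np.ndarray:
    """Binary dilation with a (2r+1) x (2r+1) square structuring element.
    Pixels beyond the image edge are treated as False."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def _erode(mask: np.ndarray, r: int) -> np.ndarray:
    """Binary erosion as the complement of dilating the complement.
    Pixels beyond the image edge count as foreground, so regions that
    touch the border are not eroded from that side."""
    return ~_dilate(~mask, r)


def make_trimap(labels: np.ndarray, target: int, band: int = 2) -> np.ndarray:
    """Trimap for one class: 255 = certainly target, 0 = certainly not,
    128 = unknown band (band pixels on each side of the boundary),
    left for the matting algorithm to resolve."""
    fg = labels == target
    trimap = np.full(labels.shape, 128, dtype=np.uint8)
    trimap[_erode(fg, band)] = 255
    trimap[~_dilate(fg, band)] = 0
    return trimap


# Usage: a 3x3 "skin" patch in a 7x7 label map, with a 1-pixel band.
labels = np.zeros((7, 7), dtype=int)
labels[2:5, 2:5] = SKIN
print(make_trimap(labels, SKIN, band=1))
```

Running it once per class (skin, then hair) gives one trimap per alpha mask; a wider `band` hands more of the boundary to the matting step.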

Cascaded Segmentation-Detection Networks for Word-Level Text Spotting

Qin, Manduchi (2017)

We introduce an algorithm for word-level text spotting that accurately and reliably determines the bounding regions of individual words of text “in the wild”. Our system is a cascade of two convolutional neural networks. The first network is fully convolutional and detects areas containing text, producing a very reliable but possibly imprecise segmentation of the input image. The second network (inspired by the popular YOLO architecture) analyzes each segment produced in the first stage and predicts oriented rectangular regions containing individual words. No post-processing (e.g., text line grouping) is necessary. With an execution time of 450 ms for a 1000 × 560 image on a Titan X GPU, our system achieves good performance on the ICDAR 2013 and 2015 benchmarks [2], [1].
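
The cascade's control flow can be sketched as follows, assuming that the "segments" passed to the second network are 4-connected components of the thresholded stage-one mask. `segment_net` and `word_net` are hypothetical placeholder callables standing in for the two CNNs, and the 0.5 threshold is an assumption.

```python
from collections import deque
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (y0, x0, y1, x1), end-exclusive


def text_segments(mask: np.ndarray, min_area: int = 1) -> List[Box]:
    """Bounding boxes of the 4-connected components of a binary text mask,
    in raster-scan order of each component's first pixel."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes: List[Box] = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill of one component, tracking its extent.
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            y0, x0, y1, x1 = sy, sx, sy, sx
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                y0, x0 = min(y0, y), min(x0, x)
                y1, x1 = max(y1, y), max(x1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((y0, x0, y1 + 1, x1 + 1))
    return boxes


def spot_words(image: np.ndarray,
               segment_net: Callable[[np.ndarray], np.ndarray],
               word_net: Callable[[np.ndarray], list],
               threshold: float = 0.5) -> list:
    """Two-stage cascade: stage 1 yields a text-probability map, stage 2
    runs on each segment crop and returns word boxes in crop coordinates.
    Each result is paired with the crop's (y0, x0) offset."""
    mask = segment_net(image) > threshold
    words = []
    for y0, x0, y1, x1 in text_segments(mask):
        crop = image[y0:y1, x0:x1]
        for word_box in word_net(crop):
            words.append(((y0, x0), word_box))
    return words


# Usage: two separate text blobs in a 5x6 mask.
m = np.zeros((5, 6), dtype=bool)
m[0:2, 0:2] = True
m[3, 2:6] = True
print(text_segments(m))  # [(0, 0, 2, 2), (3, 2, 4, 6)]
```

Returning each word box together with its crop offset lets the caller map the second network's predictions back to full-image coordinates without any grouping step.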

Applying Big Data Analytics to Monitor Tourist Flow for the Scenic Area Operation Management

Qin, Man, Wang, et al. (2019)

Discrete Dynamics in Nature and Society

Given the rapid development of the tourism and leisure industry and the surge in tourist numbers, the lack of information about tourists has placed tremendous pressure on traffic in scenic areas. In this paper, the authors use big data technology and Call Detail Record (CDR) data containing real-time mobile phone location information to monitor tourist flow and analyse the travel behaviour of tourists in scenic areas. By collecting and modelling CDR data, the analysis simultaneously reflects the distribution of tourist hot spots in Beijing, tourist locations, tourist origins, tourist movements, resident information, and other data; the results provide big data support for alleviating traffic pressure at tourist attractions and along tourist routes in the city and for rationally allocating traffic resources. The analysis shows that big data analysis based on mobile phone CDR data can provide real-time information about tourist behaviour in a timely and effective manner. This information can be applied to the operation management of scenic areas and can provide real-time big data support for "smart tourism".
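
A minimal sketch of the kind of aggregation involved, assuming each CDR carries an anonymised user id, a timestamp, the serving cell id, and a home region, and that a lookup table maps cells to scenic areas. All field names, the cell-to-area table, and the hourly window are hypothetical; the abstract does not describe the paper's actual modelling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class CDRRecord:
    user_id: str          # anonymised subscriber id (hypothetical field)
    timestamp: datetime   # time of the call/SMS/data event
    cell_id: str          # serving base-station cell (hypothetical field)
    home_region: str      # subscriber's home region, used as tourist origin


def tourist_flow(records: Iterable[CDRRecord], cell_to_area: Dict[str, str]):
    """Count distinct users per (scenic area, hour) and per (area, origin).
    Events from cells not mapped to any scenic area are ignored."""
    visitors = defaultdict(set)                      # (area, hour) -> user ids
    origins = defaultdict(lambda: defaultdict(set))  # area -> region -> user ids
    for r in records:
        area = cell_to_area.get(r.cell_id)
        if area is None:
            continue
        hour = r.timestamp.replace(minute=0, second=0, microsecond=0)
        visitors[(area, hour)].add(r.user_id)
        origins[area][r.home_region].add(r.user_id)
    flow = {key: len(users) for key, users in visitors.items()}
    by_origin = {area: {region: len(users) for region, users in regions.items()}
                 for area, regions in origins.items()}
    return flow, by_origin


# Usage with made-up records for one hypothetical scenic area.
t = datetime(2019, 5, 1, 10, 15)
records = [
    CDRRecord("u1", t, "c1", "Shanghai"),
    CDRRecord("u1", t.replace(minute=40), "c1", "Shanghai"),  # same user, same hour
    CDRRecord("u2", t, "c2", "Tianjin"),
    CDRRecord("u3", t.replace(hour=11), "c1", "Tianjin"),
    CDRRecord("u4", t, "c9", "Beijing"),                      # cell outside any area
]
flow, by_origin = tourist_flow(records, {"c1": "Area A", "c2": "Area A"})
print(by_origin)  # {'Area A': {'Shanghai': 1, 'Tianjin': 2}}
```

Counting distinct user ids rather than raw events keeps one busy phone from inflating the hourly visitor count.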

Towards End-to-End Unified Scene Text Detection and Layout Analysis

Qin, Panteleev, Bissacco, et al. (2022)

We organize a competition on hierarchical text detection and recognition. The competition aims to promote research into deep learning models and systems that can jointly perform text detection, text recognition, and geometric layout analysis. We present details of the competition's organization, including tasks, datasets, evaluation, and schedule. During the competition period (January 2, 2023 to April 1, 2023), at least 50 submissions from more than 20 teams were made across the two proposed tasks. Given the number of teams and submissions, we conclude that the HierText competition was successfully held. In this report, we also present the competition results and the insights drawn from them.

Siyang Qin

Towards Unconstrained End-to-End Text Spotting

Automatic skin and hair masking using fully convolutional networks

Cascaded Segmentation-Detection Networks for Word-Level Text Spotting

Applying Big Data Analytics to Monitor Tourist Flow for the Scenic Area Operation Management

Towards End-to-End Unified Scene Text Detection and Layout Analysis
