Shijian Lu scite author profile

Automated recognition of texts in scenes has been a research challenge for years, largely due to the arbitrary variation of text appearances in perspective distortion, text line curvature, text styles and different types of imaging artifacts. The recent deep networks are capable of learning robust representations with respect to imaging artifacts and text style changes, but still face various problems while dealing with scene texts with perspective and curvature distortions. This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature as driven by better scene text recognition performance. An innovative rectification network is developed which employs a novel linefitting transformation to estimate the pose of text lines in scenes. In addition, an iterative rectification pipeline is developed where scene text distortions are corrected iteratively towards a fronto-parallel view. The ESIR is also robust to parameter initialization and the training needs only scene text images and word-level annotations as required by most scene text recognition systems. Extensive experiments over a number of public datasets show that the proposed ESIR is capable of rectifying scene text distortions accurately, achieving superior recognition performance for both normal scene text images and those suffering from perspective and curvature distortions.

show abstract

Blurred image region detection and classification

Tan

2011

160

144

View full text Add to dashboard Cite

Many digital images contain blurred regions which are caused by motion or defocus. Automatic detection and classification of blurred image regions are very important for different multimedia analyzing tasks. This paper presents a simple and effective automatic image blurred region detection and classification technique. In the proposed technique, blurred image regions are first detected by examining singular value information for each image pixels. The blur types (i.e. motion blur or defocus blur) are then determined based on certain alpha channel constraint that requires neither image deblurring nor blur kernel estimation. Extensive experiments have been conducted over a dataset that consists of 200 blurred image regions and 200 image regions with no blur that are extracted from 100 digital images. Experimental results show that the proposed technique detects and classifies the two types of image blurs accurately. The proposed technique can be used in many different multimedia analysis applications such as image segmentation, depth estimation and information retrieval.

show abstract

AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification

Zhai

et al. 2020

260

140

View full text Add to dashboard Cite

Binarization of historical document images using the local maximum and minimum

Tan

2010

194

124

View full text Add to dashboard Cite

Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation

Zhu

Meng

Cai

et al. 2016

Journal of Visual Communication and Image Representation

200

118

View full text Add to dashboard Cite

Image segmentation refers to the process to divide an image into nonoverlapping meaningful regions according to human perception, which has become a classic topic since the early ages of computer vision. A lot of research has been conducted and has resulted in many applications. However, while many segmentation algorithms exist, yet there are only a few sparse and outdated summarizations available, an overview of the recent achievements and issues is lacking. We aim to provide a comprehensive review of the recent progress in this field. Covering 180 publications, we give an overview of broad areas of segmentation topics including not only the classic bottom-up approaches, but also the recent development in superpixel, interactive methods, object proposals, semantic image parsing and image cosegmentation. In addition, we also review the existing influential datasets and evaluation metrics. Finally, we suggest some design flavors and research directions for future research in image segmentation.

show abstract

Text Flow: A Unified Text Detection System in Natural Scene Images

et al. 2015

View full text Add to dashboard Cite

The prevalent scene text detection approach follows four sequential steps comprising character candidate detection, false character candidate removal, text line extraction, and text line verification. However, errors occur and accumulate throughout each of these sequential steps which often lead to low detection performance. To address these issues, we propose a unified scene text detection system, namely Text Flow, by utilizing the minimum cost (min-cost) flow network model. With character candidates detected by cascade boosting, the min-cost flow network model integrates the last three sequential steps into a single process which solves the error accumulation problem at both character level and text line level effectively. The proposed technique has been tested on three public datasets, i.e, ICDAR2011 dataset, IC-DAR2013 dataset and a multilingual dataset and it outperforms the state-of-the-art methods on all three datasets with much higher recall and F-score. The good performance on the multilingual dataset shows that the proposed technique can be used for the detection of texts in different languages.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

334 Leonard St

Brooklyn, NY 11211

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Shijian Lu

ICDAR 2015 competition on Robust Reading

Suppressing Uncertainties for Large-Scale Facial Expression Recognition

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

Blurred image region detection and classification

AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification

Binarization of historical document images using the local maximum and minimum

Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation

Text Flow: A Unified Text Detection System in Natural Scene Images

Contact Info

Product

Resources

About