Jian Yao scite author profile

Abstract. In this paper, we propose a robust multi-layer background subtraction technique which takes advantages of local texture features represented by local binary patterns (LBP) and photometric invariant color measurements in RGB color space. LBP can work robustly with respective to light variation on rich texture regions but not so efficiently on uniform regions. In the latter case, color information should overcome LBP's limitation. Due to the illumination invariance of both the LBP feature and the selected color feature, the method is able to handle local illumination changes such as cast shadows from moving objects. Due to the use of a simple layerbased strategy, the approach can model moving background pixels with quasi-periodic flickering as well as background scenes which may vary over time due to the addition and removal of long-time stationary objects. Finally, the use of a cross-bilateral filter allows to implicitly smooth detection results over regions of similar intensity and preserve object boundaries. Numerical and qualitative experimental results on both simulated and real data demonstrate the robustness of the proposed method.

show abstract

Text-Attentional Convolutional Neural Network for Scene Text Detection

Huang

Qiao

et al. 2016

IEEE Trans. on Image Process.

297

104

View full text Add to dashboard Cite

Abstract-Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this work, we present a new system for scene text detection by proposing a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/nontext information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates main task of text/non-text classification. In addition, a powerful low-level detector called ContrastEnhancement Maximally Stable Extremal Regions (CE-MSERs) is developed, which extends the widely-used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 dataset, with a F-measure of 0.82, improving the state-of-the-art results substantially.

show abstract

Real-time coarse-to-fine topologically preserving segmentation

et al. 2015

View full text Add to dashboard Cite

In this paper, we tackle the problem of unsupervised segmentation in the form of superpixels. Our main emphasis is on speed and accuracy. We build on [31] to define the problem as a boundary and topology preserving Markov random field. We propose a coarse to fine optimization technique that speeds up inference in terms of the number of updates by an order of magnitude. Our approach is shown to outperform [31] while employing a single iteration. We evaluate and compare our approach to state-of-the-art superpixel algorithms on the BSD and KITTI benchmarks. Our approach significantly outperforms the baselines in the segmentation metrics and achieves the lowest error on the stereo task.

show abstract

RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes From High-Resolution Remotely Sensed Images

Liu

Yao

et al. 2019

IEEE Trans. Geosci. Remote Sensing

141

View full text Add to dashboard Cite

Line-based Multi-Label Energy Optimization for fisheye image rectification and calibration

Zhang

Yao

Mao

et al. 2015

View full text Add to dashboard Cite

Optimal seamline detection for multiple image mosaicking via graph cuts

Yao

et al. 2016

ISPRS Journal of Photogrammetry and Remote Sensing

View full text Add to dashboard Cite

Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present

Chen

Ma²,

Jiang

et al. 2018

View full text Add to dashboard Cite

Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on. In this paper, we propose a novel architecture, namely Auto-Reconstructor Network (ARNet), which, coupling with the conventional encoder-decoder framework, works in an end-to-end fashion to generate captions. AR-Net aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). Extensive experimental results show that our proposed ARNet boosts the performance over the existing encoder-decoder models on both image captioning and source code captioning tasks. Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNN, especially on modeling long-term dependencies. Our code is available at: https://github.com/ chenxinpeng/ARNet. Recent work on caption generation, such as image cap-

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Jian Yao

DeepCrack: A deep hierarchical feature learning architecture for crack segmentation

Multi-Layer Background Subtraction Based on Color and Texture

Text-Attentional Convolutional Neural Network for Scene Text Detection

Real-time coarse-to-fine topologically preserving segmentation

RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes From High-Resolution Remotely Sensed Images

Line-based Multi-Label Energy Optimization for fisheye image rectification and calibration

Optimal seamline detection for multiple image mosaicking via graph cuts

Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present

Contact Info

Product

Resources

About