Liyuan Xing scite author profile

Convolutional Character Networks

Recent progress has been made on developing a unified framework for joint text detection and recognition in natural images, but existing joint models were mostly built on a two-stage framework involving ROI pooling, which can degrade performance on the recognition task. In this work, we propose convolutional character networks, referred to as CharNet, a one-stage model that can process the two tasks simultaneously in one pass. CharNet directly outputs bounding boxes of words and characters, with corresponding character labels. We use the character as the basic element, allowing us to overcome the main difficulty of existing approaches that attempted to optimize text detection jointly with an RNN-based recognition branch. In addition, we develop an iterative character detection approach able to transfer the character detection ability learned from synthetic data to real-world images. These technical improvements result in a simple, compact, yet powerful one-stage model that works reliably on multi-orientation and curved text. We evaluate CharNet on three standard benchmarks, where it consistently outperforms the state-of-the-art approaches [25,24] by a large margin, e.g., with improvements of 65.33%→71.08% (with generic lexicon) on ICDAR 2015, and 54.0%→69.23% on Total-Text, on end-to-end text recognition. Code is available at: https://github.com/MalongTech/research-charnet.

DeepWriter: A Multi-stream Deep CNN for Text-Independent Writer Identification

Xing

Qiao

2016

Text-independent writer identification is challenging due to the huge variation of written content and the ambiguous writing styles of different writers. This paper proposes DeepWriter, a deep multi-stream CNN that learns powerful deep representations for recognizing writers. DeepWriter takes local handwritten patches as input and is trained with a softmax classification loss. The main contributions are: 1) we design and optimize a multi-stream structure for the writer identification task; 2) we introduce data augmentation learning to enhance the performance of DeepWriter; 3) we introduce a patch scanning strategy to handle text images of different lengths. In addition, we find that different languages such as English and Chinese may share common features for writer identification, and joint training can yield better performance. Experimental results on the IAM and HWDB datasets show that our models achieve high identification accuracy: 99.01% on 301 writers and 97.03% on 657 writers with one English sentence as input, and 93.85% on 300 writers with one Chinese character as input, which outperforms previous methods by a large margin. Moreover, our models obtain an accuracy of 98.01% on 301 writers with only 4 English letters as input.
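
The patch scanning idea mentioned in this abstract might look roughly like the sketch below. It is only an illustration under stated assumptions, not the DeepWriter implementation: the abstract does not give the window size, the stride, or how per-patch outputs are combined, so `patch_width`, `stride`, the averaging of scores, and the `score_patch` callable (a stand-in for a trained network) are all hypothetical.

```python
def scan_patches(image, patch_width, stride):
    """Yield fixed-width column windows from a 2-D image given as a list of rows.

    Assumption: patches are taken left to right, with one extra window
    aligned to the right edge so that no columns are skipped.
    """
    width = len(image[0])
    last_start = max(width - patch_width, 0)
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the right edge is covered
    for start in starts:
        yield [row[start:start + patch_width] for row in image]


def identify_writer(image, score_patch, patch_width=4, stride=2):
    """Return the index of the writer with the highest mean patch score.

    score_patch is a hypothetical callable mapping one patch to a list of
    per-writer scores (for example softmax outputs of a trained CNN).
    Averaging the scores over patches is an assumption of this sketch.
    """
    patch_scores = [score_patch(p) for p in scan_patches(image, patch_width, stride)]
    n_writers = len(patch_scores[0])
    mean_scores = [
        sum(scores[w] for scores in patch_scores) / len(patch_scores)
        for w in range(n_writers)
    ]
    return max(range(n_writers), key=mean_scores.__getitem__)
```

Because the number of windows grows with the image width while each window keeps a fixed size, the same scorer can be applied to text lines of any length.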

Assessment of Stereoscopic Crosstalk Perception

Xing

You

Ebrahimi

et al. 2012

IEEE Trans. Multimedia

Player action recognition in broadcast tennis video with applications to semantic analysis of sports game

Zhu

Huang

et al. 2006

A framework for flexible summarization of racquet sports video using multiple modalities

Liu

Huang

Jiang

et al. 2009

Computer Vision and Image Understanding

Development of new method of δ13C measurement for trace hydrocarbons in natural gas using solid phase micro-extraction coupled to gas chromatography isotope ratio mass spectrometry

Wang

et al. 2014

Journal of Chromatography A

Atmospheric palaeo-CO2 estimates based on the carbon isotope and stomatal data of Cheirolepidiaceae from the Lower Cretaceous of the Jiuquan Basin, Gansu Province

Sun

Zhang

et al. 2016

Cretaceous Research

Unsupervised sports video scene clustering and its applications to story units detection

Zhang

Xing

et al. 2005

In this paper, we present a new and efficient clustering approach for scene analysis in sports video. The method is generic and does not require any prior domain knowledge. It operates in an unsupervised manner and relies on scene likeness analysis of the shots in the video. In each iteration, the two most similar shots are merged into the same scene, and this procedure is repeated until the merging stop criterion is satisfied. The stop criterion is based on a J value defined according to the Fisher Discriminant Function. We call this method J-based Scene Clustering. With this method, the low-level video content representation (shots) can be clustered into the mid-level video content representation (scenes), which is useful for high-level sports video content analysis such as play-break parsing, story units detection, highlights extraction, and summarization. Experimental results obtained from various types of broadcast sports videos demonstrate the efficacy of the proposed approach. Moreover, we also present a simple application of our scene clustering method to story units detection in periodic sports videos such as archery and diving videos. The experimental results are encouraging.
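
The merge-until-stop loop described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract only says that J is defined according to the Fisher Discriminant Function, so the specific J used here (within-scene scatter divided by total scatter), the squared Euclidean distance between scene centroids as the "likeness" measure, and the `max_j` threshold are all assumptions, as are the function names.

```python
from itertools import combinations


def centroid(points):
    """Mean feature vector of a list of equally sized vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def scatter(points):
    """Sum of squared distances from each point to the centroid."""
    c = centroid(points)
    return sum((p[d] - c[d]) ** 2 for p in points for d in range(len(c)))


def j_value(scenes):
    """Assumed J: within-scene scatter divided by total scatter (0 means tight scenes)."""
    all_points = [p for scene in scenes for p in scene]
    total = scatter(all_points)
    within = sum(scatter(scene) for scene in scenes)
    return within / total if total > 0 else 0.0


def cluster_shots(shot_features, max_j=0.1):
    """Merge the two most similar scenes per iteration until J would exceed max_j."""
    scenes = [[list(f)] for f in shot_features]  # start with one scene per shot
    while len(scenes) > 1:
        def centroid_distance(pair):
            ca, cb = centroid(scenes[pair[0]]), centroid(scenes[pair[1]])
            return sum((x - y) ** 2 for x, y in zip(ca, cb))

        i, j = min(combinations(range(len(scenes)), 2), key=centroid_distance)
        merged = [s for k, s in enumerate(scenes) if k not in (i, j)]
        merged.append(scenes[i] + scenes[j])
        if j_value(merged) > max_j:  # stop criterion: this merge would blur the scenes
            break
        scenes = merged
    return scenes
```

For example, `cluster_shots([[0, 0], [0.1, 0], [5, 5], [5.1, 5]])` groups the four toy shot features into two scenes, one per tight pair, because merging the two pairs would raise the assumed J from near 0 to 1.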
