Xiaokang Yang scite author profile

In this paper, the problem of multi-target tracking with a single camera in complex scenes is addressed. A new approach is proposed for the multi-target tracking problem that learns from a hierarchy of convolutional features. First, a Fast Region-based Convolutional Neural Network is trained to detect pedestrians in each frame. It is then combined with a correlation filter tracker that learns each target's appearance from pretrained convolutional neural networks. The correlation filter learns from the middle and last convolutional layers to enhance target localization. However, correlation filters fail when a target is fully occluded, which leads to the problem of separated tracklets (mini-trajectories). A post-processing step is therefore added to link separated tracklets with a minimum-cost network flow. The cost function used to associate short tracklets depends on motion cues. Experimental results on the MOT2015 benchmark show that the proposed approach produces results comparable to state-of-the-art approaches. It shows an increase of 4.5% in multiple object tracking accuracy. In addition, the share of mostly tracked targets is 12.9%, versus 7.5% for the state-of-the-art minimum-cost network flow tracker.
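The tracklet-linking step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only says that the cost depends on motion cues and that links are found with a minimum-cost network flow. The sketch below assumes a constant-velocity motion model for the cost and swaps in a simple greedy selection in place of the network flow solver. The function names, the `(frame, x, y)` tracklet layout, and the `max_gap` and `max_cost` thresholds are all hypothetical.

```python
import math

def link_cost(a, b, max_gap=30):
    """Motion-based cost of linking tracklet a (earlier) to tracklet b (later).

    Each tracklet is a list of (frame, x, y) tuples sorted by frame.
    Returns math.inf when b does not start within max_gap frames after a ends.
    """
    f_end, x_end, y_end = a[-1]
    f_start, x_start, y_start = b[0]
    gap = f_start - f_end
    if gap <= 0 or gap > max_gap:
        return math.inf
    # Estimate velocity from the last two points of a (zero if only one point).
    if len(a) >= 2:
        f_prev, x_prev, y_prev = a[-2]
        dt = f_end - f_prev
        vx, vy = (x_end - x_prev) / dt, (y_end - y_prev) / dt
    else:
        vx = vy = 0.0
    # Constant-velocity prediction of a's position at b's first frame.
    px, py = x_end + vx * gap, y_end + vy * gap
    return math.hypot(px - x_start, py - y_start)

def greedy_link(tracklets, max_cost=20.0):
    """Greedy stand-in for min-cost flow: take the cheapest links first.

    Each tracklet gets at most one successor and one predecessor.
    Returns a list of (earlier_index, later_index) pairs.
    """
    candidates = []
    for i, a in enumerate(tracklets):
        for j, b in enumerate(tracklets):
            if i != j:
                cost = link_cost(a, b)
                if cost <= max_cost:
                    candidates.append((cost, i, j))
    candidates.sort()
    used_out, used_in, links = set(), set(), []
    for cost, i, j in candidates:
        if i not in used_out and j not in used_in:
            links.append((i, j))
            used_out.add(i)
            used_in.add(j)
    return links

# Toy example: t0 moves right, t1 continues that motion after a gap, t2 is elsewhere.
t0 = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]
t1 = [(5, 5, 0), (6, 6, 0)]
t2 = [(5, 2, 40), (6, 2, 41)]
links = greedy_link([t0, t1, t2])
```

A greedy choice can miss the globally cheapest set of links, which is the case a network flow formulation is meant to handle.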

Long-term correlation tracking

Yang

Zhang

et al. 2015

844

726

Learning a no-reference quality metric for single-image super-resolution

Yang

et al. 2017

Computer Vision and Image Understanding

439

263

Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception. While most super-resolution images are evaluated by full-reference metrics, the effectiveness is not clear and the required ground-truth images are not always available in practice. To address these problems, we conduct human subject studies using a large set of super-resolution images and propose a no-reference metric learned from visual perceptual scores. Specifically, we design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images. Extensive experimental results show that the proposed metric is effective and efficient to assess the quality of super-resolution images based on human perception.

Just noticeable distortion model and its applications in video coding

Yang

Wang

et al. 2005

Signal Processing: Image Communication

276

246

Using Free Energy Principle For Blind Image Quality Assessment

Zhai

Yang

et al. 2015

IEEE Trans. Multimedia

553

231

Crowd Counting via Adversarial Cross-Scale Consistency Pursuit

Shen

et al. 2018

303

196

Learning Combinatorial Embedding Networks for Deep Graph Matching

2019

Graph matching refers to finding node correspondence between graphs, such that the affinity of the corresponding nodes and edges is maximized. In addition to its NP-complete nature, another important challenge is effective modeling of the node-wise and structure-wise affinity across graphs and of the resulting objective, so that the matching procedure can find the true matching in the presence of noise. To this end, this paper devises an end-to-end differentiable deep network pipeline to learn the affinity for graph matching. It involves a supervised permutation loss with respect to node correspondence to capture the combinatorial nature of graph matching. Meanwhile, deep graph embedding models are adopted to parameterize both intra-graph and cross-graph affinity functions, instead of the traditional shallow and simple parametric forms, e.g. a Gaussian kernel. The embedding can also effectively capture higher-order structure beyond second-order edges. The permutation loss model is agnostic to the number of nodes, and the embedding model is shared among nodes, so the network allows for varying numbers of nodes in graphs for training and inference. Moreover, our network is class-agnostic, with some generalization capability across different categories. All these features are welcome for real-world applications. Experiments show its superiority over state-of-the-art graph matching learning methods.
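As a rough illustration of what a supervised permutation loss can look like, the sketch below scores a soft node-to-node assignment against a ground-truth permutation matrix with element-wise cross-entropy. The abstract does not say how the soft assignment is produced or which exact loss is used, so both choices here are assumptions: Sinkhorn normalization turns a score matrix into an approximately doubly-stochastic one, and binary cross-entropy compares it with the target. The function names `sinkhorn` and `permutation_loss` are hypothetical, and only square score matrices are handled.

```python
import math

def sinkhorn(scores, n_iters=20):
    """Alternate row and column normalization of exp(scores).

    Assumed normalization step; yields an approximately doubly-stochastic matrix.
    """
    m = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization: each column sums to 1.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / col_sums[j] for j, v in enumerate(row)] for row in m]
    return m

def permutation_loss(pred, target, eps=1e-12):
    """Summed binary cross-entropy between a soft assignment and a 0/1 permutation matrix."""
    loss = 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss

# Toy example: the scores favor matching node 0 to node 0 and node 1 to node 1.
scores = [[5.0, 0.0], [0.0, 5.0]]
pred = sinkhorn(scores)
identity = [[1, 0], [0, 1]]
swapped = [[0, 1], [1, 0]]
loss_correct = permutation_loss(pred, identity)
loss_wrong = permutation_loss(pred, swapped)
```

Because the loss is a sum over matrix entries, the same function applies to assignment matrices of any size, which is in line with the abstract's remark that the loss is agnostic to the number of nodes.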

Person Re-identification via Recurrent Feature Aggregation

et al. 2016

We address the person re-identification problem by effectively exploiting a globally discriminative feature representation from a sequence of tracked human regions/patches. This is in contrast to previous person re-id works, which rely on either single-frame person-to-person patch matching or graph-based sequence-to-sequence matching. We show that a progressive/sequential fusion framework based on a long short-term memory (LSTM) network aggregates the frame-wise human region representation at each time stamp and yields a sequence-level human feature representation. Since LSTM nodes can remember and propagate previously accumulated good features and forget newly input inferior ones, even with simple hand-crafted features, the proposed recurrent feature aggregation network (RFA-Net) is effective in generating highly discriminative sequence-level human representations. Extensive experimental results on two person re-identification benchmarks demonstrate that the proposed method performs favorably against state-of-the-art person re-identification methods. Our code is available at https://sites.google.com/site/yanyichao91sjtu/


Hierarchical Convolutional Features for Visual Tracking
