Haojun Jiang scite author profile

In this paper, we explore the spatial redundancy in video recognition with the aim of improving computational efficiency. It is observed that the most informative region in each frame of a video is usually a small image patch, which shifts smoothly across frames. Therefore, we model the patch localization problem as a sequential decision task and propose a reinforcement-learning-based approach for efficient spatially adaptive video recognition (AdaFocus). Specifically, a lightweight ConvNet is first adopted to quickly process the full video sequence, and its features are used by a recurrent policy network to localize the most task-relevant regions. The selected patches are then processed by a high-capacity network for the final prediction. During offline inference, once the informative patch sequence has been generated, the bulk of the computation can be done in parallel, which is efficient on modern GPU devices. In addition, we demonstrate that the proposed method can be easily extended to also exploit temporal redundancy, e.g., by dynamically skipping less valuable frames. Extensive experiments on five benchmark datasets, i.e., ActivityNet, FCVID, Mini-Kinetics, and Something-Something V1&V2, demonstrate that our method is significantly more efficient than the competitive baselines. Code will be available at https://github.com/blackfeather-wang/AdaFocus.
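To make the efficiency argument in this abstract concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that the cost of a convolutional network grows roughly in proportion to the number of input pixels. The function name `relative_cost`, the frame size (224), the patch size (128) and the cost ratio of the lightweight network (0.1) are hypothetical values chosen for illustration; none of them comes from the abstract.

```python
def relative_cost(frame_size: int, patch_size: int, glance_ratio: float) -> float:
    """Approximate cost of "glance + focus" relative to running the
    high-capacity network on every full frame.

    Assumption: ConvNet cost is proportional to input area (pixel count).
    glance_ratio is the hypothetical cost of the lightweight network on a
    full frame, as a fraction of the high-capacity network's full-frame cost.
    """
    if not 0 < patch_size <= frame_size:
        raise ValueError("patch_size must be in (0, frame_size]")
    # The high-capacity network only sees the patch, so its cost shrinks
    # with the ratio of patch area to frame area.
    focus_cost = (patch_size / frame_size) ** 2
    return glance_ratio + focus_cost


# Hypothetical numbers, for illustration only.
cost = relative_cost(frame_size=224, patch_size=128, glance_ratio=0.1)
print(f"relative cost per frame: {cost:.3f}")
```

Under these assumptions, the saving comes almost entirely from the quadratic dependence on patch side length, which is why a small but well-placed patch can be much cheaper than the full frame.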

Spatially Adaptive Feature Refinement for Efficient Inference

Han

Huang

Song

et al. 2021

IEEE Trans. on Image Process.

Adaptive Focus for Efficient Video Recognition

Wang

Chen

Jiang

et al. 2021

Preprint

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Wang

Yue

Lin

et al. 2022

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

Yang

Jiang

Cai

et al. 2021

Glance and Focus Networks for Dynamic Visual Recognition

Huang

Wang

et al. 2022

IEEE Trans. Pattern Anal. Mach. Intell.

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Wang

Yang

Lin

et al. 2021

Preprint

Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing spatial redundancy. As a representative work, the adaptive focus method (AdaFocus) has achieved a favorable trade-off between accuracy and inference speed by dynamically identifying and attending to the informative regions in each video frame. However, AdaFocus requires a complicated three-stage training pipeline (involving reinforcement learning), which leads to slow convergence and is unfriendly to practitioners. This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable interpolation-based patch selection operation, enabling efficient end-to-end optimization. We further present an improved training scheme to address the issues introduced by the one-stage formulation, including the lack of supervision, input diversity and training stability. Moreover, a conditional-exit technique is proposed to perform temporally adaptive computation on top of AdaFocus without additional training. Extensive experiments on six benchmark datasets (i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, and Jester) demonstrate that our model significantly outperforms the original AdaFocus and other competitive baselines, while being considerably simpler and more efficient to train. Code is available at https://github.com/LeapLabTHU/AdaFocusV2.
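The idea of an interpolation-based patch selection can be illustrated with a minimal Python sketch. This is not the authors' implementation: it is a plain bilinear crop on a grayscale image stored as nested lists, and the names `bilinear` and `crop_patch` are hypothetical. It assumes the image is at least 2x2 pixels. The point it shows is that when the patch centre is a continuous coordinate, each output pixel is a smoothly weighted sum of input pixels, so the patch varies continuously with the centre instead of jumping between integer crops.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def bilinear(img: Image, y: float, x: float) -> float:
    """Sample img at a continuous (y, x) location by bilinear interpolation."""
    h, w = len(img), len(img[0])
    # Clamp the location so the four neighbours always lie inside the image.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0 = min(int(math.floor(y)), h - 2)
    x0 = min(int(math.floor(x)), w - 2)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x0 + 1]
    bottom = (1 - dx) * img[y0 + 1][x0] + dx * img[y0 + 1][x0 + 1]
    return (1 - dy) * top + dy * bottom


def crop_patch(img: Image, cy: float, cx: float, size: int) -> Image:
    """Extract a size x size patch centred at the continuous point (cy, cx).

    The interpolation weights depend continuously on (cy, cx), which is what
    would let gradients flow back to the patch location in a real framework.
    """
    offset = (size - 1) / 2.0
    return [
        [bilinear(img, cy - offset + i, cx - offset + j) for j in range(size)]
        for i in range(size)
    ]


# A 4x4 ramp image: pixel value = 10 * row + column.
frame = [[10.0 * r + c for c in range(4)] for r in range(4)]
patch = crop_patch(frame, cy=1.5, cx=1.75, size=2)
print(patch)
```

In practice this kind of sampling would be done with a deep learning framework's built-in differentiable interpolation rather than Python loops; the sketch only shows the arithmetic.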

Detecting Coal Pulverizing System Anomaly Using a Gated Recurrent Unit and Clustering

Chen

Yan

Jiang

et al. 2020

Sensors

The coal pulverizing system is an important auxiliary system in thermal power generation systems. The working condition of a coal pulverizing system may directly affect the safety and economy of power generation. Prognostics and health management is an effective approach to ensure the reliability of coal pulverizing systems. As the coal pulverizing system is a typical dynamic and nonlinear high-dimensional system, it is difficult to construct accurate mathematical models for anomaly detection. In this paper, a novel data-driven integrated framework for anomaly detection of the coal pulverizing system is proposed. A neural network model based on gated recurrent unit (GRU) networks, a type of recurrent neural network (RNN), is constructed to describe the temporal characteristics of high-dimensional data and predict the system condition value. Then, a novel unsupervised clustering algorithm that operates on the prediction error is proposed for anomaly detection. The proposed framework is validated by a real case study from an industrial coal pulverizing system. The results show that the proposed framework can detect the anomaly successfully.
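The "predict, then cluster the prediction error" pattern described in this abstract can be sketched in a few lines of Python. Two things here are assumptions rather than the paper's method: the predictions are simply given as a list (no GRU is trained), and the clustering step is plain two-cluster k-means on the absolute error, swapped in because the abstract does not specify the proposed clustering algorithm. The function names and the sample readings are hypothetical.

```python
from typing import List


def two_means_1d(values: List[float], iters: int = 50) -> List[int]:
    """Split 1-D values into two clusters with plain k-means.

    Returns one label per value: 1 for the higher-mean cluster, 0 otherwise.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)  # initial centroids
    labels = [0] * len(values)
    if lo == hi:
        return labels  # all errors identical: nothing stands out
    for _ in range(iters):
        # Assign each value to the nearer centroid (ties go to the low cluster).
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


def detect_anomalies(actual: List[float], predicted: List[float]) -> List[int]:
    """Flag time steps whose absolute prediction error falls in the high-error cluster."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return two_means_1d(errors)


# Hypothetical sensor readings and model predictions, for illustration only.
actual = [1.0, 1.1, 0.9, 1.0, 3.0, 1.05]
predicted = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
flags = detect_anomalies(actual, predicted)
print(flags)
```

One limitation of this stand-in: two-cluster k-means always produces a "high" cluster whenever the errors are not all equal, so on healthy data it would still flag something. A practical detector needs an additional criterion for deciding whether the high-error cluster is genuinely abnormal.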
