Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation of joint co-occurrences and the inter-frame representation of the skeletons' temporal evolution. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology in which different levels of contextual information are aggregated gradually. First, point-level information of each joint is encoded independently. These point-level features are then assembled into semantic representations in both the spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme that learns superior joint co-occurrence features compared with local aggregation. In addition, raw skeleton coordinates and their temporal differences are integrated via a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks such as NTU RGB+D, SBU Kinect Interaction, and PKU-MMD.
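To make the hierarchy concrete, the following is a minimal PyTorch sketch of the idea, assuming skeleton input of shape (batch, 3, frames, joints). The layer sizes, the joint-to-channel transpose used for global aggregation, and the concatenation-based two-stream fusion are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CoOccurrenceStream(nn.Module):
    """One stream: point-level encoding, then global joint aggregation.

    Input: (N, 3, T, V) - 3D coordinates over T frames and V joints.
    A 1x1 conv encodes each joint independently (point level); the joint
    axis is then moved into the channel dimension so subsequent convs
    aggregate information across ALL joints (global co-occurrence).
    """
    def __init__(self, in_channels=3, num_joints=25):
        super().__init__()
        self.point_enc = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
        )
        # After the transpose, channels == num_joints: convs mix joints globally.
        self.global_agg = nn.Sequential(
            nn.Conv2d(num_joints, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):                  # x: (N, 3, T, V)
        x = self.point_enc(x)              # (N, 32, T, V)
        x = x.permute(0, 3, 2, 1)          # (N, V, T, 32) - joints become channels
        return self.global_agg(x)


class TwoStreamHCN(nn.Module):
    """Fuse a raw-coordinate stream and a temporal-difference stream."""
    def __init__(self, num_joints=25, num_classes=60):
        super().__init__()
        self.stream_raw = CoOccurrenceStream(num_joints=num_joints)
        self.stream_motion = CoOccurrenceStream(num_joints=num_joints)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
        )

    def forward(self, x):                  # x: (N, 3, T, V)
        motion = x[:, :, 1:] - x[:, :, :-1]                # frame-wise differences
        motion = nn.functional.pad(motion, (0, 0, 1, 0))   # keep T unchanged
        feats = torch.cat([self.stream_raw(x), self.stream_motion(motion)], dim=1)
        return self.classifier(feats)

# Example: logits = TwoStreamHCN()(torch.randn(4, 3, 32, 25))
```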
Existing enhancement methods are empirically expected to help high-level end computer vision tasks; however, that is not always the case in practice. We focus on object and face detection under poor visibility caused by bad weather (haze, rain) and low-light conditions. To provide a more thorough examination and fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with annotated objects/faces. We launched the UG2+ Challenge Track 2 competition at IEEE CVPR 2019, aiming to evoke a comprehensive discussion and exploration of whether and how low-level vision techniques can benefit high-level automatic visual recognition in various scenarios. To the best of our knowledge, this is the first and currently largest effort of its kind. Baseline results obtained by cascading existing enhancement and detection models are reported, indicating the highly challenging nature of our new data as well as the large room for further technical innovation. Thanks to large participation from the research community, we are able to analyze representative team solutions, striving to better identify the strengths and limitations of existing mindsets as well as future directions.

Index Terms: Poor visibility environment, object detection, face detection, haze, rain, low-light conditions.

*The first two authors, Wenhan Yang and Ye Yuan, contributed equally. Ye Yuan and Wenhan Yang helped prepare the dataset proposed for the UG2+ Challenges and were the main members responsible for the UG2+ Challenge 2019 (Track 2) platform setup and technical support. Wenqi Ren, Jiaying Liu, Walter J. Scheirer, and Zhangyang Wang were the main organizers of the challenge and helped prepare the dataset, raise sponsorship, set up the evaluation environment, and improve the technical submissions. The other authors are members of the winning teams in UG2+ Challenge Track 2 and contributed to the winning methods.
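The cascade baseline referred to above simply runs an off-the-shelf enhancement model before a standard detector. A minimal sketch of that pipeline, where `enhance` is a hypothetical placeholder for any pretrained dehazing, deraining, or low-light enhancement network, the detector is a stock torchvision model, and the image path is only an example, might look like this:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

def enhance(img: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a pretrained enhancement network
    (dehazing, deraining, or low-light enhancement). Identity here."""
    return img

# Off-the-shelf detector; any object/face detector could be substituted.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

img = to_tensor(Image.open("hazy_street.jpg").convert("RGB"))
with torch.no_grad():
    restored = enhance(img)                # low-level enhancement step
    detections = detector([restored])[0]   # high-level detection step

keep = detections["scores"] > 0.5
print(detections["boxes"][keep], detections["labels"][keep])
```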
Spatiotemporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D). In this paper, we propose a novel neural operation that encodes spatiotemporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. In particular, we perform 2D convolution along three orthogonal views of volumetric video data, which learns spatial appearance and temporal motion cues respectively. By sharing the convolution kernels across views, spatial and temporal features are collaboratively learned and thus benefit from each other. The complementary features are subsequently fused by a weighted summation whose coefficients are learned end-to-end. Our approach achieves state-of-the-art performance on large-scale benchmarks and won 1st place in the Moments in Time Challenge 2018. Moreover, based on the learned coefficients of the different views, we can quantify the contributions of spatial and temporal features. This analysis sheds light on the interpretability of the model and may also guide the future design of algorithms for video recognition.
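As a rough illustration of the shared-kernel, three-view operation and its learned fusion, the following PyTorch sketch assumes equal input and output channels and softmax-normalized fusion coefficients; the kernel shapes, normalization, and integration into a full network are simplifying assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeSTConv(nn.Module):
    """2D convolution applied along three orthogonal views of a video clip,
    with the SAME kernel shared across views and a learned weighted fusion.

    Input: (N, C, T, H, W) volumetric video features.
      view H-W : conventional spatial convolution (appearance)
      view T-W : convolution over time and width  (motion)
      view T-H : convolution over time and height (motion)
    """
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        # One fusion coefficient per view, softmax-normalized at runtime.
        self.view_weights = nn.Parameter(torch.zeros(3))

    def forward(self, x):                  # x: (N, C, T, H, W)
        n, c, t, h, w = x.shape

        # View 1: H-W (spatial). Fold T into the batch, convolve, restore shape.
        hw = self.conv(x.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w))
        hw = hw.reshape(n, t, c, h, w).permute(0, 2, 1, 3, 4)

        # View 2: T-W (temporal motion). Fold H into the batch.
        tw = self.conv(x.permute(0, 3, 1, 2, 4).reshape(n * h, c, t, w))
        tw = tw.reshape(n, h, c, t, w).permute(0, 2, 3, 1, 4)

        # View 3: T-H (temporal motion). Fold W into the batch.
        th = self.conv(x.permute(0, 4, 1, 2, 3).reshape(n * w, c, t, h))
        th = th.reshape(n, w, c, t, h).permute(0, 2, 3, 4, 1)

        # Weighted summation with end-to-end learned coefficients.
        a = F.softmax(self.view_weights, dim=0)
        return a[0] * hw + a[1] * tw + a[2] * th

# Example: y = CollaborativeSTConv(64)(torch.randn(2, 64, 8, 56, 56))
```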