Ling-Yu Duan scite author profile

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]

Global Context-Aware Attention LSTM Networks for 3D Action Recognition

Liu, Wang, et al. 2017

551

380

Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks

Liu, Wang, Duan³, et al. 2018, IEEE Trans. on Image Process.

467

232

Human action recognition in 3D skeleton sequences has attracted a lot of research attention. Recently, long short-term memory (LSTM) networks have shown promising performance in this task due to their strengths in modeling the dependencies and dynamics in sequential data. As not all skeletal joints are informative for action recognition, and the irrelevant joints often bring noise which can degrade the performance, we need to pay more attention to the informative ones. However, the original LSTM network does not have explicit attention ability. In this paper, we propose a new class of LSTM network, global context-aware attention LSTM, for skeleton-based action recognition, which is capable of selectively focusing on the informative joints in each frame by using a global context memory cell. To further improve the attention capability, we also introduce a recurrent attention mechanism, with which the attention performance of our network can be enhanced progressively. Besides, a two-stream framework, which leverages coarse-grained attention and fine-grained attention, is also introduced. The proposed method achieves state-of-the-art performance on five challenging datasets for skeleton-based action recognition.
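
The idea of weighting joints by a global context and refining that context over several passes can be illustrated with a small sketch. This is not the network from the paper: the dot-product scoring, the softmax pooling, and the names `attend_joints` and `refine_context` are all assumptions chosen for illustration, and no LSTM is involved.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_joints(joint_features, global_context):
    """Weight each joint by its similarity to the global context.

    Assumption: similarity is a plain dot product; the paper learns
    its own informativeness score.
    """
    scores = [
        sum(f * g for f, g in zip(feat, global_context))
        for feat in joint_features
    ]
    weights = softmax(scores)
    dim = len(global_context)
    # Attention-weighted sum of joint features.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, joint_features))
        for d in range(dim)
    ]
    return weights, pooled


def refine_context(joint_features, global_context, iterations=2):
    """Hypothetical recurrent refinement: the pooled features of one
    pass become the global context of the next pass."""
    weights = []
    for _ in range(iterations):
        weights, global_context = attend_joints(joint_features, global_context)
    return weights, global_context


# Three toy joints with 2-D features and an initial context vector.
joints = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = refine_context(joints, [1.0, 0.0])
```

In this toy setup the joint most aligned with the context receives the largest weight, which is the selective-focus behavior the abstract describes at a high level.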

Exploring Object Relation in Mean Teacher for Cross-Domain Detection

et al. 2019

Rendering synthetic data (e.g., 3D CAD-rendered images) to generate annotations for learning deep models in vision tasks has attracted increasing attention in recent years. However, simply applying the models learnt on synthetic images may lead to high generalization error on real images due to domain shift. To address this issue, recent progress in cross-domain recognition has featured the Mean Teacher, which directly simulates unsupervised domain adaptation as semi-supervised learning. The domain gap is thus naturally bridged with consistency regularization in a teacher-student scheme. In this work, we advance this Mean Teacher paradigm to be applicable for cross-domain detection. Specifically, we present Mean Teacher with Object Relations (MTOR) that novelly remolds Mean Teacher under the backbone of Faster R-CNN by integrating the object relations into the measure of consistency cost between teacher and student modules. Technically, MTOR firstly learns relational graphs that capture similarities between pairs of regions for teacher and student respectively. The whole architecture is then optimized with three consistency regularizations: 1) region-level consistency to align the region-level predictions between teacher and student, 2) inter-graph consistency for matching the graph structures between teacher and student, and 3) intra-graph consistency to enhance the similarity between regions of same class within the graph of student. Extensive experiments are conducted on the transfers across Cityscapes, Foggy Cityscapes, and SIM10k, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, we obtain a new record of single model: 22.8% of mAP on Syn2Real detection dataset.
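
The three consistency terms named in the abstract can be made concrete with a toy sketch. Everything specific here is an assumption for illustration, not the paper's formulation: cosine similarity as the graph edge weight, mean squared difference as the distance, equal weighting of the three terms, and the function names themselves.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relation_graph(features):
    """Dense graph whose edge (i, j) is the similarity of regions i and j."""
    return [[cosine(u, v) for v in features] for u in features]


def mean_squared_diff(a, b):
    """Mean squared difference between two equally shaped 2-D lists."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def intra_graph_cost(graph, labels):
    """Penalize low similarity between regions that share a class label."""
    pairs = [
        (i, j)
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
        if labels[i] == labels[j]
    ]
    if not pairs:
        return 0.0
    return sum(1.0 - graph[i][j] for i, j in pairs) / len(pairs)


def consistency_cost(t_feats, s_feats, t_probs, s_probs, s_labels):
    """Sum of the three terms (equal weights are an assumption)."""
    student_graph = relation_graph(s_feats)
    region = mean_squared_diff(t_probs, s_probs)                     # 1) region-level
    inter = mean_squared_diff(relation_graph(t_feats), student_graph)  # 2) inter-graph
    intra = intra_graph_cost(student_graph, s_labels)                # 3) intra-graph
    return region + inter + intra


# Toy regions: teacher and student agree exactly, so the cost vanishes.
feats = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
labels = [0, 0, 1]
cost = consistency_cost(feats, feats, probs, probs, labels)
```

When the student's region features drift away from the teacher's, the inter-graph and intra-graph terms grow, which is the signal a training loop would minimize.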

Classes Matter: A Fine-Grained Adversarial Approach to Cross-Domain Semantic Segmentation

Wang, Shen², Zhang³, et al. 2020

186

200

VERI-Wild: A Large Dataset and a New Method for Vehicle Re-Identification in the Wild

et al. 2019

Group-Sensitive Triplet Embedding for Vehicle Reidentification

Bai, Lou, Gao, et al. 2018, IEEE Trans. Multimedia

236

145

Benchmarking Single-Image Reflection Removal Algorithms

Wan¹, et al. 2017
