Attention‐Based Temporal Encoding Network with Background‐Independent Motion Mask for Action Recognition

Weng, Zhengkui; Zhang, Jin; Chen, Shuangxi; Shen, Quanquan; Ren, Xiangyang; Li, Wuzhao

doi:10.1155/2021/8890808

Cited by 1 publication

(1 citation statement)

References 47 publications

“…Tu et al. [55] proposed a combination of video object detection and motion saliency detection methods, which are based on pre-trained models from other datasets with extra labels to form a multi-stream neural network for action recognition. Weng et al. [56] utilized boundaries and optical flow to generate background-independent motion masks for action recognition.…”

Section: Related Work (mentioning)

confidence: 99%

ASNet: Auto-Augmented Siamese Neural Network for Action Recognition

Zhang, Xiong, et al. 2021, Sensors

Human action recognition methods in videos based on deep convolutional neural networks usually use random cropping or its variants for data augmentation. However, this traditional data augmentation approach may generate many non-informative samples (video patches covering only a small part of the foreground or only the background) that are not related to a specific action. These samples can be regarded as noisy samples with incorrect labels, which reduce the overall action recognition performance. In this paper, we attempt to mitigate the impact of noisy samples by proposing an Auto-augmented Siamese Neural Network (ASNet). In this framework, we propose backpropagating salient patches and randomly cropped samples in the same iteration to perform gradient compensation, alleviating the adverse gradient effects of non-informative samples. Salient patches are samples containing critical information for human action recognition. The generation of salient patches is formulated as a Markov decision process, and a reinforcement learning agent called SPA (Salient Patch Agent) is introduced to extract patches in a weakly supervised manner without extra labels. Extensive experiments were conducted on two well-known datasets, UCF-101 and HMDB-51, to verify the effectiveness of the proposed SPA and ASNet.
