2021
DOI: 10.48550/arxiv.2101.08085
Preprint

Few-shot Action Recognition with Prototype-centered Attentive Learning

Abstract: Few-shot action recognition aims to recognize action classes with few training samples. Most existing methods adopt a meta-learning approach with episodic training. In each episode, the few samples in a meta-training task are split into support and query sets. The former is used to build a classifier, which is then evaluated on the latter using a query-centered loss for model updating. There are however two major limitations: lack of data efficiency due to the query-centered only loss design and inability to d…
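For readers unfamiliar with the episodic setup the abstract refers to, below is a minimal sketch of how one meta-training task is typically split into support and query sets. The dataset layout, function name, and shapes are illustrative assumptions, not details taken from this paper.

```python
# Illustrative sketch (not the paper's code): sampling one N-way K-shot episode
# and splitting it into a support set (builds the classifier) and a query set
# (evaluates it via a query-centered loss).
import random
import torch

def sample_episode(features_by_class, n_way=5, k_shot=1, n_query=5):
    """features_by_class: dict mapping class name -> list of clip feature tensors."""
    classes = random.sample(list(features_by_class), n_way)
    support, query, support_y, query_y = [], [], [], []
    for label, cls in enumerate(classes):
        clips = random.sample(features_by_class[cls], k_shot + n_query)
        support += clips[:k_shot]      # few labelled samples per class
        query += clips[k_shot:]        # held-out samples scored by the classifier
        support_y += [label] * k_shot
        query_y += [label] * n_query
    return (torch.stack(support), torch.tensor(support_y),
            torch.stack(query), torch.tensor(query_y))
```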

Cited by 6 publications (13 citation statements). References 24 publications.
“…Almost all existing few-shot video classification methods [1,2,3,19,21,30,31] use pretrained weights (ImageNet or Sports-1M). However, this pre-training step might transfer the semantic knowledge learned from the ImageNet or Sports-1M to downstream few-shot learning.…”
Section: Discussion on ImageNet Pre-training
confidence: 99%
“…One promising direction is the meta-learning paradigm [5] where transferable knowledge is learned from a collection of tasks (or episodes) to prevent over-fitting and improve generalization. Inspired by metric learning methods [23,24], the existing few-shot video classification methods [1,2,3,19,21,31] usually compare the similarity of different videos in the feature space for classification. The essential difference between videos and images is the extra temporal dimension, which makes it insufficient to represent a whole video as a single feature vector.…”
Section: Introduction
confidence: 99%
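This statement notes that the extra temporal dimension makes a single feature vector per video insufficient, so methods compare videos in feature space at a finer granularity. The sketch below is a minimal illustration assuming precomputed per-frame embeddings; the best-match-averaging aggregation is one common choice, not a specific method from the cited works.

```python
# Illustrative sketch: comparing two videos by frame-level similarity rather
# than collapsing each into a single mean-pooled vector.
import torch
import torch.nn.functional as F

def video_similarity(frames_a, frames_b):
    """frames_a: [Ta, D], frames_b: [Tb, D] per-frame embeddings."""
    a = F.normalize(frames_a, dim=-1)
    b = F.normalize(frames_b, dim=-1)
    sim = a @ b.t()                          # [Ta, Tb] frame-to-frame cosine similarities
    return sim.max(dim=1).values.mean()      # average best-match score over frames of A

def mean_pooled_similarity(frames_a, frames_b):
    """Baseline: one vector per video, which discards temporal structure."""
    return F.cosine_similarity(frames_a.mean(0), frames_b.mean(0), dim=0)
```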
“…Most state-of-the-arts adopt the episodic meta-learning strategy, which trains the model with randomly sampled support and query sets in each episode. Spurred by the success of FSL in image classification, attempts are being made to explore FSL in action recognition [3,9,23,27,48,49,51,52]. CMN [51] utilizes memory networks to compress video information into a fixed matrix, which facilitates few-shot recognition by feature matching.…”
Section: Related Work
confidence: 99%
“…As shown in Figure 4(a), most state-of-the-art few-shot action recognition methods [9,23,52] utilize the prototypical network (ProtoNet) [34] for meta-learning. In each training episode, representations of support samples are utilized to build a classifier.…”
Section: Contrastive Meta-learning
confidence: 99%
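The ProtoNet-style classifier this statement describes, where support representations build the classifier and queries are scored against it, can be summarised in a few lines. The sketch assumes precomputed support and query embeddings and uses Euclidean distance to class prototypes, following the standard ProtoNet formulation [34] rather than anything specific to the cited papers.

```python
# Illustrative ProtoNet-style episode loss: class prototypes are support means,
# and queries are scored by negative distance to each prototype.
import torch
import torch.nn.functional as F

def prototype_loss(support, support_y, query, query_y, n_way):
    """support: [n_way*k_shot, D], query: [n_q, D]; labels are 0..n_way-1."""
    prototypes = torch.stack(
        [support[support_y == c].mean(dim=0) for c in range(n_way)])  # [n_way, D]
    logits = -torch.cdist(query, prototypes)   # closer prototype -> larger logit
    return F.cross_entropy(logits, query_y)    # loss defined over query samples only
```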