2021
DOI: 10.48550/arxiv.2101.08085
Preprint

Few-shot Action Recognition with Prototype-centered Attentive Learning

Abstract: Few-shot action recognition aims to recognize action classes with few training samples. Most existing methods adopt a meta-learning approach with episodic training. In each episode, the few samples in a meta-training task are split into support and query sets. The former is used to build a classifier, which is then evaluated on the latter using a query-centered loss for model updating. There are however two major limitations: lack of data efficiency due to the query-centered only loss design and inability to d…
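For readers unfamiliar with the episodic setup the abstract refers to, below is a minimal sketch of how one meta-training task is typically split into support and query sets. The dataset layout, function name, and shapes are illustrative assumptions, not details taken from this paper.

```python
# Illustrative sketch (not the paper's code): sampling one N-way K-shot episode
# and splitting it into a support set (builds the classifier) and a query set
# (evaluates it via a query-centered loss).
import random
import torch

def sample_episode(features_by_class, n_way=5, k_shot=1, n_query=5):
    """features_by_class: dict mapping class name -> list of clip feature tensors."""
    classes = random.sample(list(features_by_class), n_way)
    support, query, support_y, query_y = [], [], [], []
    for label, cls in enumerate(classes):
        clips = random.sample(features_by_class[cls], k_shot + n_query)
        support += clips[:k_shot]      # few labelled samples per class
        query += clips[k_shot:]        # held-out samples scored by the classifier
        support_y += [label] * k_shot
        query_y += [label] * n_query
    return (torch.stack(support), torch.tensor(support_y),
            torch.stack(query), torch.tensor(query_y))
```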

Cited by 6 publications (13 citation statements). References 24 publications.
“…Almost all existing few-shot video classification methods [1,2,3,19,21,30,31] use pretrained weights (ImageNet or Sports-1M). However, this pre-training step might transfer the semantic knowledge learned from the ImageNet or Sports-1M to downstream few-shot learning.…”
Section: Discussion on ImageNet Pre-training
confidence: 99%
“…One promising direction is the meta-learning paradigm [5] where transferable knowledge is learned from a collection of tasks (or episodes) to prevent over-fitting and improve generalization. Inspired by metric learning methods [23,24], the existing few-shot video classification methods [1,2,3,19,21,31] usually compare the similarity of different videos in the feature space for classification. The essential difference between videos and images is the extra temporal dimension, which makes it insufficient to represent a whole video as a single feature vector.…”
Section: Introduction
confidence: 99%
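This statement notes that the extra temporal dimension makes a single feature vector per video insufficient, so methods compare videos in feature space at a finer granularity. The sketch below is a minimal illustration assuming precomputed per-frame embeddings; the best-match-averaging aggregation is one common choice, not a specific method from the cited works.

```python
# Illustrative sketch: comparing two videos by frame-level similarity rather
# than collapsing each into a single mean-pooled vector.
import torch
import torch.nn.functional as F

def video_similarity(frames_a, frames_b):
    """frames_a: [Ta, D], frames_b: [Tb, D] per-frame embeddings."""
    a = F.normalize(frames_a, dim=-1)
    b = F.normalize(frames_b, dim=-1)
    sim = a @ b.t()                          # [Ta, Tb] frame-to-frame cosine similarities
    return sim.max(dim=1).values.mean()      # average best-match score over frames of A

def mean_pooled_similarity(frames_a, frames_b):
    """Baseline: one vector per video, which discards temporal structure."""
    return F.cosine_similarity(frames_a.mean(0), frames_b.mean(0), dim=0)
```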
“…Most state-of-the-arts adopt the episodic meta-learning strategy, which trains the model with randomly sampled support and query sets in each episode. Spurred by the success of FSL in image classification, attempts are being made to explore FSL in action recognition [3,9,23,27,48,49,51,52]. CMN [51] utilizes memory networks to compress video information into a fixed matrix, which facilitates few-shot recognition by feature matching.…”
Section: Related Work
confidence: 99%
“…As shown in Figure 4(a), most state-of-the-art few-shot action recognition methods [9,23,52] utilize the prototypical network (ProtoNet) [34] for meta-learning. In each training episode, representations of support samples are utilized to build a classifier.…”
Section: Contrastive Meta-learning
confidence: 99%
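The ProtoNet-style classifier this statement describes, where support representations build the classifier and queries are scored against it, can be summarised in a few lines. The sketch assumes precomputed support and query embeddings and uses Euclidean distance to class prototypes, following the standard ProtoNet formulation [34] rather than anything specific to the cited papers.

```python
# Illustrative ProtoNet-style episode loss: class prototypes are support means,
# and queries are scored by negative distance to each prototype.
import torch
import torch.nn.functional as F

def prototype_loss(support, support_y, query, query_y, n_way):
    """support: [n_way*k_shot, D], query: [n_q, D]; labels are 0..n_way-1."""
    prototypes = torch.stack(
        [support[support_y == c].mean(dim=0) for c in range(n_way)])  # [n_way, D]
    logits = -torch.cdist(query, prototypes)   # closer prototype -> larger logit
    return F.cross_entropy(logits, query_y)    # loss defined over query samples only
```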