Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413502

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

Abstract (excerpt): Figure 1: Concept: depth information helps to model the relation between the moving person and the scene, thereby guiding the model to learn richer context and a less biased representation. We observe two important characteristics: 1) scene information clearly helps us to recognize actions; in this example, the mountain is critical to identifying the action as "riding mountain bike". 2) Even if the scene shifts from the mountain to the roadside, we can still recognize the action correctly. The example is sampled from the Kinetic…



Cited by 62 publications (34 citation statements)
References 51 publications (70 reference statements)
“…Cao et al [6] focus on long-term temporal ordering information and propose a temporal-alignment-based method for few-shot action recognition. Fu et al [16] introduce depth as extra visual information and propose a temporal asynchronization augmentation mechanism to augment the source video representation. Besides, they propose a depth-guided adaptive instance normalization module to fuse original RGB clips with non-strictly corresponding depth clips at the feature level.…”
Section: Few-shot Action Recognition
confidence: 99%
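The depth-guided fusion described above builds on adaptive instance normalization (AdaIN): normalize the RGB features per channel, then re-scale them with the depth features' channel-wise statistics. A minimal sketch of that core operation, assuming (C, H, W) feature maps; the function name and the plain (non-learned) formulation here are illustrative, not the authors' actual module:

```python
import numpy as np

def adain_fuse(rgb_feat, depth_feat, eps=1e-5):
    """Fuse an RGB feature map with a (possibly non-corresponding) depth
    feature map: whiten the RGB features per channel, then re-style them
    with the depth features' channel-wise mean and std.
    Both inputs have shape (C, H, W). Illustrative sketch only."""
    mu_rgb = rgb_feat.mean(axis=(1, 2), keepdims=True)
    std_rgb = rgb_feat.std(axis=(1, 2), keepdims=True) + eps
    mu_depth = depth_feat.mean(axis=(1, 2), keepdims=True)
    std_depth = depth_feat.std(axis=(1, 2), keepdims=True) + eps
    # normalized RGB content, re-scaled by depth statistics
    return std_depth * (rgb_feat - mu_rgb) / std_rgb + mu_depth
```

Because the depth clip only contributes statistics, it does not need to be strictly frame-aligned with the RGB clip, which is what makes fusing "non-strictly corresponding" clips feasible.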
“…Besides, the query set is sampled from the remaining videos of the M classes. Following [16], in both the training and testing phases, the query set contains one video per episode.…”
Section: Model Formulation 3.1 Architecture Overview
confidence: 99%
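The episodic setup quoted above can be sketched as follows. This is a generic M-way, K-shot sampler with the one-query-video-per-episode convention the statement describes; the function and variable names are ours, not from the cited papers:

```python
import random

def sample_episode(videos_by_class, m_way, k_shot, seed=None):
    """Sample one few-shot episode: pick M classes, place K videos of
    each into the support set, and draw the single query video from the
    remaining videos of those M classes. Illustrative sketch only."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(videos_by_class), m_way)
    support, leftovers = {}, []
    for c in classes:
        # shuffle this class's videos, split into support / leftovers
        vids = rng.sample(videos_by_class[c], len(videos_by_class[c]))
        support[c] = vids[:k_shot]
        leftovers.extend((c, v) for v in vids[k_shot:])
    query = rng.choice(leftovers)  # one query video per episode
    return support, query
```

During meta-training, many such episodes are drawn so the model repeatedly faces new M-way classification tasks, matching the episodic paradigm described in the quotes.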
“…Different modules for fusing instance appearance information and action structure information can be applied at this node, such as bilinear pooling [24,41], attention mechanisms [21,39,43], and other approaches [11,34]. For simplicity, we use a concatenation operation followed by fully connected layers as the fusion module.…”
Section: Appearance Bias in Compositional Action
confidence: 99%
“…With extensive efforts devoted to few-shot learning (FSL) recently, great success has been achieved in few-shot image classification [12,34,37,40,53]. Spurred by that, attempts are being made to extend FSL to the action recognition domain [3,9,23,27,48,51,52]. Most methods follow the established meta-learning paradigm [40], where models are trained episodically with randomly sampled support and query sets.…”
Section: Introduction
confidence: 99%