2020
DOI: 10.1007/978-3-030-58610-2_40

Shuffle and Attend: Video Domain Adaptation

Cited by 72 publications (96 citation statements)
References 48 publications
“…• We conduct extensive experiments on several challenging benchmarks (UCF-HMDB [9], Jester [57], and Epic-Kitchens [54]) for video domain adaptation to demonstrate the superiority of our approach over state-of-the-art methods. Our experiments show that CoMix delivers a significant performance increase over the compared methods, e.g., CoMix outperforms SAVA [12] (ECCV'20) by 3.6% on UCF-HMDB [9] and TA…”
Section: Introduction (mentioning)
confidence: 89%
“…More recently, very few works have attempted deep UDA for video action recognition by directly matching segment-level features [9,28,54,45] or with attention weights [12,57]. However, (1) trivially matching segment-level feature distributions by extending the image-specific approaches, without considering the rich temporal information may not alone be sufficient for video domain adaptation; (2) prior methods often focus on aligning target features with source, rather than exploiting any action semantics shared across both domains (e.g., difference in background with the same action: videos in the top row of Figure 1 are from the source and target domain respectively, but both capture the same action walking); (3) existing methods often rely on complex adversarial learning which is unwieldy to train, resulting in very fragile convergence.…”
Section: Introduction (mentioning)
confidence: 99%
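The "complex adversarial learning" this statement criticises usually means a DANN-style domain discriminator trained on segment-level features through a gradient reversal layer. The following is a minimal PyTorch sketch of that general idea, not the implementation of any specific cited paper; the class names (GradReverse, SegmentDomainDiscriminator) and feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; multiplies gradients by -lambd in the
    backward pass, so the backbone learns domain-confusing features."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class SegmentDomainDiscriminator(nn.Module):
    """Hypothetical discriminator that classifies each segment-level feature
    as source (0) or target (1); feat_dim and hidden are illustrative."""
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, seg_feats, lambd=1.0):
        # seg_feats: (batch, num_segments, feat_dim) from a video backbone
        x = GradReverse.apply(seg_feats, lambd)
        return self.net(x)  # per-segment domain logits
```

In such a setup the discriminator's source-vs-target cross-entropy is added to the action-classification loss on labelled source videos, and the reversal strength lambd is typically annealed during training, which is one source of the fragile convergence the statement mentions.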
“…Misra et al [40] introduce the idea of learning such visual representations by estimating the order of shuffled video frames. Inspired by the success of this approach, several recent papers focused on designing a novel pretext task using temporal information, such as predicting future frames [13,49,54] or their embeddings [21,27]; estimating the order of frames [10,20,36,40,57] or the direction of video [56]. Another line of research focuses on using temporal coherence [6,24,26,41,62,63] as supervision signal.…”
Section: Related Work (mentioning)
confidence: 99%
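The order-prediction pretext task described above (Misra et al. and follow-ups) can be sketched as a small classification head that receives shuffled clip features and predicts which permutation was applied. The snippet below is a hedged illustration under assumed tensor shapes; ClipOrderPrediction and its dimensions are hypothetical names, not the API of any cited work.

```python
import itertools
import torch
import torch.nn as nn

class ClipOrderPrediction(nn.Module):
    """Self-supervised pretext head: shuffle a video's clips and predict
    which permutation was applied (no action labels required)."""
    def __init__(self, feat_dim=2048, num_clips=3):
        super().__init__()
        self.perms = list(itertools.permutations(range(num_clips)))
        self.head = nn.Sequential(
            nn.Linear(num_clips * feat_dim, 512),
            nn.ReLU(),
            nn.Linear(512, len(self.perms)),
        )

    def forward(self, clip_feats):
        # clip_feats: (batch, num_clips, feat_dim), in original temporal order
        b, n, d = clip_feats.shape
        labels = torch.randint(len(self.perms), (b,), device=clip_feats.device)
        shuffled = torch.stack(
            [clip_feats[i, list(self.perms[labels[i].item()])] for i in range(b)]
        )
        logits = self.head(shuffled.reshape(b, n * d))
        return logits, labels  # train with cross-entropy on (logits, labels)
```

Because the permutation label is generated on the fly, the same loss can also be applied to unlabelled target videos, which is why order prediction is attractive as an auxiliary signal for adaptation.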
“…Domain Adaptation for Videos. Prior works for video domain adaptation (DA) have focused on classification [6,11,28,42], segmentation [7,8] and localisation [2]. They use adversarial training to align the marginal distributions [28], an auxiliary self-supervised task [8,11,42], or attending to relevant frames for alignment [6-8].…”
Section: Related Work (mentioning)
confidence: 99%
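One of the strategies listed in that statement, attending to relevant frames, is commonly implemented as a learned softmax weighting over per-clip features before the video-level representation is aligned or classified. Below is a minimal sketch of such attention pooling under assumed shapes; it is illustrative and not the exact attention mechanism of SAVA or the other cited methods.

```python
import torch
import torch.nn as nn

class AttentiveClipPooling(nn.Module):
    """Scores each clip and pools clip features with softmax attention, so
    downstream alignment and classification focus on informative clips."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)  # one relevance score per clip

    def forward(self, clip_feats):
        # clip_feats: (batch, num_clips, feat_dim)
        weights = torch.softmax(self.scorer(clip_feats).squeeze(-1), dim=1)
        video_feat = (weights.unsqueeze(-1) * clip_feats).sum(dim=1)
        return video_feat, weights  # (batch, feat_dim), (batch, num_clips)
```

The attention weights can be supervised only indirectly (through the classification or alignment loss), so they tend to emphasise clips whose appearance is shared across domains rather than background details.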