2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00966
Spatio-temporal Contrastive Domain Adaptation for Action Recognition

Cited by 56 publications (50 citation statements)
References 23 publications
“…Here, we focus on video domain adaptation for activity recognition. State-of-the-art visual-only solutions learn to reduce the shift in activity appearance through adversarial training [5,6,8,9,20,27,29] and self-supervised learning techniques [9,22,27,34]. While Jamal et al. [20] and Munro and Damen [27] directly penalize domain-specific features with an adversarial loss at every time stamp, Chen et al. [5], Choi et al. [9] and Pan et al. [29] attend to temporal segments that contain important cues.…”
Section: Related Work
confidence: 99%
“…Self-supervised learning objectives are also incorporated in [27] and [9] to better align features across domains by exploiting the correspondences between RGB and optical flow or the temporal order of video clips. Song et al. [34] and Kim et al. [22] obtain remarkable performance by using contrastive learning as a self-supervised objective to align the feature distributions between video domains. Instead of relying on the vision modality only, which may present large variance in activity appearance, we consider the domain-invariant information within sound to help the model adapt to the visual distribution shift.…”
Section: Related Work
confidence: 99%