2021
DOI: 10.48550/arxiv.2104.07905
Preprint

Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos

Abstract: We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets. Learning from purely egocentric data is limited by low dataset scale and diversity, while using purely exocentric (third-person) data introduces a large domain mismatch. Our idea is to discover latent signals in third-person video that are predictive of key egocentric-specific properties. Incorporating these signals as knowledge distillation losses during pre-training results in models that benefit …
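The abstract describes the mechanism only at a high level. As a rough illustration of how distillation losses can be attached to standard third-person pre-training, below is a minimal PyTorch-style sketch; it is an assumption-laden reading, not the authors' implementation. The model, head, and variable names (EgoExoPretrainModel, ego_head, teacher_ego_scores, lam) are hypothetical, and the single scalar "ego score" stands in for whatever egocentric-specific properties the paper's teachers actually predict.

```python
# Hypothetical sketch (not the authors' code): pre-train a third-person video
# encoder with a standard classification loss plus a distillation loss that
# regresses a teacher-provided, egocentric-specific signal for each clip.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EgoExoPretrainModel(nn.Module):  # hypothetical name
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                          # third-person video encoder
        self.cls_head = nn.Linear(feat_dim, num_classes)  # ordinary action recognition
        self.ego_head = nn.Linear(feat_dim, 1)            # predicts the latent ego signal

    def forward(self, clips: torch.Tensor):
        feats = self.backbone(clips)                      # assumed shape: (B, feat_dim)
        return self.cls_head(feats), self.ego_head(feats).squeeze(-1)

def pretrain_step(model, clips, action_labels, teacher_ego_scores, lam=0.5):
    """One optimization step: supervised loss + distillation toward the teacher."""
    logits, ego_pred = model(clips)
    loss_cls = F.cross_entropy(logits, action_labels)
    # Knowledge distillation: match the teacher's egocentric score for the clip.
    loss_distill = F.mse_loss(ego_pred, teacher_ego_scores)
    return loss_cls + lam * loss_distill
```

Under this reading, the backbone is later fine-tuned on egocentric downstream tasks; the distillation term is what injects egocentric-specific cues into otherwise third-person training.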

Cited by 1 publication (1 citation statement)
References 70 publications
“…4) Transferability: Although third-person videos are more accessible, in cases such as robot manipulation, where the agent observes the environment and objects from the first-person perspective, it is necessary to transfer knowledge from third-person videos to first-person scenarios. However, transferring affordance grounding knowledge between different perspectives [78] is still under-explored and of practical importance.…”
Section: Potential Applications
Citation type: mentioning
Confidence: 99%