2021
DOI: 10.48550/arxiv.2110.07058
Preprint

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of…

Cited by 15 publications (19 citation statements) · References 163 publications
Citation statements: 0 supporting, 19 mentioning, 0 contrasting
“…The increasing accuracy of monocular and multi-view automated methods for face, pose, and hand estimation has contributed to reducing the annotation effort. Still, the largest available datasets, which provide thousands of hours of audiovisual material and feature the widest spectrum of behaviors, do not provide such annotations (Carreira et al., 2019; Zhao et al., 2019; Monfort et al., 2020; Grauman et al., 2021). In contrast, automated methods for recognizing high-level representations, such as feedback responses or atomic action labels, are not accurate enough to significantly aid their annotation.…”
Section: Discussion (mentioning)
confidence: 99%
“…Thanks to camera portability during collection, egocentric datasets can record social behavior in less constrained environments. Very recently, Grauman et al. (2021) released more than 3,000 hours of in-the-wild egocentric recordings of human actions, which also include social interactions. Finally, a computer-mediated recording setup elicits very particular behavior due to the idiosyncrasies of the communication channel (McKeown et al., 2010; Ringeval et al., 2013; Cafaro et al., 2017; Feng et al., 2017; Kossaifi et al., 2019).…”
Section: Datasets (mentioning)
confidence: 99%
“…HPS [22] reconstructs the body pose and shape of a subject wearing a head-mounted camera while moving in a large 3D scene, but with few social interactions. Recently, Ego4D [19] collected a massive amount of egocentric video for various tasks, including action and social behavior understanding, making significant advances in stimulating future research in the egocentric domain. Our dataset is complementary to Ego4D in that we provide 3D human pose and shape ground truth for the camera wearer and their interaction partner.…”
Section: Related Work (mentioning)
confidence: 99%
“…The community's interest has grown quickly in recent years [16,17,19,83], thanks to the possibilities these data open for the evaluation and understanding of human behavior, leading to the design of novel architectures [30,51,52,91,104]. While the use of optical flow has been the de facto procedure in FPAR [14-17,19,41], interest has recently shifted towards more lightweight alternatives, such as gaze [27,59,71], audio [9,52,77], depth [32], skeleton [32], and inertial measurements [41], to enable motion modeling in online settings. These, when combined with traditional modalities, produce encouraging results, but not enough to make them viable alternatives on their own.…”
Section: Related Work (mentioning)
confidence: 99%
“…With the advent of novel large-scale datasets [14,15], new tasks are being proposed, such as wearer's pose estimation [105] and egocentric video anonymization [95]. This trend will grow in the coming years thanks to the very recent release of Ego4D [41], a massive-scale egocentric dataset.…”
[Figure 1: N-EPIC-Kitchens, the first event-based dataset for egocentric action recognition.]
Section: Introduction (mentioning)
confidence: 99%