2017
DOI: 10.48550/arxiv.1705.08421
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

1
54
0
1

Year Published

2018
2018
2021
2021

Publication Types

Select...
6
3

Relationship

0
9

Authors

Journals

citations
Cited by 24 publications
(56 citation statements)
references
References 0 publications
1
54
0
1
Order By: Relevance
“…Most of the existing action recognition datasets contain high resolution, actor centric videos [25], [12], [13], [19], [2], [18], [8], [11], [17], [6]. For example, Kinetics [13], Charades [19], Youtube-8M [2] are collected from Youtube videos where actions cover most of the image regions in every frame of a video.…”
Section: Tinyvirat Datasetmentioning
confidence: 99%
See 1 more Smart Citation
“…Most of the existing action recognition datasets contain high resolution, actor centric videos [25], [12], [13], [19], [2], [18], [8], [11], [17], [6]. For example, Kinetics [13], Charades [19], Youtube-8M [2] are collected from Youtube videos where actions cover most of the image regions in every frame of a video.…”
Section: Tinyvirat Datasetmentioning
confidence: 99%
“…The availability of large-scale datasets and the progress of neural networks have provided significant improvement to video action recognition task. Datasets with multiple actors and actions such as UCF-101 [21], Kinetics [20,13], AVA [8], YouTube-8M [1] and Moments-in-time [15] provide a large set of data with higher versatility for training neural networks. This has enabled several state-of-the-art architectures such as C3D [22], I3D [3], ResNet-3D [9] and R2+1D [23] which have been effective at recognizing the correct actions.…”
Section: Introductionmentioning
confidence: 99%
“…Later, 3D Con-vNets [3,30,36] are shown to perform better in spatiotemporal modeling. With many large-scale video datasets [11,5,2,10] coming out, 3D ConvNets are able to get high accuracies when incorporated into a two-stream framework. However, 3D ConvNets are computationally heavy and there are some efforts like [32,31], trying to alleviate 3D computations and get comparable or even better performances.…”
Section: Related Workmentioning
confidence: 99%
“…Action recognition has become a more and more important topic in the field of academic research as well as in industrial context. This is shown by the amount of publications and the diversity of research directions, as well as by the growing number of challenging datasets in this field [10,5,33,11]. So far, most of these approaches rely on fully supervised training.…”
Section: Introductionmentioning
confidence: 99%