2020
DOI: 10.48550/arxiv.2012.00317
Preprint

Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification

Abstract: Recent video classification research has focused on two areas: temporal modeling and efficient 3D architectures. However, temporal modeling methods are not efficient, and efficient 3D architectures pay little attention to temporal modeling. To bridge the gap between them, we propose an efficient temporal modeling 3D architecture, called VoV3D, that consists of a temporal one-shot aggregation (T-OSA) module and a depthwise factorized component, D(2+1)D. The T-OSA is devised to build…

Cited by 1 publication (2 citation statements)
References 30 publications (142 reference statements)
“…Two metrics were added during the training: accuracy and top 5 categorical accuracy. The sessions used different number of epochs (50,100,200) and, in some cases, early-stopping. The best results were achieved with 200 epochs, but in all the different trainings, with the same or different number of epochs (50, and validation sets, our model considerably improved on the values of LRCN.…”
Section: Experiments and Results Discussion (mentioning)
confidence: 99%
“…The Something-Something dataset [46] is a dataset composed of about 108,000 videos organized into 174 classes. It is a current but widely used dataset [47]- [50]. However, it focuses on how a person manipulates objects with their hands.…”
Section: Overview Of Related Work (mentioning)
confidence: 99%