2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv48630.2021.00058

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

Cited by 43 publications (29 citation statements)
References 21 publications
“…We followed the same protocol: 64/12/24 classes and 13063/2210/4472 videos for meta-training, meta-validation and meta-testing respectively. Whilst Kinetics is one of the most commonly evaluated datasets, visual appearance and background encapsulate most class-related information rather than motion patterns [21]. With less need for temporal modeling and involving coarse-grained action classes, it presents a relatively easy action classification task.…”
Section: Methods
confidence: 99%
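The 64/12/24-class, 13063/2210/4472-video protocol described above partitions the selected Kinetics classes into disjoint meta-training, meta-validation, and meta-testing sets. A minimal sketch of such a class-level partition (class names here are placeholders, not the actual split used in the cited work):

```python
import random

def split_classes(classes, n_train=64, n_val=12, n_test=24, seed=0):
    """Partition a list of class names into disjoint meta-train /
    meta-val / meta-test sets, as in few-shot video protocols."""
    assert len(classes) == n_train + n_val + n_test
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = classes[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Placeholder class names; the real protocol uses 100 Kinetics classes.
classes = [f"class_{i:03d}" for i in range(100)]
train, val, test = split_classes(classes)
```

The split is over classes, not videos: the model must generalize to action categories never seen during meta-training.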
“…UCF101 [23] and HMDB51 [17]), our proposed TRX primarily focuses on fine-grained actions where temporal information is required. Several works [14,21,15] showcased these traditional datasets to be appearance-based with a single-frame or shuffled frames sufficient to recognise the action. SSv2, in particular, has been shown to require temporal reasoning (e.g.…”
Section: Setup
confidence: 99%
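The shuffled-frames diagnostic mentioned above can be sketched as follows. This is a hedged illustration, not the cited papers' exact protocol: evaluate a clip classifier on frame-permuted inputs, and if accuracy barely drops, the model is relying on appearance rather than temporal order. `model` is a placeholder for any callable that scores a clip's correctness.

```python
import random

def shuffled_clip(frames, seed=0):
    """Return a copy of the clip with its frame order randomly permuted."""
    rng = random.Random(seed)
    permuted = frames[:]
    rng.shuffle(permuted)
    return permuted

def order_sensitivity(model, clips):
    """Accuracy gap between ordered and frame-shuffled inputs.
    `model(clip)` is assumed to return 1.0 if the clip is classified
    correctly, else 0.0 (a hypothetical correctness oracle)."""
    ordered = sum(model(c) for c in clips) / len(clips)
    shuffled = sum(model(shuffled_clip(c)) for c in clips) / len(clips)
    return ordered - shuffled
```

A gap near zero suggests the dataset (or model) does not require temporal reasoning; datasets like SSv2 are expected to show a large gap.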
“…Another work on explainability for video models is by Price et al [38], but only one type of model, and its decisions, is studied (TRN [54]). We are connected to the work of Sevilla-Lara et al [40], who discuss the risk that models with strong image modeling abilities may prioritize those cues over the temporal modeling cues. Similar to the findings of Geirhos et al [16], Sevilla-Lara et al find that inflated convolutions tend to learn classes better where motion is less important, and that generalization can be helped by training on more temporally focused data (in analogy to training on shape-based data in [16]).…”
Section: Related Work
confidence: 99%