2022
DOI: 10.1109/tmm.2021.3050067
Exploiting Informative Video Segments for Temporal Action Localization

Cited by 22 publications (6 citation statements)
References 49 publications
“…(pros: accurate boundary frames, tackling long instances; cons: lacking temporal modeling, separated procedures) [51], [41], [100], [50], [43], [101], [102], [88], [46], [103], [92], [18], [104], [103]. Classifying: [1], [105], [13], [106], [98], [42],…”
Section: Classification Mechanism (mentioning)
confidence: 99%
“…Field (pros: temporal modeling, global relationship; cons: insufficient detail representation, intra-video diversity) [111], [90], [46], [104], [112], [88], [113], [18], [102], [94], [95], [47], [18], [49], [91], [96]. Inter-video relationship (pros: representative category features; cons: complicated training) [114], [115]. End-to-End: [1], [34], [105], [36], [114], [116],…”
Section: Classification Mechanism (mentioning)
confidence: 99%
“…$\frac{\sum_{j \in F_i,\, j \neq i} e^{\cos(f_i, f_j)/\tau}}{\sum_{j \in F,\, j \neq i} e^{\cos(f_i, f_j)/\tau}}$ (10), where $\cos$ denotes the cosine similarity function, $F$ denotes the set of sampled foregrounds and backgrounds, $f_i$ denotes the $i$-th element of $F$, $F_i$ denotes the subset of sampled foregrounds/backgrounds of the same type as $f_i$, and $\tau$ denotes the temperature parameter. This loss minimizes the feature gap within the same category (foreground or background), maximizes the feature gap between foreground and background, and forces the attention map to distinguish foreground from background where the original input features differ most.…”
Section: E. Imagewise Contrastive Module (mentioning)
confidence: 99%
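
The quoted objective is an InfoNCE-style contrastive loss over sampled foreground and background features. Below is a minimal PyTorch sketch of Eq. (10), assuming the truncated leading part of the quote is a negative log averaged over samples; the function name, tensor shapes, and temperature value are illustrative assumptions, not taken from the cited paper.

# A minimal sketch of the quoted contrastive loss (Eq. 10), assuming an
# InfoNCE-style form: for each sampled feature f_i, the positives are the
# other samples of the same type (foreground vs. background) and the
# denominator sum runs over all other samples. The leading -log and the
# averaging over samples are assumptions; the quote is truncated there.
import torch
import torch.nn.functional as F


def fg_bg_contrastive_loss(feats: torch.Tensor,
                           is_foreground: torch.Tensor,
                           tau: float = 0.1) -> torch.Tensor:
    """feats: (N, D) sampled foreground/background features.
    is_foreground: (N,) boolean mask marking foreground samples."""
    # Pairwise cosine similarities, scaled by the temperature tau.
    sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1) / tau
    # Mask out the i == j terms from both numerator and denominator.
    not_self = ~torch.eye(len(feats), dtype=torch.bool)
    exp_sim = torch.exp(sim) * not_self
    # Positives: pairs with the same foreground/background label.
    same_type = is_foreground.unsqueeze(0) == is_foreground.unsqueeze(1)
    numerator = (exp_sim * same_type).sum(dim=1)
    denominator = exp_sim.sum(dim=1)
    # Assumed -log(...) averaged over samples (truncated in the quote).
    return -torch.log(numerator / denominator).mean()


# Usage with random features: 8 foreground and 8 background samples.
feats = torch.randn(16, 128)
mask = torch.arange(16) < 8
loss = fg_bg_contrastive_loss(feats, mask)

Computing all pairwise similarities at once keeps the sketch vectorized; masking the diagonal implements the $i \neq j$ constraint in both sums.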
“…Recently, the Transformer-based Actionformer [6] achieved the best TAL performance [5], [6]. However, most TAL methods refine discriminative action boundaries from segment-level semantics [7]–[10],…”
Section: Introduction (mentioning)
confidence: 99%
“…As a single-modal task, temporal action localization aims to classify action instances by predicting the corresponding start timestamps, end timestamps, and action category labels [8], [56], [57]. Existing methods can be divided into one-stage methods [58]–[60] and two-stage methods [61]–[64].…”
Section: B. Temporal Action Localization (mentioning)
confidence: 99%
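
As a concrete illustration of the output format this quote describes, here is a minimal Python sketch of a localization result; the class and field names (including the confidence score) are hypothetical, not from the cited paper.

# A minimal sketch of a temporal action localization output as described
# above: each detected instance carries a start timestamp, an end
# timestamp, and an action category label. The field names and the
# score field are illustrative assumptions, not from the cited paper.
from dataclasses import dataclass


@dataclass
class ActionInstance:
    start: float   # start timestamp in seconds
    end: float     # end timestamp in seconds
    label: str     # predicted action category
    score: float   # confidence of the prediction (assumed field)


# Example: two instances localized in one untrimmed video.
predictions = [
    ActionInstance(start=12.4, end=18.9, label="long_jump", score=0.91),
    ActionInstance(start=44.0, end=51.2, label="long_jump", score=0.78),
]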