2020
DOI: 10.1007/978-3-030-58548-8_25
SF-Net: Single-Frame Supervision for Temporal Action Localization

Cited by 89 publications (91 citation statements)
References 39 publications
“…Besides, our iterative approach takes around 4.6 minutes to train even on CPU. Our method can also be used to improve other single frame methods [10]. Compared to fully supervised methods, our method gives good performance while utilizing significantly less annotation effort.…”
Section: Methods (mentioning)
confidence: 97%
“…to optimize the segment length and recognize human actions with fewer frames [8,9]. Using a single timestamp instead of start and end time for action recognition has been shown to be a reasonable compromise between performance and annotation effort [10]. In this paper, we question the need for more complex methods, and evaluate an extremely simple idea: We propose labeling a single action frame as "key frame" inside an action's temporal window (Figure 1) and evaluate the simplest approach we could find: Positive Unlabeled (PU) learning to detect action frames.…”
Section: Introduction (mentioning)
confidence: 99%
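The citing work above describes detecting action frames via Positive-Unlabeled (PU) learning from a single labelled "key frame" per action. As a minimal sketch of how such a per-frame PU objective can look, the snippet below uses the non-negative PU risk estimator (nnPU); the choice of estimator, the class prior `prior`, and the toy linear classifier are assumptions for illustration, not the citing paper's implementation.

```python
# Minimal sketch of PU learning for per-frame action detection.
# NOT the cited paper's code: the nnPU estimator, the class prior
# `prior`, and the toy classifier below are illustrative assumptions.
import torch


def nnpu_loss(pos_scores, unl_scores, prior):
    """Non-negative PU risk with the logistic surrogate loss.

    pos_scores: classifier scores for the labelled key frames (positives).
    unl_scores: scores for all remaining, unlabelled frames.
    prior:      assumed fraction of action frames among the unlabelled.
    """
    softplus = torch.nn.functional.softplus
    # Risk on the labelled positives, weighted by the class prior.
    risk_pos = prior * softplus(-pos_scores).mean()
    # Negative-class risk estimated from unlabelled data, clamped at
    # zero so it cannot go negative (the nnPU correction).
    risk_neg = softplus(unl_scores).mean() - prior * softplus(pos_scores).mean()
    return risk_pos + torch.clamp(risk_neg, min=0.0)


# Toy usage: one score per frame from a hypothetical linear classifier.
feats = torch.randn(1000, 64)               # 1000 frames, 64-d features
clf = torch.nn.Linear(64, 1)
scores = clf(feats).squeeze(-1)
key = torch.zeros(1000, dtype=torch.bool)
key[[12, 407, 850]] = True                  # one annotated frame per action
loss = nnpu_loss(scores[key], scores[~key], prior=0.3)
loss.backward()                             # gradients flow into `clf`
```

The clamp on the estimated negative risk is what keeps the unlabelled term from drifting negative when the labelled positives are as scarce as a single frame per action.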
“…TSS requires one frame label for each action. While the percentage of overall labelled frames is very small (0.03%), the annotation effort should not be underestimated: annotators must still watch all the videos, and labelling timestamp frames gives only a 6X speedup compared to densely labelling all frames (Ma et al. 2020).…”
Section: Introduction (mentioning)
confidence: 99%
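To make the 0.03% figure concrete, a rough back-of-the-envelope check (hypothetical video length and action count, not from the citing paper): a 30-minute video at 25 fps contains 30 × 60 × 25 = 45,000 frames, so labelling one timestamp for each of roughly 14 action instances marks 14 / 45,000 ≈ 0.03% of all frames.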
“…In practice, this type of weak label is akin to the time-stamp annotations used in weakly-supervised temporal action segmentation, in which an arbitrary frame from each action segment is labelled [8,9,10]. When annotating timestamps,…”
[Figure 1 of the citing paper: dense anticipation with full supervision vs. weak supervision]
Section: Introduction (mentioning)
confidence: 99%
“…annotators quickly go through a video and press a button when an action is occurring. This is ∼6x faster than marking the exact start and end frames of action segments [10] and still provides strong cues to learn effective models for action segmentation.…”
Section: Introduction (mentioning)
confidence: 99%