Proceedings of the British Machine Vision Conference 2017
DOI: 10.5244/c.31.93

End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos

Abstract: In this work, we present a new intuitive, end-to-end approach for temporal action detection in untrimmed videos. We introduce our new architecture for Single-Stream Temporal Action Detection (SS-TAD), which effectively integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. We develop a method for training our deep recurrent architecture based on enforcing semantic constraints on intermediate modules that are gradually relaxed as learning progresses. We find tha…

Cited by 191 publications (146 citation statements)
References 20 publications (43 reference statements)
“…(3) methods developing end-to-end architectures integrating the proposal generation and classification [48,1,26]. Our work is built upon the second category where the action proposals are first generated and then used to perform classification and boundary regression.…”
Section: Related Work (mentioning)
confidence: 99%
“…For simplicity, we denote the best tIoU and best overlap as tIoU and OL. Then, three types of training samples can be described as: (1) (3) Background sample: tIoU ≤ θ_4. These certain thresholds are slightly different on two datasets as shown in Table A.…”
Section: Training Details (mentioning)
confidence: 99%
“…There are also end-to-end frameworks that enable joint optimization of proposal generation and action classification. Buch et al [2] introduce semantics constraints for curriculum training in end-to-end temporal action localization. Chao et al [8] adopt Faster R-CNN [30] for action localization task.…”
Section: Related Work (mentioning)
confidence: 99%
“…In most works, the temporal action proposal generation is a sub-task of the overall approach. However, there also exist end-to-end approaches [2].…”
Section: Related Work (mentioning)
confidence: 99%