2022
DOI: 10.3390/s22218396
Non-Local Temporal Difference Network for Temporal Action Detection

Abstract: As an important part of video understanding, temporal action detection (TAD) has wide application scenarios. It aims to simultaneously predict the boundary position and class label of every action instance in an untrimmed video. Most existing temporal action detection methods adopt a stacked convolutional block strategy to model long temporal structures. However, most of the information between adjacent frames is redundant, and distant information is weakened after multiple convolution operations. In ad…

Cited by 3 publications (1 citation statement)
References 49 publications
“…Action detection aims at localizing the temporal boundaries of human activities in untrimmed videos and classifying the action categories. Most existing work has used CNN-based models [4,[26][27][28] to extract spatio-temporal features from the input video frames before feeding the features into the TAD (Temporal Action Detection) network. A common practice is first to generate temporal proposals and then classify each proposal to one of the action categories [3,[29][30][31].…”
Section: Action Detection with CNN
confidence: 99%
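The cited statement describes the common two-stage TAD pipeline: a CNN backbone extracts per-frame spatio-temporal features, temporal proposals are generated, and each proposal is classified into an action category. A minimal illustrative sketch of that flow is below; the feature extractor, sliding-window proposal scheme, and prototype classifier are all hypothetical stand-ins, not the method of the paper or of any cited work.

```python
import numpy as np

def extract_features(num_frames, dim=64, seed=0):
    # Stand-in for a CNN backbone: one random feature vector per frame.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_frames, dim))

def generate_proposals(num_frames, min_len=8, stride=4):
    # Toy sliding-window proposal generator: candidate (start, end) segments.
    return [(s, s + min_len) for s in range(0, num_frames - min_len + 1, stride)]

def classify_proposal(features, start, end, class_prototypes):
    # Average-pool the segment, then pick the nearest class prototype
    # (a stand-in for the proposal classification head).
    segment = features[start:end].mean(axis=0)
    scores = class_prototypes @ segment
    return int(np.argmax(scores))

num_frames = 32                       # an "untrimmed video" of 32 frames
feats = extract_features(num_frames)
protos = np.random.default_rng(1).standard_normal((3, feats.shape[1]))  # 3 classes
detections = [(s, e, classify_proposal(feats, s, e, protos))
              for s, e in generate_proposals(num_frames)]
print(len(detections))  # → 7 scored proposals
```

Real systems replace each stage with learned components (e.g. a 3D CNN backbone and a proposal network), but the control flow, localize candidate segments first, then classify each one, follows the same shape.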