2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00630

End-to-End Learning of Motion Representation for Video Understanding

Abstract: Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a novel end-to-end trainable neural network, to learn optical-flow-like features from data. TVNet subsumes a specific optical flow solver, the TV-L1 method, and is initialized by unfolding its optimization iterations as neural layers. TVNet can therefore be used directly without any extra learning. Moreover, it can be naturally c…
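The core idea in the abstract — unfolding a solver's optimization iterations as neural layers — can be illustrated with a minimal sketch. This is not TVNet's actual TV-L1 update; it unrolls plain gradient descent on a toy least-squares objective, where each iteration becomes one feed-forward "layer" with its own step size (the quantities that, in TVNet, become trainable parameters). The function name and interface are illustrative only.

```python
import numpy as np

def unrolled_solver(A, b, taus, x0=None):
    """Unfold gradient-descent iterations for min_x 0.5*||Ax - b||^2.

    Each element of `taus` plays the role of one 'layer': the loop body
    is the layer's update rule, and tau is its (potentially learnable)
    step size. TVNet applies the same unrolling trick to the TV-L1
    optical flow iterations instead of this toy objective.
    """
    x = np.zeros(A.shape[1]) if x0 is None else x0.astype(float).copy()
    for tau in taus:                  # one pass of the loop == one layer
        grad = A.T @ (A @ x - b)      # gradient of the quadratic data term
        x = x - tau * grad            # the layer's fixed update rule
    return x
```

Because every layer is differentiable in `tau` (and in any filters hidden inside the update), the whole unrolled stack can be fine-tuned by backpropagation, which is why the network "can be used directly without any extra learning" yet still improves with training.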

Cited by 211 publications (140 citation statements)
References 46 publications
“…Understanding human actions in videos has been becoming a prominent research topic in computer vision, owing to its various applications in security surveillance, human behavior analysis and many other areas [10,35,38,12,13,14,15,16,42]. Despite the fruitful progress in this vein, there are still some challenging tasks demanding further exploration -temporal action localization is such an example.…”
Section: Introduction
confidence: 99%
“…In this subsection, we would like to see whether the performance can be further improved with the motion information added. We extract the optical flow using the initialized TVNet [7] without finetuning, and calculate the optical flow statistics as described in [22], then concatenate the statistics to the content-aware features. The performance comparison of our model with/without motion information on KoNViD-1k is shown in Figure 7.…”
Section: Motion Information
confidence: 99%
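The pipeline in the excerpt above — extract optical flow with the initialized TVNet, summarize it into statistics, and concatenate those onto content-aware features — can be sketched as follows. The specific statistics of the cited reference [22] are not reproduced here; as a hedged stand-in, this sketch uses the mean and standard deviation of the per-pixel flow magnitude. Function names are hypothetical.

```python
import numpy as np

def flow_statistics(flow):
    """Summarize a flow field into a small statistics vector.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements, e.g. the
    output of an (untrained) TVNet forward pass. The real statistics in
    [22] differ; mean/std of the magnitude is an illustrative stand-in.
    """
    mag = np.linalg.norm(flow, axis=-1)      # (H, W) flow magnitude
    return np.array([mag.mean(), mag.std()])

def fuse_features(content_features, flow):
    """Concatenate motion statistics onto the content-aware features."""
    return np.concatenate([content_features, flow_statistics(flow)])
```

The appeal of this design is that the flow network needs no fine-tuning: the unfolded TV-L1 initialization already produces usable flow, so motion becomes a cheap plug-in feature for the quality-assessment model.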
“…However, calculating optical flow with TV-L1 method [38] is expensive in both time and space. Recently many approaches have been proposed to estimate optical flow with CNN [5,14,6,21] or explored alternatives of optical flow [33,39,26,18]. TSN frameworks [33] involved RGB difference between two frames to represent motion in videos.…”
Section: Related Work
confidence: 99%
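The RGB-difference alternative mentioned in the excerpt above (used by TSN frameworks [33]) is simple enough to sketch directly: subtracting consecutive frames gives a crude but nearly free motion signal, avoiding the time and memory cost of TV-L1 flow. This is a minimal sketch of that idea, not TSN's full input pipeline.

```python
import numpy as np

def rgb_difference(frames):
    """Consecutive-frame RGB differences as a cheap motion proxy.

    frames: (T, H, W, 3) video clip (uint8 or float). Returns the
    (T-1, H, W, 3) stack of differences frames[t+1] - frames[t],
    which TSN-style models feed to the temporal stream in place of
    optical flow.
    """
    f = frames.astype(np.float32)    # avoid uint8 wrap-around on subtract
    return f[1:] - f[:-1]
```

The cast to float matters: subtracting uint8 arrays directly would wrap around modulo 256 for negative differences and silently corrupt the motion signal.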