2018
DOI: 10.1007/978-3-030-01249-6_24

Motion Feature Network: Fixed Motion Filter for Action Recognition

Abstract: Spatio-temporal representations in frame sequences play an important role in the task of action recognition. Previously, a method of using optical flow as temporal information in combination with a set of RGB images that contain spatial information has shown great performance enhancement in action recognition tasks. However, it has an expensive computational cost and requires a two-stream (RGB and optical flow) framework. In this paper, we propose MFNet (Motion Feature Network) containing motion blocks whi…
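The truncated abstract, together with the citing papers' summaries quoted below, indicates that MFNet replaces the optical-flow stream with motion blocks built from fixed (non-learned) spatial shift filters applied to consecutive feature maps. The following is a minimal PyTorch sketch of that idea only; the function name, shift set, and shapes are illustrative assumptions, not the authors' implementation.

```python
import torch

def fixed_motion_filter(feat_t, feat_t1,
                        shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Approximate motion cues from two consecutive feature maps.

    For each fixed spatial shift (dy, dx), shift the next frame's
    feature map and subtract the current one; summing these directional
    differences yields an optical-flow-like motion feature without a
    separate flow stream. Both inputs have shape (N, C, H, W).
    """
    diffs = []
    for dy, dx in shifts:
        # torch.roll applies a fixed, non-learned spatial shift.
        shifted = torch.roll(feat_t1, shifts=(dy, dx), dims=(2, 3))
        diffs.append(shifted - feat_t)
    return torch.stack(diffs, dim=0).sum(dim=0)

# Toy usage on random feature maps from two consecutive frames.
f_t, f_t1 = torch.randn(2, 64, 14, 14), torch.randn(2, 64, 14, 14)
print(fixed_motion_filter(f_t, f_t1).shape)  # torch.Size([2, 64, 14, 14])
```

In the full network such a block would presumably sit between convolutional stages, so motion is extracted hierarchically on features rather than on raw RGB.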

Cited by 130 publications (92 citation statements: 2 supporting, 90 mentioning, 0 contrasting)
References 37 publications
“…For example, TLE [7], ShuttleNet [35], AttentionClusters [29] and NetVlad [1,16] are proposed for better local feature integration instead of the direct average pooling used in TSN. OFF [37] and the motion feature network [26] are proposed to integrate motion information modeling into a spatial CNN, instead of using two streams. I3D [3] inflates deeper networks than C3D for spatial-temporal modeling.…”
Section: Action Recognition (mentioning)
confidence: 99%
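As context for the quoted comparison, TSN's baseline consensus is a plain average over per-segment predictions; the cited works replace that mean with learned aggregation. A toy PyTorch sketch of both follows, with random tensors standing in for real segment features (all names and shapes here are illustrative):

```python
import torch

# Per-segment class scores from a shared backbone: (batch, segments, classes).
segment_scores = torch.randn(4, 3, 101)

# TSN-style consensus: direct average pooling over the segment axis.
video_scores = segment_scores.mean(dim=1)             # (4, 101)

# Learned aggregation (the TLE/AttentionClusters/NetVLAD direction),
# illustrated here by a toy attention-weighted pooling.
attn = torch.softmax(torch.randn(4, 3), dim=1)        # hypothetical weights
weighted = (attn.unsqueeze(-1) * segment_scores).sum(dim=1)  # (4, 101)
```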
“…Table 8 shows the results. MFNet [15] captures motion by spatially shifting CNN feature maps, then summing the results, TVNet [5] applies a convolutional optical flow method to RGB inputs, and ActionFlowNet [16] trains a CNN to jointly predict optical flow and activity classes. We also compare to OFF [21] using only RGB inputs.…”
[Spilled table rows: MFNet [15]: 52.5 / 56.8; TVNet [5]: 39.4 / 57.5; RGB-OFF [21]: 55.6 / 56.9; Ours: 61.1 / 65.4]
Section: Flow-of-flow (mentioning)
confidence: 99%
“…The upper part presents the 3D CNN-based methods, including S3D-G [37], ECO [42] and I3D+GCN models [35]. The lower part presents the 2D CNN-based methods, including TRN [40], MFNet [18] and TSM [19]. It is clear that even STM with 8 RGB frames…”
Section: Results On Temporal-related Datasets (mentioning)
confidence: 99%
“…However, calculating optical flow with the TV-L1 method [38] is expensive in both time and space. Recently, many approaches have been proposed to estimate optical flow with CNNs [5,14,6,21] or to explore alternatives to optical flow [33,39,26,18]. The TSN framework [33] used the RGB difference between two frames to represent motion in videos.…”
Section: Related Work (mentioning)
confidence: 99%
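The RGB-difference alternative mentioned in this statement is simple enough to show directly; a minimal sketch, assuming a (T, C, H, W) clip tensor (names and shapes are illustrative):

```python
import torch

def rgb_difference(frames):
    """Frame-to-frame RGB differences as a cheap motion representation.

    frames: (T, C, H, W) clip. Returns (T-1, C, H, W) temporal
    differences, the lightweight stand-in for optical flow fed to
    TSN's temporal stream.
    """
    return frames[1:] - frames[:-1]

clip = torch.rand(8, 3, 224, 224)   # 8-frame toy clip
motion = rgb_difference(clip)       # shape: (7, 3, 224, 224)
```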