2018 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv.2018.00179

ActionFlowNet: Learning Motion Representation for Action Recognition

Cited by 96 publications (88 citation statements)
References 17 publications

“…Table 8 shows the results. MFNet [15] captures motion by spatially shifting CNN feature maps, then summing the results; TVNet [5] applies a convolutional optical flow method to RGB inputs; and ActionFlowNet [16] trains a CNN to jointly predict optical flow and activity classes. (Table 8 excerpt: ActionFlowNet 52.5 / 56.8; TVNet 39.4 / 57.5; RGB-OFF 55.6 / 56.9; Ours 61.1 / 65.4.) We also compare to OFF [21] using only RGB inputs.…”
Section: Flow-of-flow (mentioning)
confidence: 99%
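As a rough illustration of the joint flow-and-action prediction idea attributed to ActionFlowNet in the statement above, here is a minimal PyTorch sketch of a shared encoder with two heads; the module name, layer sizes, and loss weighting are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class JointFlowActionNet(nn.Module):
    """Toy two-head CNN: one head regresses optical flow, the other classifies actions.
    Layer sizes are illustrative only, not the ActionFlowNet architecture."""
    def __init__(self, num_classes=51):
        super().__init__()
        # shared encoder over a pair of stacked RGB frames (6 input channels)
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # flow head: 2-channel (u, v) field at 1/8 resolution
        self.flow_head = nn.Conv2d(256, 2, kernel_size=3, padding=1)
        # action head: global pooling + linear classifier
        self.action_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, num_classes)
        )

    def forward(self, frame_pair):
        feats = self.encoder(frame_pair)
        return self.flow_head(feats), self.action_head(feats)

# joint loss: flow supervision (e.g. from a classical flow method) plus action labels
model = JointFlowActionNet()
frames = torch.randn(4, 6, 224, 224)   # batch of stacked frame pairs
flow_gt = torch.randn(4, 2, 28, 28)    # pseudo ground-truth flow at 1/8 resolution
labels = torch.randint(0, 51, (4,))
pred_flow, logits = model(frames)
loss = nn.functional.mse_loss(pred_flow, flow_gt) + nn.functional.cross_entropy(logits, labels)
loss.backward()
```

Training both heads against a single shared encoder is what lets the motion supervision shape the features that the classifier sees.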
“…Finally, it is worth noting the self-supervised works that "harvest" training data from unlabeled sources for action recognition. Fernando et al. [12] and Mishra et al. [28] shuffle video frames and treat the resulting clips as positive/negative training data; Sharma et al. [34] mine labels using a similarity-based distance matrix, although for video face clustering; Wei et al. [51] divide a clip into non-overlapping 10-frame chunks and then predict their temporal order; Ng et al. [29] estimate optical flow while recognizing actions. We compare all these methods against our unsupervised, future-frame-prediction-based ConvNet training in the experimental section.…”
Section: Background and Related Work (mentioning)
confidence: 99%
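As a schematic of the frame-shuffling pretext task mentioned above, the sketch below builds order-verification samples from unlabeled clips; the function name, clip shape, and 50/50 shuffling probability are illustrative assumptions rather than the exact protocol of the cited works.

```python
import random
import torch

def make_order_pretext_sample(clip, shuffle_prob=0.5):
    """Create a (clip, label) pair for a frame-order pretext task.

    clip: tensor of shape (T, C, H, W). With probability `shuffle_prob` the
    temporal order is permuted and the sample is labeled 0 (negative);
    otherwise the original order is kept and labeled 1 (positive).
    Schematic only, not the exact protocol of the cited works.
    """
    if random.random() < shuffle_prob:
        perm = torch.randperm(clip.shape[0])  # random temporal permutation
        return clip[perm], 0
    return clip, 1

# usage: turn unlabeled clips into binary "is the order correct?" training data
clip = torch.randn(16, 3, 112, 112)
sample, label = make_order_pretext_sample(clip)
```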
“…However, the major obstacle stems from the lack of high-quality training data. To mitigate this data scarcity, some works train an optical flow model on synthesized datasets [18], while others predict video labels end-to-end to improve accuracy [26,19]. In addition, optimization ideas from traditional methods have been integrated into the design of the neural networks.…”
Section: Related Work (mentioning)
confidence: 99%
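To make concrete what integrating a traditional optimizer into a network can look like, the following is a heavily simplified, hypothetical sketch of one unrolled brightness-constancy update implemented as a differentiable PyTorch step; it is not the TVNet formulation or any model cited above.

```python
import torch
import torch.nn.functional as F

def unrolled_flow_step(flow, img1, img2, step=0.1):
    """One differentiable update of a brightness-constancy flow estimate.

    img1, img2: (B, 1, H, W) grayscale frames; flow: (B, 2, H, W) as (u, v).
    Simplified stand-in for unrolling a classical optical-flow optimizer
    into network layers; not TVNet itself.
    """
    B, _, H, W = img1.shape
    # build a sampling grid displaced by the current flow and warp img2 toward img1
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().unsqueeze(0).expand(B, -1, -1, -1)
    disp = grid + flow.permute(0, 2, 3, 1)
    # normalize coordinates to [-1, 1] for grid_sample
    disp = torch.stack((2 * disp[..., 0] / (W - 1) - 1,
                        2 * disp[..., 1] / (H - 1) - 1), dim=-1)
    warped = F.grid_sample(img2, disp, align_corners=True)
    residual = warped - img1                       # brightness-constancy violation
    # spatial image gradients of the warped frame (simple finite differences)
    gx = F.pad(warped[..., :, 1:] - warped[..., :, :-1], (0, 1))
    gy = F.pad(warped[..., 1:, :] - warped[..., :-1, :], (0, 0, 0, 1))
    # gradient-descent update on the data term
    return flow - step * torch.cat((residual * gx, residual * gy), dim=1)

# usage: unroll a fixed number of iterations, starting from zero flow
img1, img2 = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
flow = torch.zeros(1, 2, 64, 64)
for _ in range(5):
    flow = unrolled_flow_step(flow, img1, img2)
```

Stacking a fixed number of such steps, optionally with learnable step sizes and smoothness terms, yields a flow "layer" that can be trained end-to-end together with an action classifier.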