2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00631

Compressed Video Action Recognition

Abstract: Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and the high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that the superfluous information can be reduced by up to two orders of magnitude by video compression (using H.264, HEVC, etc.), we propose to train a deep network directly on the compressed video. This re…
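The premise of training "directly on the compressed video" is that a codec already stores motion explicitly: a P-frame is encoded as pixels copied from a reference frame according to motion vectors, plus a residual correction. The sketch below illustrates that decoding step under simplifying assumptions: one integer motion vector per pixel (real codecs such as H.264 use block-level, often sub-pixel vectors), the convention that the referenced location is the current location minus the vector, and clamping at frame borders. The function and type names are hypothetical, not taken from the paper or from any codec library.

```python
from typing import List, Tuple

Frame = List[List[int]]                     # grayscale frame as rows of pixel values
MotionField = List[List[Tuple[int, int]]]   # per-pixel (dy, dx); a simplification of block MVs


def clamp(v: int, lo: int, hi: int) -> int:
    """Keep a coordinate inside [lo, hi] so out-of-frame references reuse border pixels."""
    return max(lo, min(hi, v))


def reconstruct_p_frame(ref: Frame, mv: MotionField, residual: Frame) -> Frame:
    """Rebuild a P-frame: copy each pixel from its referenced location in the
    previous frame (location minus motion vector, clamped), then add the residual."""
    h, w = len(ref), len(ref[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = mv[y][x]
            sy = clamp(y - dy, 0, h - 1)
            sx = clamp(x - dx, 0, w - 1)
            row.append(ref[sy][sx] + residual[y][x])
        out.append(row)
    return out


# Usage: every pixel moved one column to the right, plus one small residual correction.
ref = [[10, 20, 30], [40, 50, 60]]
mv = [[(0, 1)] * 3 for _ in range(2)]
residual = [[0, 0, 0], [0, 0, 1]]
frame = reconstruct_p_frame(ref, mv, residual)
print(frame)  # [[10, 10, 20], [40, 40, 51]]
```

Only the motion field and the residual need to be stored for this frame, which is why a network can consume them without first decoding to full RGB frames or computing optical flow.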

Cited by 271 publications (173 citation statements)
References 41 publications

“…The motion vector (MV) is a coarse representation of motion, but it can be obtained directly from compressed video streams without extra computation. Therefore, Enhanced Motion Vectors CNN (EMV-CNN) [32] used motion vectors as the input of the temporal CNN to improve inference speed, and CoViAR [29] adopted accumulated motion vectors for real-time action recognition. Because MVs lack fine-grained motion detail, however, recognition performance degraded dramatically.…”
Section: Related Work (mentioning; confidence: 99%)
“…Experimental results show that our network is highly tolerant to the quality of the motion input, thanks to the combination of short-term spatiotemporal feature fusion, sequential middle-term temporal modeling and long-term temporal consensus. EMV-CNN and CoViAR [32,29] also used motion vectors, but simply substituting them for optical flow, without a more effective spatiotemporal representation, results in significantly worse performance than the optical-flow-based Two-Stream CNN.…”
Section: Exploration Study (mentioning; confidence: 99%)
“…Hallucination. Since computing optical flow is time-consuming and storage-demanding, several works attempt to represent motion information in other ways that can replace flow. (Wu, C.Y. et al., 2017) [12] observed that video compression removes redundant information, so that accumulated motion vectors and residuals can be used to describe motion. Compared to traditional optical flow methods, motion vectors yield a more than 20-fold speedup, albeit with a significant drop in accuracy.…”
Section: Related Work (mentioning; confidence: 99%)
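The statements above refer to accumulated motion vectors and residuals. The idea is to trace each pixel of a P-frame back through the chain of earlier P-frames to the I-frame, summing the displacements and collecting the residuals along the way, so that every P-frame depends only on the I-frame. The sketch below is a minimal illustration under stated assumptions, not the paper's code: per-pixel integer motion vectors, "referenced location = current location minus vector," border clamping, and hypothetical function names. Under those assumptions, one-step reconstruction from the I-frame matches decoding the P-frames one by one.

```python
from typing import List, Tuple

Frame = List[List[int]]
MotionField = List[List[Tuple[int, int]]]   # per-pixel (dy, dx), an assumed simplification


def _clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))


def accumulate(mvs: List[MotionField], residuals: List[Frame]):
    """Trace every pixel of the last P-frame back to the I-frame.

    mvs[k] and residuals[k] belong to the (k+1)-th P-frame after the I-frame.
    Returns (acc_mv, acc_res): the total (dy, dx) displacement to the traced
    I-frame location, and the sum of residuals collected along that path.
    """
    h, w = len(mvs[0]), len(mvs[0][0])
    acc_mv = [[(0, 0)] * w for _ in range(h)]
    acc_res = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cy, cx, total = y, x, 0
            # Walk from the newest P-frame back toward the I-frame.
            for mv, res in zip(reversed(mvs), reversed(residuals)):
                total += res[cy][cx]          # residual at this frame's traced location
                dy, dx = mv[cy][cx]
                cy = _clamp(cy - dy, 0, h - 1)
                cx = _clamp(cx - dx, 0, w - 1)
            acc_mv[y][x] = (y - cy, x - cx)
            acc_res[y][x] = total
    return acc_mv, acc_res


def reconstruct_from_i_frame(i_frame: Frame, acc_mv: MotionField, acc_res: Frame) -> Frame:
    """Rebuild the P-frame in one step: I-frame pixel at (location - acc_mv) plus acc_res."""
    h, w = len(i_frame), len(i_frame[0])
    return [[i_frame[y - acc_mv[y][x][0]][x - acc_mv[y][x][1]] + acc_res[y][x]
             for x in range(w)] for y in range(h)]


# Usage: two P-frames, each moving content one column right; the second adds a residual.
i_frame = [[1, 2, 3, 4]]
shift_right = [[(0, 1)] * 4]
mvs = [shift_right, shift_right]
residuals = [[[0, 0, 0, 0]], [[0, 0, 0, 10]]]
acc_mv, acc_res = accumulate(mvs, residuals)
print(acc_mv)                                              # [[(0, 0), (0, 1), (0, 2), (0, 2)]]
print(reconstruct_from_i_frame(i_frame, acc_mv, acc_res))  # [[1, 1, 1, 12]]
```

Decoding the same two P-frames sequentially also gives `[[1, 1, 1, 12]]`. The accumulated fields carry longer-range motion than a single frame's vectors, and the P-frames can be processed independently of one another.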
“…They jointly trained a compression network with an inference network and obtained a performance gain. On the video side, Wu et al. [25] designed a compressed video action recognition system using separate networks for I-frames and P-frames. Their approach is more efficient than conventional 3D convolution architectures.…”
Section: Related Work (mentioning; confidence: 99%)
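With separate networks for I-frames and P-frames, their per-class predictions still have to be combined into one label. A common choice is late fusion by a weighted sum of class scores. The sketch below assumes that scheme; the stream names, the weights, and the scores are hypothetical and are not presented as the exact fusion rule or settings of the cited system.

```python
from typing import Dict, List


def fuse_scores(stream_scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Late fusion: weighted sum of per-class scores from independent streams.

    Streams missing from `weights` get weight 1.0 (an arbitrary default)."""
    num_classes = len(next(iter(stream_scores.values())))
    fused = [0.0] * num_classes
    for name, scores in stream_scores.items():
        if len(scores) != num_classes:
            raise ValueError(f"stream {name!r} has {len(scores)} scores, expected {num_classes}")
        w = weights.get(name, 1.0)
        for c, s in enumerate(scores):
            fused[c] += w * s
    return fused


def predict(stream_scores: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(stream_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical per-class scores for one clip (3 action classes).
scores = {
    "i_frame": [0.1, 0.7, 0.2],   # appearance stream on the I-frame
    "motion": [0.5, 0.3, 0.2],    # stream on motion vectors from P-frames
    "residual": [0.2, 0.5, 0.3],  # stream on residuals from P-frames
}
weights = {"i_frame": 1.0, "motion": 2.0, "residual": 1.0}  # hypothetical weights
print(fuse_scores(scores, weights))  # approximately [1.3, 1.8, 0.9]
print(predict(scores, weights))      # 1
```

Because each stream runs independently until this step, a lightweight network can be used for the motion and residual inputs without changing the fusion logic.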