2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00720
Fast Object Detection in Compressed Video

Abstract: Object detection in videos has drawn increasing attention since it is more practical in real scenarios. Most deep learning methods use CNNs to process each decoded frame in a video stream individually. However, the free-of-charge yet valuable motion information already embedded in the video compression format is usually overlooked. In this paper, we propose a fast object detection method that takes advantage of this with a novel Motion aided Memory Network (MMNet). The MMNet has two major advantages: 1) …

Cited by 58 publications (34 citation statements)
References 47 publications
“…Another research stream designs dedicated networks for spectral input coefficients: harmonic networks [6] use custom convolutions that produce high-level features by learning combinations of spectral filters defined by the 2D Discrete Cosine Transform, and Ehrlich and Davis (2019) [7] introduce a ResNet able to operate on compressed JPEG images by folding the compression transform into the network weights. On the video side, two recent works on detection in compressed videos are [8], [9]. In [8], separate CNNs process the temporally linked I-frame (an RGB image) and P-frames (motion and residual arrays), and the networks are trained jointly.…”
Section: Introduction
“…In [8], separate CNNs process the temporally linked I-frame (an RGB image) and P-frames (motion and residual arrays), and the networks are trained jointly. In [9], the authors combine three networks: a CNN feature-extraction module applied to the raw I-frame image, a recurrent memory network that aligns the features of consecutive P-frames using the compressed motion and residual vectors, and a detection network that identifies the objects in the videos.…”
Section: Introduction
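The feature-alignment step that [9] attributes to its memory network can be sketched, in very simplified form, as warping a reference-frame feature map with the per-pixel motion vectors decoded from the bitstream. The snippet below is a minimal NumPy illustration with hypothetical names (`warp_features`, nearest-neighbour sampling); the actual method learns the alignment and also uses residuals:

```python
import numpy as np

def warp_features(feat, mv):
    """Propagate a feature map across frames using per-pixel motion
    vectors, a simplified stand-in for compressed-domain feature
    propagation.
    feat: (H, W, C) feature map from the reference frame.
    mv:   (H, W, 2) motion vectors (dx, dy); each target location
          samples from (x + dx, y + dy) in the reference frame.
    Uses nearest-neighbour sampling for brevity; real systems
    typically use bilinear interpolation.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(np.rint(xs + mv[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.rint(ys + mv[..., 1]).astype(int), 0, H - 1)
    # Advanced integer indexing gathers the source feature vectors.
    return feat[src_y, src_x]

# Toy example: a single active feature at (1, 1), and a uniform
# motion field telling every target pixel to sample from x - 1,
# i.e. the content shifts one pixel to the right.
feat = np.zeros((4, 4, 1))
feat[1, 1, 0] = 1.0
mv = np.zeros((4, 4, 2))
mv[..., 0] = -1.0
warped = warp_features(feat, mv)
```

Since the motion vectors come for free from the decoder, this kind of warp replaces a full CNN forward pass on most P-frames, which is where the speed-up claimed by such methods originates.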
“…Alvar et al. [17] construct an approximate pixel-level bounding box of the target object from the bounding box in the previous frame for single-object tracking, but each frame still needs to be decoded into an RGB image for detection. The work in [18] feeds motion vectors (MVs) and residuals to a network to propagate features across frames for object detection. However, the frames are processed in a batch manner, which is inapplicable to online tasks.…”
Section: B. Work in the Compressed Domain
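The box-approximation idea cited from [17] can be illustrated, very roughly, by shifting the previous-frame box with the mean motion vector inside it. This is a hedged sketch under that assumption (the helper name `propagate_box` is hypothetical, and the published method is considerably more elaborate):

```python
import numpy as np

def propagate_box(box, mv):
    """Roughly approximate the object's box in the current frame by
    translating the previous-frame box with the average motion vector
    inside it (a simplification of compressed-domain box propagation).
    box: (x0, y0, x1, y1) in pixels.
    mv:  (H, W, 2) per-pixel motion vectors (dx, dy) decoded from
         the bitstream.
    """
    x0, y0, x1, y1 = box
    region = mv[y0:y1, x0:x1]
    dx = region[..., 0].mean()
    dy = region[..., 1].mean()
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)

# Toy example: a uniform motion field of (dx, dy) = (2, 3) moves
# the box by exactly that offset.
mv = np.zeros((10, 10, 2))
mv[..., 0] = 2.0
mv[..., 1] = 3.0
box = propagate_box((1, 1, 4, 4), mv)
```

Because the motion field is read from the bitstream rather than estimated, this gives a box prior at negligible cost, which the snippet above notes is not sufficient on its own: [17] still decodes frames to RGB for the detection step.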
“…Secondly, motion information is readily available in the compressed domain, which is helpful for video tasks. A few works, covering tracking [8], [17], video object detection [18], and action recognition [19], operate in the compressed domain. Generally, implementations in the compressed domain pursue two goals: (1) feature propagation [18].…”
Section: Introduction
“…The motion model for a three-dimensional object is usually related to the position and orientation of the object [10]. In video compression, the macroblocks are divided with respect to keyframes, and selected motions are treated as displacements of these frames according to the motion parameters [11]. In the case of a deformable object, the motion model generally considers the position of a target object over a mesh [12].…”