Fast Object Detection in Compressed Video

Wang, Shiyao; Lu, Hongchao; Deng, Zhidong

doi:10.1109/iccv.2019.00720

Cited by 58 publications

(34 citation statements)

References 47 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Another research stream designs dedicated networks to spectral input coefficients: harmonic networks [6] uses custom convolutions that produce high-level features by learning combinations of spectral filters defined by the 2D Discrete Cosine Transform; Ehrlich and Davis (2019) [7] introduce a ResNet able to operate on compressed JPEG images by including the compression transform into the network weights. From video side, two recent works on detection in compressed videos are [8], [9]. In [8], separate CNNs are used for temporally linked I-frame (RGB image), and P-frame (motion and residual arrays) are trained all together.…”

Section: Introductionmentioning

confidence: 99%

“…In [8], separate CNNs are used for temporally linked I-frame (RGB image), and P-frame (motion and residual arrays) are trained all together. In [9], the authors consider three networks: a CNN feature extraction module based on the raw I-image, a recurrent memory network to align the features of consecutive P-frames using compressed motion and residual vectors, and a detection network aiming at identifying the objects in the videos.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Fast object detection in compressed JPEG Images

Deguerre

Chatelain

Gasso

2019

2019 IEEE Intelligent Transportation Systems Conference (ITSC)

View full text Add to dashboard Cite

Object detection in still images has drawn a lot of attention over past few years, and with the advent of Deep Learning impressive performances have been achieved with numerous industrial applications. Most of these deep learning models rely on RGB images to localize and identify objects in the image. However in some application scenarii, images are compressed either for storage savings or fast transmission. Therefore a time consuming image decompression step is compulsory in order to apply the aforementioned deep models. To alleviate this drawback, we propose a fast deep architecture for object detection in JPEG images, one of the most widespread compression format. We train a neural network to detect objects based on the blockwise DCT (discrete cosine transform) coefficients issued from the JPEG compression algorithm. We modify the well-known Single Shot multibox Detector (SSD) by replacing its first layers with one convolutional layer dedicated to process the DCT inputs. Experimental evaluations on PASCAL VOC and industrial dataset comprising images of road traffic surveillance show that the model is about 2× faster than regular SSD with promising detection performances. To the best of our knowledge, this paper is the first to address detection in compressed JPEG images.

show abstract

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Fast object detection in compressed JPEG Images

Deguerre

Chatelain

Gasso

2019

2019 IEEE Intelligent Transportation Systems Conference (ITSC)

View full text Add to dashboard Cite

show abstract

“…Alvar et al [17] constructed approximate bounding-box of the target object in the pixel level based on the bounding-box in previous frame for single object tracking, but each frame needs to be restored into RGB image for detection. The work in [18] fed MVs and residuals to a network to propagate the features across frames for object detection. However, the frames are processed in a batch manner which is inapplicable for online tasks.…”

Section: B Work In the Compressed Domainmentioning

confidence: 99%

“…Secondly, the motion information is readily provided in the compressed domain which is helpful for video tasks. A few works, such as tracking [8], [17], video object detection [18] and action recognition [19], have been done in the compressed domain. Generally, there are two goals of the implementations in the compressed domain: (1) Feature propagation [18].…”

Section: Introductionmentioning

confidence: 99%

Real-Time Online Multi-Object Tracking in Compressed Domain

Liu

et al. 2019

IEEE Access

View full text Add to dashboard Cite

Recent online multi-object tracking (MOT) methods have achieved desirable tracking performance. However, the tracking speed of most existing methods is rather slow. Inspired from the fact that the adjacent frames are highly relevant and redundant, we divide the frames into key and non-key frames and track objects in the compressed domain. For the key frames, the RGB images are restored for detection and data association. To make data association more reliable, an appearance convolutional neural network (CNN) which can be jointly trained with the detector is proposed. For the non-key frames, the objects are directly propagated by a tracking CNN based on the motion information provided in the compressed domain. Compared with the state-of-the-art online MOT methods, our tracker is about 6× faster while maintaining a comparable tracking performance.INDEX TERMS Compressed domain, multi-object tracking, online, real-time.

show abstract

“…The motion model for a three-dimensional object is usually related to the position and orientation of the object [10]. While dealing with video compression, the macroblocks are divided into keyframes, and selected motion motions are considered disruptions of these frames considering motion parameters [11]. In the case of a deformable object, the motion model generally considers the position of a target object over the mesh [12].…”

mentioning

confidence: 99%