2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.52

Flow-Guided Feature Aggregation for Video Object Detection

Abstract: Extending state-of-the-art object detectors from image to video is challenging. The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to exploit temporal information on box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate and end-to-end learning framework for video object detection. It leverages temporal coherence on feature level instead. It improves th…

Citations: cited by 576 publications (659 citation statements)
References: 46 publications (90 reference statements)

“…Generalizing still image detectors to video domain is not trivial due to the spatial and temporal complex variations existed in videos, not to mention that the object appearances in some frames may be deteriorated by motion blur or occlusion. One common solution to amend this problem is feature aggregation [1,29,49,53,54,55] that enhances per-frame features by aggregating the features of nearby frames. Specifically, FGFA [54] utilizes the optical flow from FlowNet [7] to guide the pixel-level motion compensation on feature maps of adjacent frames for feature aggregation.…”
Section: Related Work (mentioning)
confidence: 99%
“…D&T [8] integrates a tracking formulation into R-FCN [5] to simultaneously perform object detection and across-frame track regression. [46] further extends FGFA [54] by calibrating the object features on box level to boost video object detection.…”
Section: Related Workmentioning
confidence: 99%
“…To address this, FGFA [29] performs optical flow guided spatial warping before aggregating features. The resulting features are subsequently fused temporally by weighted element-wise addition where weights are determined by the optical flow field.…”
Section: Video Object Detectionmentioning
confidence: 99%
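
The two steps described in the citation statements above (flow-guided warping of neighbouring feature maps, then a weighted element-wise sum) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, feature maps are plain nested lists of shape H x W x C, the flow is taken to give the (dy, dx) offset at which each reference location is found in the neighbour frame, and the per-pixel weights are a softmax over cosine similarity between warped and reference features, computed directly on the features without any learned embedding.

```python
import math

def bilinear_sample(feat, y, x):
    """Sample feat[H][W][C] at fractional (y, x); taps outside the map add zero."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    y0, x0 = math.floor(y), math.floor(x)
    out = [0.0] * c
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w and wy * wx > 0:
                for k in range(c):
                    out[k] += wy * wx * feat[yy][xx][k]
    return out

def warp(feat, flow):
    """Warp a neighbour frame's features onto the reference frame.

    Assumed convention: flow[y][x] = (dy, dx) is the offset at which
    reference location (y, x) appears in the neighbour frame.
    """
    h, w = len(feat), len(feat[0])
    return [[bilinear_sample(feat, y + flow[y][x][0], x + flow[y][x][1])
             for x in range(w)] for y in range(h)]

def cosine(a, b, eps=1e-8):
    """Cosine similarity of two channel vectors (0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb + eps)

def aggregate(ref_feat, frame_feats, flows):
    """Weighted element-wise sum of warped features.

    frame_feats may include the reference frame itself (with zero flow).
    Weights are a per-pixel softmax over cosine similarity to the
    reference features; this weighting is an assumption of the sketch.
    """
    warped = [warp(f, fl) for f, fl in zip(frame_feats, flows)]
    h, w, c = len(ref_feat), len(ref_feat[0]), len(ref_feat[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sims = [cosine(wf[y][x], ref_feat[y][x]) for wf in warped]
            m = max(sims)
            exps = [math.exp(s - m) for s in sims]
            z = sum(exps)
            for wf, e in zip(warped, exps):
                for k in range(c):
                    out[y][x][k] += (e / z) * wf[y][x][k]
    return out

# Toy 1 x 3 map with 2 channels; the neighbour shows the same content
# shifted one pixel to the right, so the flow is (0, +1) everywhere.
ref = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
neighbour = [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
zero_flow = [[(0.0, 0.0)] * 3]
shift_flow = [[(0.0, 1.0)] * 3]
fused = aggregate(ref, [ref, neighbour], [zero_flow, shift_flow])
```

In this toy case the warped neighbour agrees with the reference at the first two locations, so both frames receive equal weight there. At the last location the neighbour's content has moved out of view, its warped feature is zero, and the reference frame dominates the sum.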