2019
DOI: 10.48550/arxiv.1911.09516
Preprint

Learning Spatial Fusion for Single-Shot Object Detection

Abstract: Pyramidal feature representation is the common practice to address the challenge of scale variation in object detection. However, the inconsistency across different feature scales is a primary limitation for the single-shot detectors based on feature pyramid. In this work, we propose a novel and data driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF). It learns the way to spatially filter conflictive information to suppress the inconsistency, thus improving th…

Citations: cited by 199 publications (237 citation statements)
References: 45 publications
“…When training the R-CNN module, the IoU threshold for pos/neg division is changed to 0.5. IoU based label assignment is proved effective and soon been adopted by many Faster R-CNN's variants like [2,12,20,42,26,49,37], as well as many one-stage detectors like [31,32,25,27,23,21].…”
Section: Fixed Label Assignment
Type: mentioning
confidence: 99%
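The IoU-based positive/negative division mentioned in the statement above can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) tuples and a single threshold of 0.5 as quoted; the function names `iou` and `assign_labels` are hypothetical and are not taken from any of the cited papers.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(proposals, gt_boxes, threshold=0.5):
    """Label a proposal 1 (positive) if its best IoU with any
    ground-truth box reaches the threshold, else 0 (negative)."""
    labels = []
    for proposal in proposals:
        best = max((iou(proposal, gt) for gt in gt_boxes), default=0.0)
        labels.append(1 if best >= threshold else 0)
    return labels


# Hypothetical boxes, for illustration only.
gt = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10), (5, 5, 15, 15), (0, 0, 10, 8)]
print(assign_labels(proposals, gt))  # [1, 0, 1]
```

Real detectors often use separate positive and negative thresholds and ignore the proposals in between; the single cut-off here only mirrors the 0.5 value quoted in the statement.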
“…T is the number of previous features used for aggregation. Similar to [24], w is predicted by two convolution layers followed by softmax function. We find that in experiment the weighted summation is slightly better than average summation.…”
Section: Motion-guided Feature Warper
Type: mentioning
confidence: 99%
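The softmax-weighted summation described in the statement above can be sketched as follows. This is a simplified illustration, not the cited implementation: the statement says the weights are predicted by two convolution layers, whereas here the per-location logits are simply passed in as input. The names `softmax` and `fuse` are hypothetical, and each feature map is a single-channel 2D grid of the same size.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, logits):
    """Fuse same-sized 2D feature maps with per-location weights.

    features[k][y][x] is the value of map k at (y, x);
    logits[k][y][x] is the unnormalized weight for map k at (y, x).
    The weights at each location are normalized with softmax, so they
    are positive and sum to 1 across the maps.
    """
    count = len(features)
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = softmax([logits[k][y][x] for k in range(count)])
            fused[y][x] = sum(
                weights[k] * features[k][y][x] for k in range(count)
            )
    return fused


# Equal logits reduce the weighted summation to a plain average.
features = [[[2.0]], [[4.0]]]
logits = [[[0.0]], [[0.0]]]
print(fuse(features, logits))  # [[3.0]]
```

With equal logits the result matches average summation, which is the baseline the statement compares against; learned logits let each location favor one source over the others.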
“…Since SSD [6] and FPN [7] propose to detect objects with different sizes at different scale levels, there are numerous methods to extract better aligned features for different scale levels. PaNet [8], BiFpn [9] and ASFF [10] merge scale information in a deeper and more complex manner, the performance gap between two tasks is mitigated as each scale level includes more comprehensive information from other scale levels. Recursive-Fpn [11] uses ASPP [12] structure to merge information across scale levels, also mitigating the performance gap between two tasks.…”
Section: Scale Misalignment
Type: mentioning
confidence: 99%