Big Data II: Learning, Analytics, and Applications 2020
DOI: 10.1117/12.2558115

Vehicle detection from multi-modal aerial imagery using YOLOv3 with mid-level fusion

Cited by 18 publications (7 citation statements)
References 0 publications
“…It is more accurate than the best mono-modality network YOLO-fine in mAP 0.5 (2.86% ↑). Among multi-modality networks, the detection performance of our YOLOFusion greatly surpasses the previous network (YOLOv3 with mid-level fusion, Dhanaraj et al. (2020)) on the more stringent evaluation metric, mAP 0.5:0.95 (4.52% ↑). This is a substantial improvement, and it is worth noting that our YOLOFusion (~12.5M parameters) is a lightweight, small-scale network with far fewer parameters than the original YOLOv3 (~61.6M), let alone the dual-stream fused YOLOv3.…”
Section: Comparison With State-of-the-art Object Detection Methods
confidence: 79%
“…In this method, features are first extracted from the two images separately by a few convolutional layers to obtain large-scale feature maps; the feature maps are then concatenated and fed into the backbone to detect the objects. MLF provides both a large spatial scale and a sufficient number of training parameters for the concatenated feature maps, which often yields better detection results than fusing at other locations. 31,46 …”
Section: Related Work
confidence: 99%
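The mid-level fusion (MLF) wiring quoted above can be sketched minimally in NumPy. This is a hypothetical stand-in, not the papers' implementation: each modality-specific "stem" is reduced to a 1x1 channel-mixing multiply in place of real convolutional layers, and the fusion step is simply channel-wise concatenation of the two feature maps before a shared backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

def stem(x, n_filters=8):
    """Hypothetical stand-in for a modality-specific convolutional stem:
    a 1x1 'convolution' implemented as a channel-mixing matrix multiply,
    keeping the spatial dimensions unchanged."""
    b, c, h, w = x.shape
    weights = rng.standard_normal((n_filters, c))
    return np.einsum('fc,bchw->bfhw', weights, x)

rgb = rng.standard_normal((1, 3, 64, 64))  # visible-band image batch
ir = rng.standard_normal((1, 1, 64, 64))   # infrared image batch

f_rgb = stem(rgb)  # (1, 8, 64, 64) feature map from the RGB stream
f_ir = stem(ir)    # (1, 8, 64, 64) feature map from the IR stream

# Mid-level fusion: concatenate the modality-specific feature maps along
# the channel axis; the shared detection backbone would consume `fused`.
fused = np.concatenate([f_rgb, f_ir], axis=1)  # (1, 16, 64, 64)
```

The key design point the quote makes is visible in the shapes: fusing at this depth keeps a large spatial extent (64x64 here) while doubling the channel count, so the backbone still has ample parameters operating on the concatenated features.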
“…MLF provides both a large spatial scale and a sufficient number of training parameters for the concatenated feature maps, which often yields better detection results than fusing at other locations. 31,46 In multimodal remote sensing object detection, Sharma et al. 25 proposed a simple and effective MLF network, YOLOrs. This network consists of a five-layer convolutional fusion structure and the widely used YOLOv3 47 detection network, and improved average precision by 4% over the best unimodal model on the VEDAI multimodal remote sensing dataset, which contains infrared and visible images.…”
Section: Related Work
confidence: 99%
“…Similarly, the performance of object detection in remote sensing can be further improved by leveraging multimodal aerial imagery. For example, the IR modality captures longer thermal wavelengths and thus enables detection of objects in varying weather conditions, expanding on the capabilities of RGB [40].…”
Section: B. Object Detection In Multimodal Data
confidence: 99%