Deep-learning object detection methods designed for computer vision applications tend to under-perform when applied to remote sensing data. This is because, unlike in computer vision, training data in remote sensing are harder to collect, and targets can be very small, occupying only a few pixels in the entire image, while also exhibiting arbitrary perspective transformations. Detection performance can improve by fusing data from multiple remote sensing modalities, including RGB, IR, hyper-spectral, multi-spectral, synthetic aperture radar, and LiDAR, to name a few. In this work, we propose YOLOrs: a new convolutional neural network specifically designed for real-time object detection in multimodal remote sensing imagery. YOLOrs can detect objects at multiple scales, with smaller receptive fields to account for small targets, and can also predict target orientations. In addition, YOLOrs introduces a novel mid-level fusion architecture that renders it applicable to multimodal aerial imagery. Our experimental studies compare YOLOrs with contemporary alternatives and corroborate its merits.
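To illustrate the general idea behind mid-level fusion, the sketch below processes each modality (e.g., RGB and IR) through its own feature-extraction stream and concatenates the resulting feature maps along the channel dimension before a shared detection head. This is a minimal NumPy sketch of the concept only; the function names, layer choices, and shapes are illustrative assumptions, not the actual YOLOrs implementation.

```python
import numpy as np

def extract_features(x, weight):
    # Hypothetical per-modality stream: a 1x1 conv-like channel mix
    # followed by ReLU, standing in for the early backbone layers.
    # x: (H, W, C_in), weight: (C_in, C_out)
    return np.maximum(x @ weight, 0.0)

def midlevel_fusion(rgb, ir, w_rgb, w_ir):
    # Each modality is first processed independently; the mid-level
    # feature maps are then concatenated along the channel axis and
    # would feed a shared detection head (omitted here).
    f_rgb = extract_features(rgb, w_rgb)
    f_ir = extract_features(ir, w_ir)
    return np.concatenate([f_rgb, f_ir], axis=-1)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 8, 3))   # 3-channel RGB patch
ir = rng.standard_normal((8, 8, 1))    # 1-channel IR patch
fused = midlevel_fusion(rgb, ir,
                        rng.standard_normal((3, 16)),
                        rng.standard_normal((1, 16)))
print(fused.shape)  # (8, 8, 32)
```

Fusing at this intermediate depth lets each stream learn modality-specific low-level filters while the shared layers after concatenation exploit cross-modal correlations.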