2022
DOI: 10.3390/s22134953

NRT-YOLO: Improved YOLOv5 Based on Nested Residual Transformer for Tiny Remote Sensing Object Detection

Abstract: To address the problems of tiny objects and high resolution of object detection in remote sensing imagery, the methods with coarse-grained image cropping have been widely studied. However, these methods are always inefficient and complex due to the two-stage architecture and the huge computation for split images. For these reasons, this article employs YOLO and presents an improved architecture, NRT-YOLO. Specifically, the improvements can be summarized as: extra prediction head and related feature fusion laye…

Cited by 32 publications (24 citation statements)
References 37 publications
“…YOLOv5 also comprises different scales of models, such as YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, with alterations in depth and width in each model. Among these models, YOLOv5l (Large) was chosen for this study because it meets the workstation specifications while having a sufficiently deep and wide network to further improve the performance [55,56].…”
Section: Methods (mentioning)
confidence: 99%
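The "alterations in depth and width" mentioned in this statement can be illustrated with a minimal sketch. It assumes the depth and width multipliers published in the public YOLOv5 model configuration files and the rounding rule used there; the names `YOLOV5_SCALES`, `make_divisible`, and `scale_layer` are illustrative and do not come from the citing paper.

```python
import math

# Assumed (depth_multiple, width_multiple) per variant, as listed in the
# public YOLOv5 model YAML files; not taken from the citing paper.
YOLOV5_SCALES = {
    "n": (0.33, 0.25),
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def make_divisible(value, divisor=8):
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(value / divisor) * divisor


def scale_layer(repeats, channels, variant):
    """Scale a block's repeat count (depth) and channel count (width)."""
    depth, width = YOLOV5_SCALES[variant]
    # Blocks that repeat only once are left alone; others shrink or grow.
    scaled_repeats = max(round(repeats * depth), 1) if repeats > 1 else repeats
    scaled_channels = make_divisible(channels * width)
    return scaled_repeats, scaled_channels


if __name__ == "__main__":
    # A hypothetical block with 9 repeats and 512 output channels.
    for variant in YOLOV5_SCALES:
        print(variant, scale_layer(9, 512, variant))
```

Under these assumptions, the "l" variant leaves the base configuration unchanged, which is consistent with the statement's description of YOLOv5l as a comparatively deep and wide network.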
“…In the entire GF-6 image, a large number of small-sized tailings ponds are in general sparsely and non-uniformly distributed, and it is difficult to distinguish them from the surrounding background, which makes tailings ponds extraction challenging. The YOLOv5s model with the C3 module cannot overcome this deficiency well because it lacks the ability to obtain global and contextual information [29], but the transformer can better integrate the semantic information of the contextual and global features, and has a good recognition effect for sparse small targets with complex backgrounds [30,31]. Due to the high-cost calculation of the transformer, Swin Transformer [32] is selected to improve the backbone network of YOLOv5s.…”
Section: Swin-T Backbone (mentioning)
confidence: 99%
“…SW-MSA realizes the information interaction between adjacent windows through a shifted window partitioning approach, and finally realizes the perception of global information. To embed the Swin Transformer block into the backbone, inspired by the work of C3NRT [29] and C3-Trans [30], we propose a new C3Swin-T module, which replaces the original Bottleneck block in C3 with the Swin Transformer block. All C3 modules of the original backbone are replaced by C3Swin-T to build a new Swin Transformer backbone (Swin-T backbone), while the other layers remain the same; the structure is illustrated in Figure 7.…”
Section: Swin-T Backbone (mentioning)
confidence: 99%
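The shifted window partitioning that this statement attributes to SW-MSA can be sketched on a toy grid. This is a simplified illustration only: it uses plain Python lists instead of tensors, the helper names `cyclic_shift` and `window_partition` are hypothetical, and it omits the attention computation and the masking that a full Swin Transformer block applies to windows that wrap around the border.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and left by `shift` cells, wrapping at the borders."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % height][(c + shift) % width] for c in range(width)]
        for r in range(height)
    ]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window token groups."""
    height, width = len(grid), len(grid[0])
    windows = []
    for r0 in range(0, height, window):
        for c0 in range(0, width, window):
            windows.append([
                grid[r][c]
                for r in range(r0, r0 + window)
                for c in range(c0, c0 + window)
            ])
    return windows


if __name__ == "__main__":
    # A 4x4 feature map whose tokens are labeled 0..15.
    grid = [[4 * r + c for c in range(4)] for r in range(4)]
    print("regular windows:", window_partition(grid, 2))
    print("shifted windows:", window_partition(cyclic_shift(grid, 1), 2))
```

In this toy case the first shifted window groups tokens 5, 6, 9, and 10, which belong to four different regular windows. Alternating regular and shifted layers is what lets information pass between adjacent windows.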
“…The advent of two-stage models such as RCNN [236] and Faster RCNN [44], and one-stage methods such as the YOLO series [46, 419, 420], made another leap in detection accuracy. By adapting two-stage models, most work focuses on improving the quality of region proposals [421, 422, 423].…”
Section: Deep Learning In Diverse Intelligent Sensor Based Systems (mentioning)
confidence: 99%