2023
DOI: 10.3390/rs15061687
TPH-YOLOv5++: Boosting Object Detection on Drone-Captured Scenarios with Cross-Layer Asymmetric Transformer

Abstract: Object detection in drone-captured images has become a popular task in recent years. Because drones navigate at different altitudes, object scale varies considerably, which burdens model optimization. Moreover, high-speed, low-altitude flight causes motion blur on densely packed objects, which poses great challenges. To address these two issues, based on YOLOv5, we add an additional prediction head to detect tiny-scale objects and replace CNN-based prediction heads with transformer predi…

Cited by 35 publications (22 citation statements)
References 52 publications
“…(10), where c_h is the height difference between the center points of the ground-truth box and the predicted box, σ is the distance between those two center points, and arcsin(c_h/σ) is in fact equal to the angle α, defined as shown in Equations (11)-(13), where (b_cx^gt, b_cy^gt) are the ground-truth box center coordinates and (b_cx, b_cy) are the predicted-box center coordinates.…”
Section: Loss Function (mentioning)
confidence: 99%
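The quoted definitions correspond to the angle-cost term of an SIoU-style loss. Purely as an illustration, and assuming the standard formulation Λ = 1 − 2·sin²(arcsin(c_h/σ) − π/4) that Equations (10)-(13) appear to refer to (the function name and inputs below are hypothetical, not taken from the cited paper), the term can be computed as:

```python
import math

def siou_angle_cost(gt_center, pred_center):
    """Angle-cost term of an SIoU-style loss (illustrative sketch).

    gt_center, pred_center: (x, y) box-center coordinates.
    """
    gt_cx, gt_cy = gt_center
    pr_cx, pr_cy = pred_center
    c_h = abs(gt_cy - pr_cy)                           # height gap between the two centers
    sigma = math.hypot(gt_cx - pr_cx, gt_cy - pr_cy)   # distance between the two centers
    if sigma == 0:                                     # identical centers: no angle penalty
        return 0.0
    alpha = math.asin(min(1.0, c_h / sigma))           # alpha = arcsin(c_h / sigma)
    return 1 - 2 * math.sin(alpha - math.pi / 4) ** 2

# The cost is 0 when the centers are aligned with an axis (alpha = 0 or pi/2)
# and maximal (1) when alpha = pi/4.
print(siou_angle_cost((10.0, 10.0), (14.0, 13.0)))     # c_h = 3, sigma = 5
```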
“…Two-stage detection algorithms, represented by R-CNN and Fast R-CNN, require generating region proposals and extracting features with a CNN before classifying and localizing objects with dedicated classifiers [11,12]. Zhao et al. [13] enhanced the ability of YOLOv5 to identify remotely sensed images by integrating an extra cross-layer asymmetric transformer (CA-Trans) prediction head. This addition effectively captures the asymmetric information between the extra head and the other heads, thanks to a sparse local attention (SLA) module.…”
Section: Introduction (mentioning)
confidence: 99%
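The CA-Trans and SLA designs are specific to the cited paper and are not reproduced here. As a rough illustration only of what a transformer-style prediction head over a YOLOv5 feature map can look like, a minimal sketch follows; all names, channel sizes, and the output width are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TransformerPredictionHead(nn.Module):
    """Illustrative transformer-style detection head (not the paper's CA-Trans).

    Flattens a (B, C, H, W) feature map into H*W tokens, applies multi-head
    self-attention with a residual connection, and predicts per-cell outputs
    with a 1x1 convolution.
    """

    def __init__(self, channels=256, num_heads=8, num_outputs=255):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.pred = nn.Conv2d(channels, num_outputs, kernel_size=1)

    def forward(self, x):                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)     # residual + layer norm
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.pred(x)                       # (B, num_outputs, H, W)

# head = TransformerPredictionHead()
# out = head(torch.randn(1, 256, 20, 20))         # -> torch.Size([1, 255, 20, 20])
```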
“…Two-stage algorithms achieve good accuracy but fall short in detection speed, making real-time computation difficult on platforms with limited computing power. Among one-stage algorithms, the YOLO series [13,17-22] performs candidate-box regression and category classification directly. The use of attention mechanisms in deep learning has further improved detection results.…”
Section: Of 16 (mentioning)
confidence: 99%
“…Another common approach to improving the detection head is to add a head specifically designed for small objects to the three existing detection heads in YOLOv5. Researchers such as Baidya R [33] and Zhao Q [34] have adopted this approach and have extensively demonstrated, through numerous experiments, its effectiveness in improving the network’s ability to detect small objects.…”
Section: Related Work (mentioning)
confidence: 99%
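As a conceptual sketch only, not the implementation used by the cited works, attaching a fourth head to a higher-resolution P2 feature map so that tiny objects are predicted on a finer grid might look like this; the channel widths, strides, and the per-anchor output width of 255 are assumptions.

```python
import torch
import torch.nn as nn

class FourHeadDetector(nn.Module):
    """Illustrative four-head setup: the usual P3-P5 heads plus an extra
    head on the higher-resolution P2 map for tiny objects."""

    def __init__(self, chans=(64, 128, 256, 512), num_outputs=255):
        super().__init__()
        # One 1x1 prediction conv per pyramid level (P2, P3, P4, P5).
        self.heads = nn.ModuleList(
            [nn.Conv2d(c, num_outputs, kernel_size=1) for c in chans]
        )

    def forward(self, feats):                   # feats: [P2, P3, P4, P5]
        return [head(f) for head, f in zip(self.heads, feats)]

# Dummy pyramid features for a 640x640 input (strides 4, 8, 16, 32):
feats = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256, 512), (160, 80, 40, 20))]
outs = FourHeadDetector()(feats)                # four prediction maps, one per level
```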