PAFNet: An Efficient Anchor-Free Object Detector Guidance

Xin, Ying; Wang, Guanzhong; Mao, Mingyuan; Feng, Yuan; Dang, Qingqing; Ma, Yanjun; Ding, Errui; Han, Shumin

doi:10.48550/arxiv.2104.13534

Cited by 5 publications

(4 citation statements)

References 35 publications

(57 reference statements)

“…It uses Gaussian kernels in both object localization and size regression, which allows the network to encode more training samples and accelerate the training process. PAFNet [25] extended TTFNet by using a better pre-trained model and combining several existing tricks, such as exponential moving average [26] and CutMix [27].…”

Section: Object Detection Based On Deep Learning (mentioning)

confidence: 99%
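
The citation statement above names exponential moving average (EMA) as one of the training tricks combined in PAFNet, without describing it. As a rough illustration only, a generic weight EMA can be sketched as follows; this is not PAFNet's implementation, and the function name `ema_update`, the flat list of parameters, and the decay value are assumptions made for the example.

```python
def ema_update(shadow, params, decay=0.999):
    """Blend current parameters into a running average.

    shadow: list of floats, the averaged ("shadow") weights so far
    params: list of floats, the weights after the latest training step
    decay:  fraction of the old average to keep (assumed value, not from the paper)
    """
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Hypothetical usage: two scalar "weights" updated once with decay 0.9.
shadow = [0.0, 1.0]
params = [1.0, 1.0]
shadow = ema_update(shadow, params, decay=0.9)
```

In a typical setup the averaged weights, rather than the raw ones, are used for evaluation, which tends to smooth out step-to-step noise late in training.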

Training a Disaster Victim Detection Network for UAV Search and Rescue Using Harmonious Composite Images

et al. 2022

Human detection in images using deep learning has been a popular research topic in recent years and has achieved remarkable performance. Training a human detection network is useful for first responders to search for trapped victims in debris after a disaster. In this paper, we focus on the detection of such victims using deep learning, and we find that state-of-the-art detection models pre-trained on the well-known COCO dataset fail to detect victims. This is because all the people in the training set are shown in photos of daily life or sports activities, while people in the debris after a disaster usually only have parts of their bodies exposed. In addition, because of the dust, the colors of their clothes or body parts are similar to those of the surrounding debris. Compared with collecting images of common objects, images of disaster victims are extremely difficult to obtain for training. Therefore, we propose a framework to generate harmonious composite images for training. We first paste body parts onto a debris background to generate composite victim images and then use a deep harmonization network to make the composite images look more harmonious. We select YOLOv5l as the most suitable model, and experiments show that using composite images for training improves the AP (average precision) by 19.4% (15.3%→34.7%). Furthermore, using the harmonious images is of great benefit to training a better victim detector, and the AP is further improved by 10.2% (34.7%→44.9%). This research is part of the EU project INGENIOUS. Our composite images and code are publicly available on our website.

“…Wang [10] introduced a pyramid structure into the transformer framework, using a progressive shrinking strategy to control the scale of feature maps. While these models demonstrate outstanding detection accuracy, they heavily rely on powerful GPUs to achieve rapid detection speed [11]. This poses a significant challenge in achieving a balance between accuracy and inference speed on mobile devices with limited computational resources [12][13][14].…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Lightweight Object Detection Network with Attention Modules and Hierarchical Feature Pyramid

Yang, Chen, Wang et al. 2023

Symmetry

Object detection methods based on deep learning typically require devices with ample computing capabilities, which limits their deployment in restricted environments such as those with embedded devices. To address this challenge, we propose Mini-YOLOv4, a lightweight real-time object detection network that achieves an excellent trade-off between speed and accuracy. Based on CSPDarknet-Tiny as the backbone network, we enhance the detection performance of the network in three ways. We use a multibranch structure embedded in an attention module for simultaneous spatial and channel attention calibration. We design a group self-attention block with a symmetric structure consisting of a pair of complementary self-attention modules to mine contextual information, thereby ensuring that the detection accuracy is improved without increasing the computational cost. Finally, we introduce a hierarchical feature pyramid network to fully exploit multiscale feature maps and promote the extraction of fine-grained features. The experimental results demonstrate that Mini-YOLOv4 requires only 4.7 M parameters and has a billion floating point operations (BFLOPs) value of 3.1. Compared with YOLOv4-Tiny, our approach achieves a 3.2% improvement in mean average precision (mAP) for the PASCAL VOC dataset and obtains a significant improvement of 3.5% in overall detection accuracy for the MS COCO dataset. In testing with an embedded platform, Mini-YOLOv4 achieves a real-time detection speed of 25.6 FPS on the NVIDIA Jetson Nano, thus meeting the demand for real-time detection in computationally limited devices.

“…Many prior works have performed this in two stages [17,24,25,34,37]: the localization stage finds bounding boxes of potential objects, which are then fed to a classification stage that classifies each object within a finite set of categories. In Faster R-CNN [34], for example, the classification stage is conditioned on the boxes generated from a region proposal network (RPN) and have achieved great success on several datasets; however, despite their success, these networks have shown limitations in detecting objects with large sizing variability and require exhaustive tuning for cross-dataset generalization [4,39,46]. Furthermore, by conditioning the classification stage on localization, the proposals generated by the localization stage tend to overfit the training class categories, so they tend not to be accurate on objects unavailable in the training set [11,14].…”

Section: Introduction (mentioning)

confidence: 99%

Extending One-Stage Detection with Open-World Proposals

Konan, Liang, Li 2022

Preprint

In many applications, such as autonomous driving, hand manipulation, or robot navigation, object detection methods must be able to detect objects unseen in the training set. Open World Detection (OWD) seeks to tackle this problem by generalizing detection performance to seen and unseen class categories. Recent works have seen success in the generation of class-agnostic proposals, which we call Open-World Proposals (OWP), but this comes at the cost of a big drop on the classification task when both tasks are considered in the detection model. These works have investigated two-stage Region Proposal Networks (RPN) by taking advantage of objectness scoring cues; however, for its simplicity, run-time, and decoupling of localization and classification, we investigate OWP through the lens of fully convolutional one-stage detection network, such as FCOS [35]. We show that our architectural and sampling optimizations on FCOS can increase OWP performance by as much as 6% in recall on novel classes, marking the first proposal-free one-stage detection network to achieve comparable performance to RPN based two-stage networks. Furthermore, we show that the inherent, decoupled architecture of FCOS has benefits to retaining classification performance. While two-stage methods worsen by 6% in recall on novel classes, we show that FCOS only drops 2% when jointly optimizing for OWP and classification.
