2021
DOI: 10.48550/arxiv.2111.00902
Preprint

PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices

Abstract: The better accuracy and efficiency trade-off has been a challenging problem in object detection. In this work, we are dedicated to studying key optimizations and neural network architecture choices for object detection to improve accuracy and efficiency. We investigate the applicability of the anchor-free strategy on lightweight object detection models. We enhance the backbone structure and design the lightweight structure of the neck, which improves the feature extraction ability of the network. We improve la…

Citations: cited by 37 publications (42 citation statements)
References: 27 publications (47 reference statements)
“…Compared with state-of-the-art CNN-based object detectors (e.g., YoloX [58], EfficientDet [38], PP-PicoDet-L [61]), EfficientViT also provides significant improvements. Specifically, EfficientViT-Det-r608 provides 1.7 AP improvement over PP-PicoDet-L and requires slightly fewer MACs.…”
Section: COCO Object Detection
mentioning
confidence: 99%
“…† denotes the best result we find for CNN-based mobile object detection, which is achieved with a bunch of additional techniques (e.g., neural architecture search, ghost module, CSP, Cycle-EMA, etc.). Compared with this strong baseline (PP-PicoDet-L[61]), EfficientViT provides 1.7 higher AP with slightly lower MACs.…”
mentioning
confidence: 97%
“…The main idea of knowledge distillation is to distill knowledge from a large model to a small model. Nowadays, lightweight networks have become a popular research direction in object detection, such as PP-PicoDet [31], Nanodet [32], and YOLO-Fastest [33]. They have significantly reduced the number of model parameters and improved the detection speed, but the accuracy is comparatively low.…”
Section: Related Work
mentioning
confidence: 99%
“…Alternatively, ViLBERT [26] and LXMERT [27] introduced the two-stream architecture, where two transformers are applied to images and text independently, which is fused by a third transformer in a later stage. These models typically rely on region-based image features extracted a pre-trained object detectors based on commonly used two-staged detectors (typically Faster R-CNN model [28] or its extension Mask-RCNN [29]), or single-stage detectors (typically SSD and YOLO V3 [30]) or anchor-free detectors(e.g., [31]). Another directions are patch embedding [32,33,34,35,36].…”
Section: Related Work
mentioning
confidence: 99%