This paper introduces an efficient object detection network named Trident‐You Only Look Once (Trident‐YOLO), which is designed for mobile devices with limited computing power. The new architecture improves on YOLO v4‐tiny. The authors redesign the network structure and propose a trident feature pyramid network (Trident‐FPN), which improves the precision and recall of lightweight object detection. Specifically, Trident‐FPN increases computational complexity by only a small number of floating‐point operations (FLOPs) and obtains a multi‐scale feature map, which significantly improves lightweight object detection performance. To enlarge the receptive field of the network with the fewest FLOPs, the paper redesigns the receptive field block (RFB) and spatial pyramid pooling (SPP) layer, proposing tinier cross‐stage partial RFBs and smaller cross‐stage partial SPPs. Extensive experiments show that Trident‐YOLO performs strongly compared with other popular models on PASCAL VOC and MS COCO. On the MS COCO and PASCAL VOC 2007 test sets, the mean average precision (mAP) of Trident‐YOLO improves by 4.5% and 5.0%, respectively. Trident‐YOLO also reduces the network size by more than 54.4% compared with YOLO v4‐tiny. With a 23.7% FLOP reduction, the FPS improves by 1.9 on an Nvidia Jetson Xavier NX.
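The abstract above mentions cross‐stage partial SPP layers, which build on the standard spatial pyramid pooling idea: concatenating the input feature map with stride‐1 max‐pooled copies at several kernel sizes to enlarge the receptive field at negligible parameter cost. The following is a minimal single‐channel numpy sketch of that pooling‐and‐concatenation idea, not the paper's actual implementation; the kernel sizes (5, 9, 13) are the commonly used YOLO defaults, assumed here for illustration.

```python
import numpy as np

def max_pool_same(x, k):
    # Stride-1 max pooling with padding so the output keeps the input's
    # spatial size (padding value -inf so borders take the true max).
    h, w = x.shape
    p = k // 2
    padded = np.pad(x, p, mode="constant", constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def spp(x, kernels=(5, 9, 13)):
    # SPP: stack the input with max-pooled copies at several scales,
    # producing a multi-receptive-field feature volume.
    return np.stack([x] + [max_pool_same(x, k) for k in kernels])
```

Because each pooled copy keeps the input's spatial size, the concatenated output can feed directly into the next convolution; the cross‐stage partial variants in the paper reduce the channel cost of this concatenation.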
Object detection is one of the main tasks of computer vision. Object detection algorithms usually rely on deep convolutional neural networks, which require the host device to have high computing capability, greatly limiting the application of object detection methods on mobile devices with limited computing power, such as embedded devices. Among current object detection algorithms, the you only look once (YOLO) series takes both speed and accuracy into consideration and is one of the most commonly used methods for object detection. In this article, TRC-YOLO is proposed, which improves the mean average precision (mAP) and real-time detection speed of the model while reducing its size. In TRC-YOLO, the convolution kernels of YOLO v4-tiny are pruned and a dilated convolution layer is introduced into the residual module of the network to produce an hourglass Cross Stage Partial ResNet (CSPResNet) structure. A receptive field block (RFB) that simulates human vision is also added, increasing the receptive field of the model and strengthening the feature extraction ability of the network. In addition, the convolutional block attention module (CBAM), which combines spatial attention and channel attention, is applied to enhance the effective features of the model and reduce the negative impact of noise. The size of the TRC-YOLO model is 17.8 MB, which is 5.9 MB smaller than YOLO v4-tiny, and the model requires 2.983 billion floating-point operations (BFLOPs), 3.834 BFLOPs fewer than YOLO v4-tiny. TRC-YOLO achieves real-time performance of 36.9 frames per second on a Jetson Xavier NX, and its mAP on the PASCAL VOC dataset is 66.4% (3.83% higher than YOLO v4-tiny). The mAP of TRC-YOLO on the MS COCO dataset is 37.7%, which is 1.9% higher than that of the baseline model.
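The abstract above introduces a dilated ("expansive") convolution into the residual module to enlarge the receptive field without adding parameters: a k × k kernel with dilation d covers an effective (k−1)·d+1 window while still using only k·k weights. The sketch below is a generic single‐channel numpy illustration of dilated convolution with valid padding, not TRC-YOLO's actual layer; kernel values and sizes are arbitrary.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=2):
    # Dilated convolution: kernel taps are spaced `dilation` pixels apart,
    # so a 3x3 kernel with dilation=2 sees a 5x5 region of the input
    # while keeping just 9 weights. Valid padding (no border handling).
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1  # effective receptive-field height
    eff_w = (kw - 1) * dilation + 1  # effective receptive-field width
    h, w = x.shape
    out = np.zeros((h - eff_h + 1, w - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input at dilated positions, then weight and sum.
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = (patch * kernel).sum()
    return out
```

This parameter-free enlargement of the receptive field is what lets the hourglass CSPResNet module described above see more context at the same FLOP budget.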
KEYWORDS: CBAM, dilated convolution, object detection, receptive field block (RFB), TridentNet, YOLO

| INTRODUCTION

Object detection is a challenging task in the field of computer vision. Traditional object detection algorithms, such as histograms of oriented gradients (HOG) [1] and the deformable part-based model (DPM) [2], are mainly based on region selection using sliding windows but have high time complexity and cannot meet real-time requirements. In recent years, with the development of deep neural networks and the improvement of hardware computing power [3,4], a series of major breakthroughs with excellent performance have been made in the field of object detection.

Compared with two-stage object detection algorithms, one-stage object detection algorithms, such as the single shot multibox detector (SSD) [5] and the you only look once (YOLO) series [6–10], achieve a balance between speed and accuracy and have been widely used in practice. The YOLO series includes YOLO v1, YOLO v2, YOLO v3, and YOLO v4.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any...