STDnet-ST: Spatio-temporal ConvNet for small object detection

Bosquet, Brais; Mucientes, Manuel; Brea, V.M.

doi:10.1016/j.patcog.2021.107929

Cited by 49 publications

(36 citation statements)

References 9 publications

“…However, these methods are difficult to cover the context information for video. Although there are some methods to integrate Spatio-temporal information, e.g., Spatio-temporal neural network built on STDnet (STDnet-ST) [121], the problems of missed and false inspection still persist.…”

Section: Object Detection From UAV-borne Video (mentioning)

confidence: 99%

“…Appl. 2019 -
Method | Challenge | Dataset | Venue | Code
Zhang et al [118] | Appearance deterioration, occlusion, motion blur | VisDrone-VID | MIPR 2020 | -
MOR-UAVNet [119] | Moving object | MOR-UAV | MM 2020 | https://visionintelligence.github.io/Datasets.html
TDFA [120] | Small-scale | Okutama, VisDrone-VID | Multidim Syst Sign P 2021 | -
STDnet-ST [121] | Small object | USC-GRAD-STDdb, UAVDT, VisDrone-VID | PR 2021 | -
[118] and [120] used the effective CNN model for optical flow (PWC-Net) [132] method and spatial pyramid network (SPyNet) [133] to obtain the motion information of two neighbor frames, respectively. Zhu et al [134] designed fusion feature maps to achieve VID using deep feature flow (DFF) by learning the feature maps of key frames using feature extracting and of non-key frames using FlowNet.…”

Section: A. Optical Flow-based Network (mentioning)

confidence: 99%
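The deep feature flow (DFF) idea mentioned above, where features are extracted only on key frames and carried over to non-key frames with optical flow, can be illustrated with a small sketch. This is not code from the survey, from Zhu et al. [134] or from STDnet-ST. It assumes the backward flow field is already given (in DFF it is estimated by FlowNet), it uses plain Python lists instead of tensors, and the helper names `bilinear_sample` and `warp_features` are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]  # a single-channel 2-D feature map, indexed [row][col]


def bilinear_sample(feat: Grid, y: float, x: float) -> float:
    """Sample a feature map at a real-valued (y, x); cells outside the map count as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    # Accumulate the four neighbouring cells, each weighted by its bilinear weight.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * feat[yy][xx]
    return total


def warp_features(key_feat: Grid, flow_y: Grid, flow_x: Grid) -> Grid:
    """Propagate key-frame features to the current (non-key) frame.

    flow_y / flow_x hold, for every current-frame cell, the displacement back to
    the matching location in the key frame (a backward flow field).
    """
    h, w = len(key_feat), len(key_feat[0])
    return [
        [bilinear_sample(key_feat, i + flow_y[i][j], j + flow_x[i][j]) for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    key = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    zeros = [[0.0] * 3 for _ in range(2)]
    left = [[-1.0] * 3 for _ in range(2)]  # every cell looks one column to the left
    print(warp_features(key, zeros, zeros))  # unchanged key-frame features
    print(warp_features(key, zeros, left))   # features shifted right by one column
```

Bilinear sampling lets fractional flow vectors produce interpolated features instead of snapping to the nearest cell; locations that fall outside the key-frame map contribute zero, which is one common choice among several padding strategies.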

“…Depending on the computing power of NVIDIA's GPU¹⁵, the computation cost can be estimated with backbone network in the corresponding method.
Method | Backbone | Input size | Platform | FPS | Year
[118] | Cas R-CNN+IRR-PWC [132] | 720 × 1280 | - | - | 2020
MOR-UAVNet [119] | MOR-UAVNetv14 | 608 × 608 | Workstation (NVIDIA RTX 2080 Ti/11GB) | 10.5 | 2020
TDFA [120] | FlowNet + Fea Aggregation | 720 × 1280 | Workstation (NVIDIA GeForce GTX TITAN X/12GB) | 3.8 | 2021
STDnet-ST [121] | STDnet+ConvNet | 1280 × 720 | - | - | 2021
STDnet-ST [121] | STDnet+ConvNet | 1920 × 1080 | - | - | 2021
STDnet-ST [121] | STDnet+ConvNet | 1024 × 540 | - | - | 2021
… analyzed according to three UAV topics, i.e., SOD, VID, and MOT. The conclusions were drawn as follows.…”

Section: Estimation of Computation Cost (mentioning)

confidence: 99%
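The table lists input resolutions but few speed figures, so any sense of relative cost for the three STDnet-ST settings has to come from simple arithmetic. The sketch below assumes that the numeric column (10.5, 3.8) reports frames per second and that the cost of a convolutional backbone grows roughly linearly with the number of input pixels; both are assumptions, not statements from the survey, and the function names are hypothetical.

```python
from typing import Tuple

# Rough cost arithmetic for the resolutions and speeds listed in the table.
# Assumptions: the numeric column is frames per second (FPS), and convolutional
# backbone cost scales approximately with the number of input pixels.


def pixel_count(width: int, height: int) -> int:
    """Number of pixels in one input frame."""
    return width * height


def relative_cost(size: Tuple[int, int], reference: Tuple[int, int]) -> float:
    """Pixel-count ratio of `size` to `reference`, used as a proxy for backbone cost."""
    return pixel_count(*size) / pixel_count(*reference)


def latency_ms(fps: float) -> float:
    """Per-frame processing time in milliseconds for a given FPS."""
    return 1000.0 / fps


if __name__ == "__main__":
    reference = (1280, 720)
    for size in [(1024, 540), (1280, 720), (1920, 1080)]:
        print(f"STDnet-ST {size[0]}x{size[1]}: {pixel_count(*size):,} px, "
              f"{relative_cost(size, reference):.2f}x the pixels of 1280x720")
    for name, fps in [("MOR-UAVNet", 10.5), ("TDFA", 3.8)]:
        print(f"{name}: {fps} FPS -> {latency_ms(fps):.1f} ms per frame")
```

Under these assumptions the 1920 × 1080 setting processes 2.25 times the pixels of 1280 × 720, while 1024 × 540 processes 0.6 times as many. This is only a proxy: real throughput also depends on the detection head, memory bandwidth and batch size.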

Deep Learning for UAV-based Object Detection and Tracking: A Survey

Wu, Li, Hong et al. 2021

Preprint

This is the pre-acceptance version; to read the final version, please go to IEEE Geoscience and Remote Sensing Magazine on IEEE Xplore. Owing to effective and flexible data acquisition, unmanned aerial vehicles (UAVs) have recently become a hotspot across the fields of computer vision (CV) and remote sensing (RS). Inspired by the recent success of deep learning (DL), many advanced object detection and tracking approaches have been widely applied to various UAV-related tasks, such as environmental monitoring, precision agriculture, and traffic management. This paper provides a comprehensive survey of the research progress and prospects of DL-based UAV object detection and tracking methods. More specifically, we first outline the challenges and statistics of existing methods, and provide solutions from the perspective of DL-based models in three research topics: object detection from images, object detection from video, and object tracking from video. Open datasets related to UAV-dominated object detection and tracking are exhaustively collected, and four benchmark datasets are employed for performance evaluation using some state-of-the-art methods. Finally, prospects and considerations for future work are discussed and summarized. We expect this survey to provide researchers from the remote sensing field with an overview of DL-based UAV object detection and tracking methods, along with some thoughts on their further development.

“…However, it is noteworthy that the existing networks, such as AlexNet, RCNN, and Fast-RCNN, suffer from non-negligible miss-detections and low recall for the small spots. For instance, if the spot pixels are <32 × 32 (Bosquet et al., 2018), or when the image resolution is not high. According to the definition of the international organization SPIE, a small target is a target area <80 pixels in a 256 × 256 image, that is, the target whose pixel proportion is <0.12% of the total image pixels.…”

Section: Introduction (mentioning)

confidence: 99%
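The two size criteria quoted above can be written as simple checks: an absolute one (a spot smaller than 32 × 32 pixels) and the relative SPIE one (under 80 pixels in a 256 × 256 image). The sketch below is illustrative only; the function names are hypothetical, and it takes the relative threshold as exactly 80 / (256 × 256), which is about 0.122% and is quoted as 0.12% after rounding.

```python
# Hypothetical helpers for the two small-target criteria quoted above.

# Relative threshold: 80 pixels out of a 256 x 256 image (about 0.122%).
SPIE_MAX_RATIO = 80 / (256 * 256)


def is_small_absolute(box_w: int, box_h: int, side: int = 32) -> bool:
    """True if the target area is below side x side pixels (32 x 32 by default)."""
    return box_w * box_h < side * side


def is_small_relative(target_px: float, img_w: int, img_h: int) -> bool:
    """True if the target covers less than 80 / (256 * 256) of the image pixels."""
    return target_px / (img_w * img_h) < SPIE_MAX_RATIO


def max_small_area(img_w: int, img_h: int) -> float:
    """Pixel area at which a target stops counting as small under the relative rule."""
    return SPIE_MAX_RATIO * img_w * img_h


if __name__ == "__main__":
    print(is_small_absolute(31, 32))        # 992 px < 1024 px
    print(is_small_relative(79, 256, 256))  # below 80 px in a 256 x 256 image
    print(max_small_area(1920, 1080))       # relative limit for a 1920 x 1080 frame
```

For a 1920 × 1080 frame the relative rule allows up to 2531.25 pixels, well above the 1024 pixels of the absolute 32 × 32 rule, so the two definitions can disagree on high-resolution images.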

Grape Leaf Black Rot Detection Based on Super-Resolution Image Enhancement and Deep Learning

et al. 2021

The disease spots on grape leaves can be detected using image processing and deep learning methods. However, detection accuracy and efficiency remain challenging. The convolutional substrate information is fuzzy, and the detection results are not satisfactory if the disease spot is relatively small. In particular, detection becomes difficult if the spot covers fewer than 32 × 32 pixels in the image. To address this problem effectively, we present a super-resolution image enhancement and convolutional neural network-based algorithm for detecting black rot on grape leaves. First, the original image is up-sampled and its local details are enhanced using bilinear interpolation, which increases the number of pixels in the image. Then, the enhanced images are fed into the proposed YOLOv3-SPP network for detection. In the proposed network, the IoU (Intersection over Union) in the original YOLOv3 network is replaced with GIoU (Generalized Intersection over Union). In addition, we add an SPP (Spatial Pyramid Pooling) module to improve the detection performance of the network. Finally, the official pre-trained weights of YOLOv3 are used for fast convergence. The test set test_pv from Plant Village and the test set test_orchard from an orchard field were used to evaluate network performance. The results on test_pv show that YOLOv3-SPP detects grape leaf black rot with 95.79% detection accuracy and 94.52% detection recall, which is 5.94% higher accuracy and 10.67% higher recall than the original YOLOv3. The results on test_orchard show that the proposed method can be applied in field environments with 86.69% detection precision and 82.27% detection recall, and that accuracy and recall improve to 94.05% and 93.26%, respectively, on images with simple backgrounds. Therefore, the proposed detection method effectively addresses the detection of small targets and improves the detection of grape leaf black rot.
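The abstract states that IoU in YOLOv3 is replaced with GIoU but does not give the formula. A minimal sketch of the standard definition follows: GIoU = IoU - (|C| - |A ∪ B|) / |C|, where C is the smallest axis-aligned box enclosing boxes A and B. Boxes are assumed to be (x1, y1, x2, y2) tuples with positive width and height; the function names are hypothetical, and how YOLOv3-SPP turns GIoU into its training loss (commonly 1 - GIoU) is not described in the abstract.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def box_area(b: Box) -> float:
    """Area of an axis-aligned box."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou_giou(a: Box, b: Box) -> Tuple[float, float]:
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    # Intersection rectangle (zero area when the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    iou = inter / union
    # Smallest axis-aligned box C enclosing both a and b.
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    # GIoU subtracts the fraction of C that is not covered by the union.
    giou = iou - (c_area - union) / c_area
    return iou, giou


if __name__ == "__main__":
    # Overlapping boxes: IoU = 1/7, GIoU = 1/7 - 2/9.
    print(iou_giou((0, 0, 2, 2), (1, 1, 3, 3)))
    # Disjoint boxes: IoU is 0 in both cases, but GIoU still reflects the gap.
    print(iou_giou((0, 0, 1, 1), (2, 0, 3, 1)))
    print(iou_giou((0, 0, 1, 1), (5, 0, 6, 1)))
```

The last two calls show the usual motivation for GIoU: plain IoU is zero for any pair of disjoint boxes and gives no gradient signal, whereas GIoU becomes more negative as the boxes move further apart.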

“…Wang Hongfeng et al [19] proposed a generative adversarial network (GAN) capable of image super-resolution and two-stage small object detection, which exhibited a better detection performance than mainstream methods. Bosquet Brais et al [20] introduced STDnet-ST, an end-to-end spatiotemporal convolutional neural network for small object detection in video, which achieved state-of-the-art results for small objects. Lian Jing et al [21] proposed a small object detection method in traffic scenes based on attention feature fusion, which improved the detection accuracy of small objects in traffic scenes.…”

Section: Introduction (mentioning)

confidence: 99%

Small Object Detection in Traffic Scenes Based on YOLO-MXANet

Cheng, Zheng et al. 2021

Sensors

In terms of small objects in traffic scenes, general object detection algorithms have low detection accuracy, high model complexity, and slow detection speed. To solve these problems, an improved algorithm (named YOLO-MXANet) is proposed in this paper. Complete Intersection over Union (CIoU) is used to improve the loss function and promote the localization accuracy of small objects. To reduce model complexity, we present a lightweight yet powerful backbone network (named SA-MobileNeXt) that incorporates channel and spatial attention. Our approach extracts expressive features more effectively by applying the Shuffle Channel and Spatial Attention (SCSA) module in the SandGlass Block (SGBlock) module while adding only a small number of parameters. In addition, a data augmentation method combining Mosaic and Mixup is employed to improve the robustness of the trained model. The Multi-scale Feature Enhancement Fusion (MFEF) network is proposed to better fuse the extracted features. Furthermore, the SiLU activation function is used to optimize the Convolution-Batchnorm-Leaky ReLU (CBL) module and the SGBlock module to accelerate the convergence of the model. Ablation experiments on the KITTI dataset show that each improvement is effective. The improved algorithm reduces the complexity and detection speed of the model while improving object detection accuracy. Comparative experiments against other algorithms on the KITTI and CCTSDB datasets show that our algorithm also has certain advantages.
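The abstract says CIoU is used to improve the loss function but gives no formula. A minimal sketch of the standard Complete-IoU formulation follows: CIoU = IoU - ρ²/c² - αv, where ρ is the distance between box centres, c is the diagonal of the smallest enclosing box, v = (4/π²)(arctan(w_gt/h_gt) - arctan(w/h))² measures aspect-ratio mismatch, and α = v / ((1 - IoU) + v). The loss is 1 - CIoU. This is not YOLO-MXANet's code: boxes are assumed to be (x1, y1, x2, y2) tuples with positive width and height, the function names are hypothetical, and the abstract does not say how this term is weighted against the other loss components.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), positive width and height


def ciou(pred: Box, gt: Box) -> float:
    """Complete IoU: IoU minus a centre-distance penalty and an aspect-ratio penalty."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]

    # Plain IoU.
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter)

    # rho^2: squared distance between the two box centres.
    rho2 = ((pred[0] + pred[2]) - (gt[0] + gt[2])) ** 2 / 4 + \
           ((pred[1] + pred[3]) - (gt[1] + gt[3])) ** 2 / 4
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 + \
         (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2

    # v: aspect-ratio consistency term; alpha: its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    # Guard avoids 0/0 when the boxes match exactly (IoU = 1 and v = 0).
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return iou - rho2 / c2 - alpha * v


def ciou_loss(pred: Box, gt: Box) -> float:
    """Bounding-box regression loss: 1 - CIoU (0 for a perfect match)."""
    return 1.0 - ciou(pred, gt)


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # perfect match -> 0.0
    print(ciou_loss((1, 0, 3, 2), (0, 0, 2, 2)))  # same shape, centre shifted
    print(ciou_loss((0, 0, 4, 2), (1, 0, 3, 2)))  # same centre, different aspect ratio
```

Compared with plain IoU, the two penalty terms keep providing a useful signal for small boxes whose overlap with the ground truth is slight or whose shape is off, which is the localization problem the abstract targets.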
