Object detection and recognition is an important topic with significant research value. To address insufficient positioning information and low target detection accuracy, this research develops an optimised moving-target identification model based on a convolutional neural network (CNN). In this article, target classification information and semantic location information are obtained by fusing a target detection model with a deep semantic segmentation model. The classification and localisation branches of the detection model receive image features carrying different kinds of information, fused together with a multiscale feature pyramid, so that the detector can use the matched fused features to detect targets of various sizes and shapes. According to the experimental results, the method reaches an accuracy of 0.941, which is 0.189 higher than that of the LSTM-NMS algorithm. Through transfer learning of the CNN and the learning of context information, the technique is highly robust, improves the scene adaptability of feature extraction, and increases the accuracy of moving-target position detection.
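The abstract does not say how the detection and segmentation outputs are combined. As one hypothetical illustration of attaching segmentation-derived location information to a detection, the sketch below tightens a detector's bounding box to the pixels of a binary segmentation mask that fall inside it. The function name `refine_box_with_mask`, the `(x0, y0, x1, y1)` box convention with exclusive upper bounds, and the fusion rule itself are assumptions made for illustration, not the authors' method.

```python
from typing import Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def refine_box_with_mask(box: Box, mask: np.ndarray) -> Box:
    """Hypothetical fusion rule: shrink a detector box to the extent of the
    segmentation-mask pixels (mask is a 2-D boolean array) inside the box."""
    x0, y0, x1, y1 = box
    region = mask[y0:y1, x0:x1]
    ys, xs = np.nonzero(region)
    if ys.size == 0:
        # No segmented pixels inside the box: keep the detector's box unchanged.
        return box
    # Convert region-local indices back to image coordinates.
    return (x0 + int(xs.min()), y0 + int(ys.min()),
            x0 + int(xs.max()) + 1, y0 + int(ys.max()) + 1)
```

For example, with a 10x10 mask whose foreground occupies rows 3 to 5 and columns 4 to 7, a loose box `(2, 2, 9, 9)` is tightened to `(4, 3, 8, 6)`. A real system would also need to match mask classes to detection classes, which this sketch ignores.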
Object detection is an important research area in computer vision. The Single Shot MultiBox Detector (SSD) is a deep-learning object detection model that strikes a good balance between detection accuracy and detection speed, but it recognizes small objects poorly. To address this limitation, this paper modifies the structure of the SSD feature pyramid: the shallow feature map, which carries small-object information, is up-sampled and fused with the upper feature map, enhancing the shallow map's ability to represent fine detail. This improves the overall detection accuracy of SSD while maintaining a relatively high detection speed. The proposed model is evaluated on two common benchmarks, Pascal VOC and MS COCO. On the Pascal VOC07+12, MS COCO14, and VOC07+12+COCO datasets, the improved model achieves mean average precision (mAP) values of 80.1% (+3.3% over the conventional SSD), 49.9% (+6.8%), and 82.1% (+3.0%), respectively. The proposed model also runs at 42.2 frames per second.
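The abstract does not specify the resampling or fusion operator. The sketch below shows one generic way to bring two feature-pyramid levels to a common resolution and combine them: nearest-neighbour upsampling of the lower-resolution map, cropping any overshoot (for example 10 to 20 to 19, as between SSD300 levels), then element-wise addition. Which level is upsampled, the interpolation mode, the use of addition rather than concatenation, and the names `upsample_nearest` and `fuse_maps` are all assumptions, not the paper's implementation.

```python
import math

import numpy as np


def upsample_nearest(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map to (C, out_h, out_w)."""
    _, h, w = x.shape
    fy = math.ceil(out_h / h)
    fx = math.ceil(out_w / w)
    # Repeat each row and column, then crop any overshoot (e.g. 10 -> 20 -> 19).
    up = x.repeat(fy, axis=1).repeat(fx, axis=2)
    return up[:, :out_h, :out_w]


def fuse_maps(high_res: np.ndarray, low_res: np.ndarray) -> np.ndarray:
    """Resize `low_res` to the spatial size of `high_res` and add element-wise.

    Assumes both maps have the same channel count; a real network would
    typically insert a 1x1 convolution to align channels first.
    """
    if high_res.shape[0] != low_res.shape[0]:
        raise ValueError("channel counts must match for element-wise addition")
    _, h, w = high_res.shape
    return high_res + upsample_nearest(low_res, h, w)
```

For instance, fusing a `(4, 38, 38)` map with a `(4, 19, 19)` map yields a `(4, 38, 38)` map in which each low-resolution cell contributes to a 2x2 block. Bilinear interpolation or a learned deconvolution are common alternatives to nearest-neighbour upsampling.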